Redis to Valkey: A Production Migration Guide for Amazon MemoryDB

Migrating a production data store is often compared to changing a plane's engine mid-flight. When Redis Inc. announced its licensing changes, many teams began looking toward Valkey, the Linux Foundation’s open-source, community-governed fork.

This guide documents a real-world migration from a standalone Redis 7.4 instance to a clustered Valkey 7.2 environment on Amazon MemoryDB. It covers the architectural shifts, queue refactoring, and a zero-loss data migration strategy.

1. The Why: Moving to Managed Valkey

Transitioning to Valkey on Amazon MemoryDB isn't just about staying open-source; it’s an infrastructure upgrade. By moving to this managed service, you gain:

High Durability: Multi-AZ transactional logging ensures data isn't just in-memory, but persisted across zones.
Native Scalability: Built-in Redis Cluster support allows for horizontal scaling as load grows.
Security by Default: Mandatory TLS and IAM-integrated access control.

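Note: Valkey 7.2 was forked from Redis 7.2.4, so the two are highly compatible, but commands introduced in Redis 7.4 (such as hash-field expiration) are not available in Valkey 7.2. Beyond that, the move from a standalone instance to a Cluster is the most significant change for your application code.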

2. The Architectural Constraint: Redis Cluster Mode

Amazon MemoryDB operates exclusively in Redis Cluster mode. This introduces a fundamental rule: The Cross-Slot Constraint.

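In a standalone setup, you can run operations on any set of keys. In a Cluster, keys are distributed across 16,384 hash slots, which are in turn divided among the shards. Multi-key operations (such as MGET, MULTI/EXEC transactions, or Lua scripts) fail with a CROSSSLOT error unless all keys involved share the same hash slot. You can force related keys into one slot with a hash tag: when a key contains {...}, only the text inside the braces is hashed, so {user:42}:profile and {user:42}:cart always land together.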
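To make the slot mapping concrete, here is a minimal TypeScript sketch of how a cluster assigns a key to a slot: CRC16 (XMODEM variant) of the key, modulo 16384, with only the content of the first non-empty {...} section hashed when one is present. The function names (crc16, hashTagOrKey, keySlot, sameSlot) and the example keys are illustrative and not part of any client library; in practice you can ask the server directly with CLUSTER KEYSLOT.

```typescript
// Sketch of the Redis/Valkey Cluster key-to-slot mapping.
// Function names and example keys are illustrative assumptions.

const SLOT_COUNT = 16384;

// CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection.
function crc16(bytes: Uint8Array): number {
  let crc = 0;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i] << 8;
    for (let bit = 0; bit < 8; bit++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Returns the part of the key that is actually hashed.
function hashTagOrKey(key: string): string {
  const open = key.indexOf("{");
  if (open === -1) return key;
  const close = key.indexOf("}", open + 1);
  // A missing "}" or an empty "{}" means the whole key is hashed.
  if (close === -1 || close === open + 1) return key;
  return key.slice(open + 1, close);
}

function keySlot(key: string): number {
  return crc16(new TextEncoder().encode(hashTagOrKey(key))) % SLOT_COUNT;
}

// True when a multi-key command over these keys would avoid CROSSSLOT.
function sameSlot(first: string, ...rest: string[]): boolean {
  const slot = keySlot(first);
  return rest.every((key) => keySlot(key) === slot);
}

console.log(keySlot("foo")); // 12182
console.log(sameSlot("{user:42}:profile", "{user:42}:cart")); // true
```

A helper like sameSlot can be dropped into a unit test to audit the key sets your code passes to MGET, transactions, or scripts before the cluster rejects them in production.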

The Casualty: RSMQ

In my experience running a Node.js server, RSMQ (Redis Simple Message Queue) was simple and worked well against a single Redis instance. However, RSMQ relies on multi-key patterns that aren't cluster-aware, so its operations can run into CROSSSLOT errors on MemoryDB. Rather than patch around that, moving to a more established package like BullMQ made a lot of sense.

Why BullMQ?

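Cluster Support: It uses ioredis under the hood, which follows cluster redirections for you. BullMQ's own multi-key scripts still need all of a queue's keys in one hash slot, so give the queue a hash-tagged prefix (for example {bull}) or wrap the queue name in braces.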
Robustness: Includes built-in support for parent/child jobs, retries, and concurrency control.
Maintenance: It is actively maintained with modern TypeScript definitions.

3. The Migration Strategy: The "Strangler" Pattern

To avoid a "big bang" migration that risked losing messages, we used a Strangler Pattern to swap the queue logic behind a RedisMessageService abstraction.

Introduce an Abstraction: Wrap all queue operations in a common interface.
The Dual-Write Phase: Update the service to write new jobs to BullMQ while legacy workers continue to drain the old RSMQ queues.
The Reconciliation Job: Run a background utility to scan RSMQ for "stuck" or unacknowledged messages and re-enqueue them into BullMQ.
The Clean Break: Once the RSMQ queues hit zero, decommission the old logic.
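The four steps above can be sketched in TypeScript as follows. This is not the original RedisMessageService; QueueBackend, InMemoryBackend, and MessageService are hypothetical names, and the in-memory backend stands in for real adapters that would wrap RSMQ and BullMQ behind the same interface. The sketch assumes at-least-once delivery, so consumers need to tolerate an occasional duplicate.

```typescript
// Sketch of a queue abstraction for a staged cutover.
// All names here are hypothetical; real backends would wrap RSMQ and BullMQ.

interface QueuedMessage {
  id: string;
  body: string;
}

interface QueueBackend {
  send(queue: string, body: string): Promise<void>;
  // Returns the oldest pending message without deleting it, if any.
  peek(queue: string): Promise<QueuedMessage | undefined>;
  remove(queue: string, id: string): Promise<void>;
  pending(queue: string): Promise<number>;
}

// In-memory stand-in so the sketch runs without a Redis server.
class InMemoryBackend implements QueueBackend {
  private queues = new Map<string, QueuedMessage[]>();
  private nextId = 1;

  async send(queue: string, body: string): Promise<void> {
    const messages = this.queues.get(queue) ?? [];
    messages.push({ id: String(this.nextId++), body });
    this.queues.set(queue, messages);
  }
  async peek(queue: string): Promise<QueuedMessage | undefined> {
    return (this.queues.get(queue) ?? [])[0];
  }
  async remove(queue: string, id: string): Promise<void> {
    const messages = this.queues.get(queue) ?? [];
    this.queues.set(queue, messages.filter((m) => m.id !== id));
  }
  async pending(queue: string): Promise<number> {
    return (this.queues.get(queue) ?? []).length;
  }
}

class MessageService {
  private legacy: QueueBackend;
  private next: QueueBackend;

  constructor(legacy: QueueBackend, next: QueueBackend) {
    this.legacy = legacy;
    this.next = next;
  }

  // New work goes to the new backend only; legacy workers keep draining.
  async enqueue(queue: string, body: string): Promise<void> {
    await this.next.send(queue, body);
  }

  // Reconciliation: copy each leftover legacy message to the new backend and
  // delete it from the legacy queue only after the copy has succeeded.
  // A crash between the two steps yields a duplicate, never a loss.
  async reconcile(queue: string): Promise<number> {
    let moved = 0;
    let message = await this.legacy.peek(queue);
    while (message !== undefined) {
      await this.next.send(queue, message.body);
      await this.legacy.remove(queue, message.id);
      moved++;
      message = await this.legacy.peek(queue);
    }
    return moved;
  }

  // Gate for the clean break: legacy code can go once this returns true.
  async legacyDrained(queue: string): Promise<boolean> {
    return (await this.legacy.pending(queue)) === 0;
  }
}
```

Because callers only ever see MessageService, swapping the backend or deleting the legacy path later does not touch application code.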

4. Moving the Data: RDB Snapshots via S3

MemoryDB can seed a new cluster from an RDB snapshot stored in S3. Here is the workflow we used to ensure integrity:

Step A: Export & Local Validation

We used rdb-cli to pull a snapshot from the source Redis 7.4 instance.

Pro-Tip: Before moving to AWS, load the snapshot into a local Valkey 7.2 Docker container. This confirms that Valkey can read the file at all (RDB format versions differ between releases) and that your data structures behave as expected in the new version before you even touch the cloud.

Step B: The S3 Handoff

Upload your verified .rdb file to a private S3 bucket. When provisioning your MemoryDB cluster, you can select the "Import data from S3" option and point to this file.

5. Critical "Gotchas" for Developers

TLS is Mandatory (`rediss://`)

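MemoryDB clusters are created with encryption in transit enabled by default, and in that configuration every connection must use TLS. If your connection string starts with redis:// instead of rediss:// (note the double 's'), the client speaks plaintext to a TLS port and the connection will typically hang until it times out. This often surfaces as vague "Connection Timeout" errors in your application logs, or as "502 Bad Gateway" responses from whatever sits in front of your service.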
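One way to catch this early is to fail fast at startup instead of waiting for a timeout. The guard below is a minimal sketch; assertTlsRedisUrl and the example host are hypothetical, and it assumes your service reads a single connection URL from configuration.

```typescript
// Hypothetical startup guard: reject non-TLS connection URLs before the
// client ever tries to connect. The function name and error text are
// illustrative, not part of any Redis client library.
function assertTlsRedisUrl(connectionUrl: string): URL {
  const parsed = new URL(connectionUrl);
  if (parsed.protocol !== "rediss:") {
    throw new Error(
      `Expected a rediss:// URL for a TLS-enabled cluster, got ${parsed.protocol}//`
    );
  }
  return parsed;
}

// Example with a placeholder host name.
const endpoint = assertTlsRedisUrl("rediss://example-cluster.internal:6379");
console.log(endpoint.hostname, endpoint.port);
```

Calling the guard once during boot turns a silent hang into an immediate, readable configuration error.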

Connection Management

In Cluster mode, clients maintain connections to every shard. If you have many microservices, this can lead to a "connection storm."

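The Fix: Share one cluster client per process instead of creating one per module or request, and stagger deployments so that not every instance reconnects at once. Note that minimizeConnections: true is an option of the node-redis cluster client (createCluster), where it discovers the topology without connecting to every node up front. If you are on ioredis, the closest built-in setting is lazyConnect: true, which defers connecting until the first command is sent.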
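A quick back-of-the-envelope calculation shows why this matters. The sketch below is only an estimate: the type names and all of the numbers are hypothetical, and it assumes each client holds exactly one connection to every node it talks to.

```typescript
// Rough estimate of how many connections a fleet opens against a cluster.
// Type names and figures are hypothetical; the model assumes one connection
// per client per node.

interface Fleet {
  services: number;
  instancesPerService: number;
  clientsPerInstance: number; // e.g. a queue producer plus a worker
}

interface ClusterTopology {
  shards: number;
  replicasPerShard: number;
}

interface ConnectionEstimate {
  clients: number;
  nodes: number;
  perNode: number;
  total: number;
}

function estimateConnections(
  fleet: Fleet,
  topology: ClusterTopology,
  connectToReplicas: boolean = true
): ConnectionEstimate {
  const clients =
    fleet.services * fleet.instancesPerService * fleet.clientsPerInstance;
  const nodesPerShard = 1 + (connectToReplicas ? topology.replicasPerShard : 0);
  const nodes = topology.shards * nodesPerShard;
  // Every client connects once to every node, so each node sees `clients`.
  return { clients, nodes, perNode: clients, total: clients * nodes };
}

const estimate = estimateConnections(
  { services: 20, instancesPerService: 4, clientsPerInstance: 2 },
  { shards: 3, replicasPerShard: 1 }
);
console.log(estimate); // { clients: 160, nodes: 6, perNode: 160, total: 960 }
```

Halving clientsPerInstance by sharing a client, or skipping replica connections, cuts the total proportionally, which is why client reuse is usually the first lever to pull.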

6. The Production Runbook

When it’s time for the final cutover, follow this checklist to ensure a clean transition:

Freeze Writes: Put your application into a maintenance state. Any write accepted after the snapshot is taken will be missing from the RDB file, so the dataset must stay static until cutover.
Generate & Upload: Create the final RDB from your source and move it to S3.
Provision MemoryDB: Launch the cluster using the S3 import. Wait for the status to reach available.
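Validate: Use a bastion host with valkey-cli to SCAN for key prefixes and check HLEN on critical hashes to ensure the data is there. In cluster mode, SCAN only covers the node you are connected to, so run it against each primary and compare the totals with the source.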
Update Environment Variables: Point your services to the new MemoryDB Cluster endpoint.
Flip the Switch: Restart your application services, enable BullMQ workers, and monitor your error rates.
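The Validate step can also be scripted. The sketch below counts keys under a prefix by running SCAN against every primary and de-duplicating the results. ScanNode is a hypothetical structural interface that matches the shape of the ioredis scan call (cursor, "MATCH", pattern, "COUNT", count); with ioredis you would typically pass the array returned by cluster.nodes("master"). It assumes the prefix contains no glob characters and that the matching keys fit in memory.

```typescript
// Sketch: count keys under a prefix across all primaries of a cluster.
// ScanNode is a hypothetical interface shaped like the ioredis scan() call.

interface ScanNode {
  scan(
    cursor: string,
    matchToken: "MATCH",
    pattern: string,
    countToken: "COUNT",
    count: number
  ): Promise<[string, string[]]>;
}

async function countKeysByPrefix(
  nodes: ScanNode[],
  prefix: string
): Promise<number> {
  const seen = new Set<string>();
  for (const node of nodes) {
    let cursor = "0";
    do {
      // COUNT is only a hint for how much work each call does.
      const [nextCursor, keys] = await node.scan(
        cursor,
        "MATCH",
        `${prefix}*`,
        "COUNT",
        1000
      );
      // SCAN may return the same key more than once, so de-duplicate.
      for (const key of keys) seen.add(key);
      cursor = nextCursor;
    } while (cursor !== "0"); // a cursor of "0" marks the end of the scan
  }
  return seen.size;
}
```

Running the same function against the source instance (as a one-element array) and the new cluster gives two numbers per prefix that should match before you flip the switch.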

Final Thoughts

This migration is more than a simple version bump; it’s a move toward a more resilient, distributed architecture. By abstracting our queue logic and using a staged RDB import, we moved our entire production stack from Redis to Valkey with zero data loss and minimal downtime.

Ready to migrate? Start by auditing your code for multi-key operations; that is where most of your cluster challenges will live.