
Migration Strategy — 6Rs & Cloud Adoption

Migration is the process of onboarding workloads into the enterprise landing zone. Applications move from on-prem data centers into workload accounts/projects, consuming networking from the Hub, deploying onto EKS/GKE clusters, and using centralized security and observability.

[Figure: Migration overview from on-premises to cloud landing zone]


The AWS Cloud Adoption Framework (CAF) organizes migration planning across six perspectives: Business, People, Governance, Platform, Security, and Operations.

[Figure: AWS Cloud Adoption Framework 6 perspectives]

Migration Phases (AWS):

[Figure: AWS migration phases: Assess, Mobilize, Migrate and Modernize]

The GCP Cloud Adoption Framework focuses on four themes (Learn, Lead, Scale, Secure), each assessed at three maturity phases (tactical, strategic, transformational):

[Figure: GCP Adoption Framework 4 themes]

Migration Tools (GCP):

| Tool | Purpose |
| --- | --- |
| Migrate to Virtual Machines | VM migration from on-prem/AWS to Compute Engine |
| Migrate to Containers | Convert VMs directly to containers for GKE |
| Database Migration Service | MySQL, PostgreSQL, SQL Server, Oracle to Cloud SQL/AlloyDB |
| Transfer Service | Large-scale data transfer (S3 to GCS, on-prem to GCS) |
| BigQuery Data Transfer | Automated data ingestion into BigQuery |

Every application in the portfolio must be categorized into one of six strategies (the 6Rs):

[Figure: 6Rs migration decision framework]

| Strategy | What It Means | Example | Effort | Risk |
| --- | --- | --- | --- | --- |
| Rehost | Move as-is to cloud VMs | Java app on VMware → EC2/GCE | Low | Low |
| Replatform | Minor optimizations during move | Oracle → Aurora, IIS → containers | Medium | Medium |
| Refactor | Re-architect for cloud-native | Monolith → microservices on EKS | High | High |
| Repurchase | Replace with SaaS | Custom CRM → Salesforce | Medium | Medium |
| Retire | Decommission (not needed) | Legacy reporting no one uses | Low | Low |
| Retain | Keep on-prem (for now) | Mainframe, regulatory constraints | None | None |

Applications are grouped into migration waves. Each wave contains 5-15 applications that share dependencies and are migrated together.

[Figure: Wave planning timeline for migration]

[Figure: Application dependency mapping for wave planning]
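As a sketch, the wave-grouping rule above (apps that share dependencies travel together, capped per wave) can be expressed as connected components of the dependency graph. The `plan_waves` helper and the app names are hypothetical, not part of any real tooling:

```python
from collections import defaultdict

def plan_waves(dependencies, max_wave_size=15):
    """Group applications into migration waves.

    Apps that share dependencies (connected components of the
    undirected dependency graph) land in the same wave; oversized
    components are split into chunks of at most max_wave_size.
    """
    graph = defaultdict(set)
    apps = set()
    for a, b in dependencies:          # (app, depends_on) pairs
        graph[a].add(b)
        graph[b].add(a)
        apps.update((a, b))

    seen, waves = set(), []
    for app in sorted(apps):
        if app in seen:
            continue
        # Depth-first search collects one connected component.
        stack, component = [app], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.append(node)
            stack.extend(graph[node] - seen)
        for i in range(0, len(component), max_wave_size):
            waves.append(sorted(component[i:i + max_wave_size]))
    return waves

deps = [("crm", "billing"), ("billing", "ledger"), ("wiki", "sso")]
print(plan_waves(deps))  # [['billing', 'crm', 'ledger'], ['sso', 'wiki']]
```

A real assessment would also weight waves by business criticality; this only captures the "shared dependencies migrate together" constraint.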


For large-scale migrations (100+ apps), establish a repeatable, assembly-line process:

[Figure: Migration factory assembly line pattern]


[Figure: AWS DMS replication architecture]

[Figure: GCP Database Migration Service architecture]

```hcl
resource "aws_dms_replication_instance" "migration" {
  replication_instance_id     = "oracle-to-aurora-migration"
  replication_instance_class  = "dms.r5.4xlarge" # Size for 10TB
  allocated_storage           = 500  # Storage for logs and cache
  multi_az                    = true # HA for production migrations
  vpc_security_group_ids      = [aws_security_group.dms.id]
  replication_subnet_group_id = aws_dms_replication_subnet_group.main.id
  publicly_accessible         = false
}

resource "aws_dms_endpoint" "source_oracle" {
  endpoint_id   = "source-oracle"
  endpoint_type = "source"
  engine_name   = "oracle"
  server_name   = var.oracle_host
  port          = 1521
  username      = var.oracle_user
  password      = var.oracle_password
  database_name = "PRODDB"

  # Read redo logs with Binary Reader instead of LogMiner
  extra_connection_attributes = "useLogminerReader=N;useBfile=Y"
}

resource "aws_dms_endpoint" "target_aurora" {
  endpoint_id   = "target-aurora-pg"
  endpoint_type = "target"
  engine_name   = "aurora-postgresql"
  server_name   = aws_rds_cluster.aurora.endpoint
  port          = 5432
  username      = var.aurora_admin_user
  password      = var.aurora_admin_password
  database_name = "production"
}

resource "aws_dms_replication_task" "oracle_to_aurora" {
  replication_task_id      = "oracle-to-aurora-full-cdc"
  replication_instance_arn = aws_dms_replication_instance.migration.replication_instance_arn
  source_endpoint_arn      = aws_dms_endpoint.source_oracle.endpoint_arn
  target_endpoint_arn      = aws_dms_endpoint.target_aurora.endpoint_arn
  migration_type           = "full-load-and-cdc" # Full load, then ongoing replication

  table_mappings = jsonencode({
    rules = [
      {
        "rule-type" = "selection"
        "rule-id"   = "1"
        "rule-name" = "include-all-tables"
        "object-locator" = {
          "schema-name" = "APP_SCHEMA"
          "table-name"  = "%"
        }
        "rule-action" = "include"
      }
    ]
  })

  replication_task_settings = jsonencode({
    TargetMetadata = {
      TargetSchema       = "public"
      SupportLobs        = true
      FullLobMode        = false
      LobChunkSize       = 64
      LimitedSizeLobMode = true
      LobMaxSize         = 32768
    }
    Logging = {
      EnableLogging = true
    }
    ControlTablesSettings = {
      historyTimeslotInMinutes = 5
    }
  })
}
```

During migration, you operate in a hybrid state where some applications are on-prem and some are in the cloud. This is the hardest phase to manage.

[Figure: Hybrid architecture during migration]

What the Central Infra Team Must Have Ready Before Migration Starts

Landing Zone Readiness Checklist:
Networking:
[x] Direct Connect / Cloud Interconnect established (primary + backup)
[x] Transit Gateway / Shared VPC configured
[x] DNS forwarding between on-prem and cloud
[x] VPN as failover for Direct Connect
[x] Network Firewall / Cloud NGFW inspecting traffic
[x] CIDR ranges allocated (no overlap with on-prem)
Security:
[x] GuardDuty / SCC enabled org-wide
[x] Security Hub / SCC aggregating findings
[x] SCPs / org policies applied
[x] KMS keys created for data encryption
[x] IAM Identity Center connected to corporate AD
[x] Break-glass roles in every account
Compute:
[x] EKS / GKE clusters provisioned in workload accounts
[x] Node groups sized for initial wave
[x] Container registry (ECR / Artifact Registry) ready
[x] ArgoCD deployed for GitOps
Observability:
[x] Prometheus + Grafana in Shared Services
[x] Log aggregation (Loki / CloudWatch) configured
[x] Alerting rules for core infrastructure
[x] On-call rotation established
Data:
[x] DMS replication instances provisioned
[x] Target databases created (Aurora, Cloud SQL)
[x] Backup and DR strategy documented and tested
[x] Data validation scripts prepared
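The "data validation scripts" item can be illustrated with a minimal row-count and checksum comparison. This is a sketch under stated assumptions: two in-memory SQLite databases stand in for the real Oracle source and Aurora target, and `validate` plus the `accounts` table are invented for the example:

```python
import hashlib
import sqlite3

def table_checksum(conn, table, order_by):
    """Hash all rows in a stable order so both sides compare equal
    iff they hold the same data."""
    h = hashlib.sha256()
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {order_by}"):
        h.update(repr(row).encode())
    return h.hexdigest()

def validate(source, target, tables):
    """Compare row counts and checksums for each (table, pk) pair."""
    report = {}
    for table, pk in tables:
        src_n = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        tgt_n = target.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        report[table] = (
            src_n == tgt_n
            and table_checksum(source, table, pk) == table_checksum(target, table, pk)
        )
    return report

# Demo: two in-memory databases standing in for source and target.
src, tgt = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for db in (src, tgt):
    db.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
src.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 250)])
tgt.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 250)])
print(validate(src, tgt, [("accounts", "id")]))  # {'accounts': True}
```

Against real engines you would compute the checksum server-side (e.g. an aggregate over hashed rows) rather than pull every row over the network.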

Scenario 1: “Your Bank Has 200 Applications On-Prem. Design the Migration Strategy and Timeline”


Strong Answer:

“I would follow a three-phase approach over 9-12 months:

Phase 1 — Assess and Mobilize (Month 1-3): Use AWS Migration Hub and Application Discovery Service to inventory all 200 apps. Map dependencies — which apps call which APIs, which share databases. Score each app on business value (revenue impact, user count) and technical complexity (coupling, custom hardware, licensing). Categorize by 6R: I expect roughly 120 rehost, 50 replatform, 15 refactor, 10 repurchase, 5 retire. Build the landing zone in parallel — Control Tower, Transit Gateway, Direct Connect, security baseline.

Phase 2 — Execute Migration Waves (Month 4-9): 5-6 waves, each taking 2-3 weeks:

  • Wave 1: Internal tools, dev environments (prove the process)
  • Wave 2: Databases (DMS full load + CDC for continuous sync)
  • Wave 3-4: Core applications (rehost Java/.NET apps to EC2 or containers)
  • Wave 5: Complex integrations (ERP connectors, batch processing)
  • Wave 6: Final stragglers and cutover

Each wave follows the migration factory: discover, design, build, migrate, validate, cutover. We run 2-3 waves in parallel with different teams.

Phase 3 — Optimize and Modernize (Month 10-12): Right-size instances, purchase Reserved Instances. Begin modernization of rehosted apps to containers. Decommission on-prem data center. Complete DR testing.

Staffing: 2 cloud architects for overall design, 8-10 engineers for execution (split into 2 wave teams), 1 PM per wave, security and networking specialists embedded.

Key risk: Database migration. Oracle to Aurora is the highest-risk migration. I would run DMS with CDC for 2-4 weeks before cutover, validate data integrity with checksums, and plan a rollback window.”
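The 6R triage described in Phase 1 can be sketched as a decision function. The field names (`in_use`, `saas_available`, and so on) are invented for illustration; a real portfolio assessment scores many more dimensions and involves human review:

```python
def choose_strategy(app):
    """Toy 6R triage encoding common rules of thumb.

    `app` is a dict with hypothetical boolean/str fields:
    in_use, regulatory_lock, saas_available, replaced_soon,
    needs_scale, team_capacity ('low' | 'high').
    """
    if not app["in_use"]:
        return "retire"
    if app["regulatory_lock"]:
        return "retain"          # e.g. mainframe, data-residency rules
    if app["saas_available"]:
        return "repurchase"
    if app["replaced_soon"]:
        return "rehost"          # don't invest in an app being replaced
    if app["needs_scale"] and app["team_capacity"] == "high":
        return "refactor"
    if app["needs_scale"]:
        return "replatform"      # some cloud gain without a rewrite
    return "rehost"

legacy_report = dict(in_use=False, regulatory_lock=False, saas_available=False,
                     replaced_soon=False, needs_scale=False, team_capacity="low")
print(choose_strategy(legacy_report))  # retire
```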


Scenario 2: “How Do You Decide Between Rehost and Refactor for a Legacy Java Monolith?”


Strong Answer:

“The decision depends on four factors:

1. Business urgency: If the data center lease expires in 6 months, rehost. There is no time to refactor. Get to cloud first, modernize later.

2. Application lifespan: If the app is being replaced in 12-18 months (vendor switch, new platform), rehost — do not invest engineering effort in refactoring something that will be retired.

3. Technical debt and scale challenges: If the monolith is struggling with performance at current scale and the business expects 10x growth, refactoring to microservices on Kubernetes makes sense. But only if you have the engineering team and time budget.

4. Cost of change: Refactoring a 500KLOC Java monolith into microservices is a 6-12 month project requiring deep domain knowledge. Rehosting to an EC2 instance takes 2-3 weeks.

My recommendation for most cases: Rehost to EC2 first (Week 1-3), then replatform to containers (Month 2-3), then refactor to microservices incrementally (Month 4-12). This is the strangler fig pattern — extract one service at a time while the monolith continues to run.

Strangler Fig Pattern:
Month 1: [========== Monolith (on EC2) ==========]
Month 3: [=== User Svc (EKS) ===][=== Monolith ===]
Month 6: [User][Order][Product][=== Monolith ===]
Month 12: [User][Order][Product][Payment][Inventory]
(monolith fully decomposed)

Never rewrite from scratch. Extract services one at a time, route traffic to the new service, and slowly shrink the monolith.”
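The strangler fig routing above can be sketched as a prefix-based routing table that grows as services are extracted; unmatched paths keep hitting the monolith. Service names here are illustrative, not a real deployment:

```python
# Routing table grows one entry per extracted service.
EXTRACTED = {}  # path prefix -> backend service name

def route(path):
    """Send a request to the extracted service owning its prefix,
    or fall through to the monolith."""
    for prefix, backend in EXTRACTED.items():
        if path.startswith(prefix):
            return backend
    return "monolith"

# Month 1: nothing extracted yet.
assert route("/users/42") == "monolith"

# Month 3: the user service has moved to EKS.
EXTRACTED["/users"] = "user-svc"
assert route("/users/42") == "user-svc"
assert route("/orders/7") == "monolith"   # still handled by the monolith
```

In practice the same idea lives in an ALB listener rule, an Ingress, or an API gateway route, not application code; the point is that the monolith shrinks one routed prefix at a time.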


Scenario 3: “Design the Database Migration Strategy for a 5TB Oracle DB Moving to Aurora PostgreSQL”


Strong Answer:

“Oracle to Aurora PostgreSQL is a heterogeneous migration — different engine, different SQL dialect. This requires schema conversion plus data migration.

Step 1 — Schema Conversion (Week 1-2): Use AWS Schema Conversion Tool (SCT) to convert Oracle PL/SQL to PostgreSQL. SCT flags incompatible constructs: Oracle-specific functions, sequences, materialized views, synonyms. Manual intervention needed for ~20-30% of stored procedures. Create migration playbook documenting each conversion.

Step 2 — Data Migration Setup (Week 2-3): Deploy DMS replication instance (r5.4xlarge for 5TB). Configure source endpoint (Oracle via Direct Connect) and target endpoint (Aurora PostgreSQL). Start with full-load-and-cdc migration type — DMS does a full table-by-table copy, then switches to change data capture (CDC) reading the Oracle redo logs (via LogMiner, or Binary Reader for high-volume sources).

Step 3 — Parallel Running (Week 3-6): DMS runs continuously, keeping Aurora in sync with Oracle. Application still points to Oracle. During this phase:

  • Validate data: compare row counts, checksums, sample queries between Oracle and Aurora
  • Performance test: run application test suite against Aurora
  • Fix data type mapping issues (Oracle NUMBER → PostgreSQL numeric precision)
  • Test stored procedure conversions

Step 4 — Cutover (Week 6, planned maintenance window):

  • Stop writes to Oracle (application maintenance mode)
  • Wait for DMS CDC lag to reach zero
  • Verify final data consistency
  • Update application connection strings to Aurora endpoint
  • Start application, validate core workflows
  • Rollback plan: revert connection strings to Oracle (DMS keeps Oracle updated via reverse CDC if configured)

Step 5 — Post-Migration (Week 7-8):

  • Monitor Aurora performance (CloudWatch, Performance Insights)
  • Optimize: adjust work_mem, shared_buffers, connection pooling (PgBouncer)
  • Enable Aurora read replicas for reporting queries
  • Decommission Oracle (after 2-week bake period)

Key risks:

  • LOB columns (CLOB/BLOB) — DMS handles these but slowly; may need increased LobMaxSize
  • Oracle-specific SQL in application code — needs application-level changes
  • Sequence gaps — Oracle sequences may not map 1:1 to PostgreSQL sequences”
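The NUMBER mapping risk can be made concrete. Below is a sketch of a common convention for mapping Oracle NUMBER(p,s) to PostgreSQL types; it is similar in spirit to what schema-conversion tools produce, but the exact SCT output may differ, and `map_oracle_number` is a hypothetical helper:

```python
def map_oracle_number(precision=None, scale=0):
    """Map Oracle NUMBER(p,s) to a PostgreSQL type (common convention).

    Unbounded NUMBER becomes numeric; scale-0 columns become the
    smallest integer type that can hold the precision.
    """
    if precision is None:
        return "numeric"                       # unbounded NUMBER
    if scale == 0:
        if precision <= 4:
            return "smallint"                  # up to 4 digits
        if precision <= 9:
            return "integer"                   # up to 9 digits
        if precision <= 18:
            return "bigint"                    # up to 18 digits
        return f"numeric({precision})"
    return f"numeric({precision},{scale})"

print(map_oracle_number(10))  # bigint
```

Validating this mapping against the actual data (not just the declared precision) is part of the Week 3-6 parallel-running checks.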

Scenario 4: “What Does the Central Infra Team Need to Have Ready Before App Teams Can Migrate?”


Strong Answer:

“The landing zone must be fully operational before any application team starts migration. Here is the readiness checklist:

Networking (critical path): Direct Connect with redundant connections (primary + backup VPN). Transit Gateway with route tables for workload account VPCs. DNS forwarding so cloud apps can resolve on-prem hostnames and vice versa. Network Firewall in the inspection VPC for all egress traffic.

Identity: IAM Identity Center connected to corporate Active Directory via SAML. Permission sets defined for developer, admin, and read-only access. Break-glass roles tested. Service accounts for CI/CD (OIDC federation for GitHub Actions).

Compute: EKS/GKE clusters provisioned with node groups sized for initial waves. ArgoCD deployed for GitOps. ECR/Artifact Registry with vulnerability scanning enabled. Namespace vending automation ready (team requests namespace, Terraform creates it with RBAC, quotas, and ESO).

Security: GuardDuty/SCC enabled across all accounts. Security Hub aggregating findings. SCPs applied (deny public S3, restrict regions, require encryption). KMS keys for data encryption.

Observability: Prometheus + Grafana for metrics. Loki or CloudWatch for logs. Alerting rules for infrastructure (node health, pod restarts, disk usage). PagerDuty integration for on-call.

Data: DMS replication instances pre-provisioned. Target databases created. Backup policies configured. Data validation tooling ready.

If any of these are missing, the migration stalls. I have seen migrations fail because Direct Connect was not ready, and teams tried to migrate over VPN — latency was unacceptable.”


Scenario 5: “How Do You Handle the Hybrid State Where Some Apps Are On-Prem and Some Are in Cloud?”


Strong Answer:

“The hybrid state is the most complex phase of any migration. Four critical areas:

Networking: Direct Connect is the backbone. On-prem apps calling cloud APIs and vice versa must have sub-10ms latency. DNS is split — Route 53/Cloud DNS for cloud-hosted services, on-prem DNS for remaining services. We use conditional forwarders so each side can resolve the other. Traffic between environments goes through the Network Hub inspection VPC.

Identity: Single identity source (Active Directory) federated to both on-prem LDAP and cloud IAM (Identity Center / BeyondCorp). Users log in once (SSO), access both environments. Service accounts for cross-environment API calls use short-lived tokens (OAuth2/OIDC).

Data consistency: DMS runs CDC continuously for databases that are shared between on-prem and cloud apps. The source of truth remains on-prem until the final cutover. Application teams must agree on which system is the writer — dual-write is dangerous and should be avoided.

Monitoring: A single pane of glass. We forward on-prem metrics (Prometheus remote write or Telegraf) to the cloud-based Grafana instance. Dashboards show both environments side by side. Alert rules cover both. The goal is to make the hybrid state visible and measurable — how much traffic is still on-prem vs cloud?”
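The conditional-forwarder behavior can be sketched as a toy resolver: each side answers queries for its own zones locally and forwards the other side's zones across the link. Zone names and addresses below are invented for illustration:

```python
# Hypothetical split-DNS zone data for the two environments.
CLOUD_ZONES = {"app.cloud.example.com": "10.20.0.15"}
ONPREM_ZONES = {"erp.corp.example.com": "192.168.5.40"}

def resolve(name, side):
    """Resolve `name` from the perspective of 'cloud' or 'onprem'.

    A local hit is answered directly; a hit in the other side's
    zones models a conditional forwarder; anything else is NXDOMAIN.
    """
    local, remote = (
        (CLOUD_ZONES, ONPREM_ZONES) if side == "cloud"
        else (ONPREM_ZONES, CLOUD_ZONES)
    )
    if name in local:
        return local[name]       # answered by the local resolver
    if name in remote:
        return remote[name]      # forwarded across Direct Connect
    raise KeyError(name)         # NXDOMAIN

print(resolve("erp.corp.example.com", "cloud"))  # 192.168.5.40
```

The real equivalents are Route 53 Resolver inbound/outbound endpoints with forwarding rules (AWS) or Cloud DNS forwarding zones (GCP).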


Scenario 6: “A Critical App Migration Failed and You Need to Rollback. What Is Your Plan?”


Strong Answer:

“Rollback planning must be defined BEFORE the migration, not during the incident. Here is the structured rollback:

Immediate (within minutes):

  1. Revert DNS — point the application’s DNS record back to the on-prem IP. If using Route 53 weighted routing, shift 100% back to on-prem. TTL should have been lowered to 60 seconds before cutover.
  2. Revert load balancer — if using F5 or ALB, switch the backend pool back to on-prem servers.
  3. The application is now serving from on-prem again. Users see a brief blip (DNS TTL propagation).
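The weighted-routing shift in step 1 can be sketched as follows. Record names and weights are illustrative; a real rollback would update the Route 53 record set via the API rather than mutate a dict:

```python
def traffic_share(records):
    """Weighted routing as in Route 53: each record receives
    weight / total-weight of the traffic."""
    total = sum(records.values())
    return {name: w / total for name, w in records.items()}

# Mid-cutover state: most traffic already on the cloud endpoint.
weights = {"cloud-alb": 90, "onprem-vip": 10}

# Rollback: shift 100% back to on-prem.
weights.update({"cloud-alb": 0, "onprem-vip": 100})
print(traffic_share(weights))  # {'cloud-alb': 0.0, 'onprem-vip': 1.0}
```

Because clients honor the record TTL, the shift only takes effect as cached answers expire, which is why the TTL is lowered to 60 seconds before cutover.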

Data rollback (within hours):

  4. If DMS CDC was running in both directions (forward and reverse replication), the on-prem database has been kept in sync with cloud writes. No data loss.
  5. If DMS was one-directional, any data written to the cloud database during the failed migration needs to be replayed to on-prem. This is why we keep the cutover window short and have transaction logs.

Post-mortem (within 24 hours):

  6. Root cause analysis — was it a performance issue? Network latency? Missing dependency? Application bug in the new environment?
  7. Fix the issue in a staging environment.
  8. Re-test the migration end-to-end.
  9. Schedule the next attempt with the fix applied.

Key lesson: Never migrate without a tested rollback plan. During the migration rehearsal (which we run 1-2 times before the real cutover), we also rehearse the rollback. If the rollback does not work in rehearsal, we do not proceed with the real migration.”