Migration Strategy — 6Rs & Cloud Adoption
Where This Fits
Migration is the process of onboarding workloads into the enterprise landing zone. Applications move from on-prem data centers into workload accounts/projects, consuming networking from the Hub, deploying onto EKS/GKE clusters, and using centralized security and observability.
Cloud Adoption Framework
AWS Cloud Adoption Framework
AWS Cloud Adoption Framework organizes migration planning across six perspectives: Business, People, Governance, Platform, Security, and Operations.
Migration Phases (AWS): Assess, Mobilize, Migrate and Modernize.
GCP Cloud Adoption Framework
GCP Cloud Adoption Framework focuses on four themes (Learn, Lead, Scale, Secure), each rated at one of three maturity phases: Tactical, Strategic, or Transformational.
Migration Tools (GCP):
| Tool | Purpose |
|---|---|
| Migrate to Virtual Machines | VM migration from on-prem/AWS to Compute Engine |
| Migrate to Containers | Convert VMs directly to containers for GKE |
| Database Migration Service | MySQL, PostgreSQL, SQL Server, Oracle to Cloud SQL/AlloyDB |
| Transfer Service | Large-scale data transfer (S3 to GCS, on-prem to GCS) |
| BigQuery Data Transfer | Automated data ingestion into BigQuery |
The 6Rs of Migration
Every application in the portfolio must be categorized into one of six strategies:
| Strategy | What It Means | Example | Effort | Risk |
|---|---|---|---|---|
| Rehost | Move as-is to cloud VMs | Java app on VMware → EC2/GCE | Low | Low |
| Replatform | Minor optimizations during move | Oracle → Aurora, IIS → containers | Medium | Medium |
| Refactor | Re-architect for cloud-native | Monolith → microservices on EKS | High | High |
| Repurchase | Replace with SaaS | Custom CRM → Salesforce | Medium | Medium |
| Retire | Decommission (not needed) | Legacy reporting no one uses | Low | Low |
| Retain | Keep on-prem (for now) | Mainframe, regulatory constraints | None | None |
Wave Planning
Applications are grouped into migration waves. Each wave contains 5-15 applications with shared dependencies, migrated together.
Dependency Mapping
Map application dependencies (shared databases, API calls, batch jobs, messaging) before assigning waves, so tightly coupled systems migrate together.
Migration Factory Pattern
For large-scale migrations (100+ apps), establish a repeatable, assembly-line process: discover, design, build, migrate, validate, cutover.
Database Migration
AWS Database Migration Service

```hcl
resource "aws_dms_replication_instance" "migration" {
  replication_instance_id    = "oracle-to-aurora-migration"
  replication_instance_class = "dms.r5.4xlarge" # Sized for 10 TB
  allocated_storage          = 500              # Storage for logs and cache
  multi_az                   = true             # HA for production migrations

  vpc_security_group_ids      = [aws_security_group.dms.id]
  replication_subnet_group_id = aws_dms_replication_subnet_group.main.id

  publicly_accessible = false
}

resource "aws_dms_endpoint" "source_oracle" {
  endpoint_id   = "source-oracle"
  endpoint_type = "source"
  engine_name   = "oracle"
  server_name   = var.oracle_host
  port          = 1521
  username      = var.oracle_user
  password      = var.oracle_password
  database_name = "PRODDB"

  extra_connection_attributes = "useLogminerReader=N;useBfile=Y"
}

resource "aws_dms_endpoint" "target_aurora" {
  endpoint_id   = "target-aurora-pg"
  endpoint_type = "target"
  engine_name   = "aurora-postgresql"
  server_name   = aws_rds_cluster.aurora.endpoint
  port          = 5432
  username      = var.aurora_admin_user
  password      = var.aurora_admin_password
  database_name = "production"
}

resource "aws_dms_replication_task" "oracle_to_aurora" {
  replication_task_id      = "oracle-to-aurora-full-cdc"
  replication_instance_arn = aws_dms_replication_instance.migration.replication_instance_arn
  source_endpoint_arn      = aws_dms_endpoint.source_oracle.endpoint_arn
  target_endpoint_arn      = aws_dms_endpoint.target_aurora.endpoint_arn
  migration_type           = "full-load-and-cdc" # Full load, then ongoing replication

  table_mappings = jsonencode({
    rules = [
      {
        rule-type = "selection"
        rule-id   = "1"
        rule-name = "include-all-tables"
        object-locator = {
          schema-name = "APP_SCHEMA"
          table-name  = "%"
        }
        rule-action = "include"
      }
    ]
  })

  replication_task_settings = jsonencode({
    TargetMetadata = {
      TargetSchema       = "public"
      SupportLobs        = true
      FullLobMode        = false
      LobChunkSize       = 64
      LimitedSizeLobMode = true
      LobMaxSize         = 32768
    }
    Logging               = { EnableLogging = true }
    ControlTablesSettings = { historyTimeslotInMinutes = 5 }
  })
}
```

GCP Database Migration Service

```hcl
resource "google_database_migration_service_connection_profile" "source" {
  connection_profile_id = "source-postgres"
  location              = "me-central1"
  display_name          = "On-Prem PostgreSQL"

  postgresql {
    host     = var.onprem_pg_host
    port     = 5432
    username = var.pg_user
    password = var.pg_password

    ssl {
      type = "SERVER_ONLY"
    }

    connectivity {
      static_ip_connectivity {} # Or private connectivity via VPN/Interconnect
    }
  }
}

resource "google_database_migration_service_connection_profile" "target" {
  connection_profile_id = "target-cloudsql"
  location              = "me-central1"
  display_name          = "Cloud SQL Target"

  cloudsql {
    settings {
      tier                = "db-custom-16-61440" # 16 vCPU, 60 GB RAM
      edition             = "ENTERPRISE"
      database_version    = "POSTGRES_14"
      storage_auto_resize = true

      ip_config {
        enable_ipv4     = false
        private_network = var.vpc_self_link
      }

      data_disk_type = "PD_SSD"

      database_flags {
        name  = "max_connections"
        value = "500"
      }
    }
  }
}

resource "google_database_migration_service_migration_job" "pg_migration" {
  migration_job_id = "postgres-migration"
  location         = "me-central1"
  display_name     = "PostgreSQL to Cloud SQL"
  type             = "CONTINUOUS" # Full dump + CDC

  source      = google_database_migration_service_connection_profile.source.name
  destination = google_database_migration_service_connection_profile.target.name

  destination_database {
    provider = "CLOUDSQL"
  }
}
```

Hybrid State Management
During migration, you operate in a hybrid state where some applications are on-prem and some are in the cloud. This is the hardest phase to manage.
What the Central Infra Team Must Have Ready Before Migration Starts
Landing Zone Readiness Checklist:
Networking:
- [x] Direct Connect / Cloud Interconnect established (primary + backup)
- [x] Transit Gateway / Shared VPC configured
- [x] DNS forwarding between on-prem and cloud
- [x] VPN as failover for Direct Connect
- [x] Network Firewall / Cloud NGFW inspecting traffic
- [x] CIDR ranges allocated (no overlap with on-prem)

Security:
- [x] GuardDuty / SCC enabled org-wide
- [x] Security Hub / SCC aggregating findings
- [x] SCPs / org policies applied
- [x] KMS keys created for data encryption
- [x] IAM Identity Center connected to corporate AD
- [x] Break-glass roles in every account

Compute:
- [x] EKS / GKE clusters provisioned in workload accounts
- [x] Node groups sized for initial wave
- [x] Container registry (ECR / Artifact Registry) ready
- [x] ArgoCD deployed for GitOps

Observability:
- [x] Prometheus + Grafana in Shared Services
- [x] Log aggregation (Loki / CloudWatch) configured
- [x] Alerting rules for core infrastructure
- [x] On-call rotation established

Data:
- [x] DMS replication instances provisioned
- [x] Target databases created (Aurora, Cloud SQL)
- [x] Backup and DR strategy documented and tested
- [x] Data validation scripts prepared

Interview Scenarios
Scenario 1: “Your Bank Has 200 Applications On-Prem. Design the Migration Strategy and Timeline”
Strong Answer:
“I would follow a three-phase approach over 9-12 months:
Phase 1 — Assess and Mobilize (Month 1-3): Use AWS Migration Hub and Application Discovery Service to inventory all 200 apps. Map dependencies — which apps call which APIs, which share databases. Score each app on business value (revenue impact, user count) and technical complexity (coupling, custom hardware, licensing). Categorize by 6R: I expect roughly 120 rehost, 50 replatform, 15 refactor, 10 repurchase, 5 retire. Build the landing zone in parallel — Control Tower, Transit Gateway, Direct Connect, security baseline.
Phase 2 — Execute Migration Waves (Month 4-9): 5-6 waves, each taking 2-3 weeks:
- Wave 1: Internal tools, dev environments (prove the process)
- Wave 2: Databases (DMS full load + CDC for continuous sync)
- Wave 3-4: Core applications (rehost Java/.NET apps to EC2 or containers)
- Wave 5: Complex integrations (ERP connectors, batch processing)
- Wave 6: Final stragglers and cutover
Each wave follows the migration factory: discover, design, build, migrate, validate, cutover. We run 2-3 waves in parallel with different teams.
Phase 3 — Optimize and Modernize (Month 10-12): Right-size instances, purchase Reserved Instances. Begin modernization of rehosted apps to containers. Decommission on-prem data center. Complete DR testing.
Staffing: 2 cloud architects for overall design, 8-10 engineers for execution (split into 2 wave teams), 1 PM per wave, security and networking specialists embedded.
Key risk: Database migration. Oracle to Aurora is the highest-risk migration. I would run DMS with CDC for 2-4 weeks before cutover, validate data integrity with checksums, and plan a rollback window.”
Scenario 2: “How Do You Decide Between Rehost and Refactor for a Legacy Java Monolith?”
Strong Answer:
“The decision depends on four factors:
1. Business urgency: If the data center lease expires in 6 months, rehost. There is no time to refactor. Get to cloud first, modernize later.
2. Application lifespan: If the app is being replaced in 12-18 months (vendor switch, new platform), rehost — do not invest engineering effort in refactoring something that will be retired.
3. Technical debt and scale challenges: If the monolith is struggling with performance at current scale and the business expects 10x growth, refactoring to microservices on Kubernetes makes sense. But only if you have the engineering team and time budget.
4. Cost of change: Refactoring a 500KLOC Java monolith into microservices is a 6-12 month project requiring deep domain knowledge. Rehosting to an EC2 instance takes 2-3 weeks.
My recommendation for most cases: Rehost to EC2 first (Week 1-3), then replatform to containers (Month 2-3), then refactor to microservices incrementally (Month 4-12). This is the strangler fig pattern — extract one service at a time while the monolith continues to run.
Strangler Fig Pattern:
Month 1: [========== Monolith (on EC2) ==========]
Month 3: [=== User Svc (EKS) ===][=== Monolith ===]
Month 6: [User][Order][Product][=== Monolith ===]
Month 12: [User][Order][Product][Payment][Inventory] (monolith fully decomposed)

Never rewrite from scratch. Extract services one at a time, route traffic to the new service, and slowly shrink the monolith.”
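The traffic-routing side of the strangler fig can be sketched with ALB weighted target groups: the same path serves both the monolith and the extracted service, and the weights shift over time. The listener, target group names, and path pattern below are illustrative assumptions, not part of the original design:

```hcl
# Strangler fig traffic split (sketch): 90% of /users/* requests still go to
# the monolith, 10% to the newly extracted user service on EKS.
resource "aws_lb_listener_rule" "user_service_split" {
  listener_arn = aws_lb_listener.https.arn # hypothetical existing listener
  priority     = 10

  condition {
    path_pattern {
      values = ["/users/*"]
    }
  }

  action {
    type = "forward"

    forward {
      target_group {
        arn    = aws_lb_target_group.monolith.arn
        weight = 90
      }
      target_group {
        arn    = aws_lb_target_group.user_service.arn
        weight = 10
      }
    }
  }
}
```

Shifting the weights from 90/10 toward 0/100 over successive deploys lets the extracted service take load gradually, with instant rollback by reverting the weights.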
Scenario 3: “Design the Database Migration Strategy for a 5TB Oracle DB Moving to Aurora PostgreSQL”
Strong Answer:
“Oracle to Aurora PostgreSQL is a heterogeneous migration — different engine, different SQL dialect. This requires schema conversion plus data migration.
Step 1 — Schema Conversion (Week 1-2): Use AWS Schema Conversion Tool (SCT) to convert Oracle PL/SQL to PostgreSQL. SCT flags incompatible constructs: Oracle-specific functions, sequences, materialized views, synonyms. Manual intervention needed for ~20-30% of stored procedures. Create migration playbook documenting each conversion.
Step 2 — Data Migration Setup (Week 2-3):
Deploy DMS replication instance (r5.4xlarge for 5TB). Configure source endpoint (Oracle via Direct Connect) and target endpoint (Aurora PostgreSQL). Start with full-load-and-cdc migration type — DMS does a full table-by-table copy, then switches to change data capture (CDC) using Oracle LogMiner.
Step 3 — Parallel Running (Week 3-6): DMS runs continuously, keeping Aurora in sync with Oracle. Application still points to Oracle. During this phase:
- Validate data: compare row counts, checksums, sample queries between Oracle and Aurora
- Performance test: run application test suite against Aurora
- Fix data type mapping issues (Oracle NUMBER → PostgreSQL numeric precision)
- Test stored procedure conversions
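Part of the validation in Step 3 can be delegated to DMS itself, which can compare source and target rows during CDC. A sketch of the relevant task settings, reusing the endpoint and instance references from the earlier example (the validation thresholds are assumptions to tune):

```hcl
# DMS replication task with row-level validation enabled during CDC.
resource "aws_dms_replication_task" "oracle_to_aurora_validated" {
  replication_task_id      = "oracle-to-aurora-validated"
  replication_instance_arn = aws_dms_replication_instance.migration.replication_instance_arn
  source_endpoint_arn      = aws_dms_endpoint.source_oracle.endpoint_arn
  target_endpoint_arn      = aws_dms_endpoint.target_aurora.endpoint_arn
  migration_type           = "full-load-and-cdc"

  table_mappings = jsonencode({
    rules = [{
      rule-type      = "selection"
      rule-id        = "1"
      rule-name      = "all-app-tables"
      object-locator = { schema-name = "APP_SCHEMA", table-name = "%" }
      rule-action    = "include"
    }]
  })

  replication_task_settings = jsonencode({
    ValidationSettings = {
      EnableValidation = true # compare source and target rows continuously
      ThreadCount      = 8    # parallel validation threads (assumed sizing)
      FailureMaxCount  = 1000 # stop validating a table after this many mismatches
    }
  })
}
```

DMS validation still does not replace application-level checksums and sample queries; it catches replication drift, not conversion bugs in stored procedures.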
Step 4 — Cutover (Week 6, planned maintenance window):
- Stop writes to Oracle (application maintenance mode)
- Wait for DMS CDC lag to reach zero
- Verify final data consistency
- Update application connection strings to Aurora endpoint
- Start application, validate core workflows
- Rollback plan: revert connection strings to Oracle (DMS keeps Oracle updated via reverse CDC if configured)
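The DNS part of this cutover is often pre-staged as a weighted record pair, so the switch (and a rollback) is a weight change rather than a record edit. A sketch with hypothetical zone, hostname, and IP variables; the low TTL is what makes reverting fast:

```hcl
# Weighted A records for the app: cutover = flip the weights.
resource "aws_route53_record" "app_onprem" {
  zone_id        = aws_route53_zone.internal.zone_id # hypothetical private zone
  name           = "app.corp.example.com"
  type           = "A"
  ttl            = 60 # low TTL so a rollback propagates quickly
  set_identifier = "onprem"
  records        = [var.onprem_app_ip]

  weighted_routing_policy {
    weight = 0 # was 100 before cutover
  }
}

resource "aws_route53_record" "app_cloud" {
  zone_id        = aws_route53_zone.internal.zone_id
  name           = "app.corp.example.com"
  type           = "A"
  ttl            = 60
  set_identifier = "cloud"
  records        = [var.cloud_app_ip]

  weighted_routing_policy {
    weight = 100 # was 0 before cutover
  }
}
```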
Step 5 — Post-Migration (Week 7-8):
- Monitor Aurora performance (CloudWatch, Performance Insights)
- Optimize: adjust `work_mem` and `shared_buffers`, add connection pooling (PgBouncer)
- Enable Aurora read replicas for reporting queries
- Decommission Oracle (after 2-week bake period)
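The tuning and read-replica steps above map to a DB parameter group and an extra cluster instance in Terraform. Names, parameter group family, and values are illustrative (in Aurora PostgreSQL, `work_mem` is set at the instance level):

```hcl
# Instance-level parameter group with a larger work_mem for reporting queries.
resource "aws_db_parameter_group" "aurora_pg14" {
  name   = "aurora-pg14-migrated"
  family = "aurora-postgresql14"

  parameter {
    name  = "work_mem"
    value = "65536" # KB; sized for the workload (assumed value)
  }
}

# Additional cluster instances beyond the writer join as Aurora replicas.
resource "aws_rds_cluster_instance" "reader" {
  cluster_identifier      = aws_rds_cluster.aurora.id
  identifier              = "aurora-reader-1"
  instance_class          = "db.r6g.2xlarge"
  engine                  = "aurora-postgresql"
  db_parameter_group_name = aws_db_parameter_group.aurora_pg14.name
}
```

Reporting traffic then targets the cluster's reader endpoint, keeping analytical queries off the writer.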
Key risks:
- LOB columns (CLOB/BLOB): DMS handles these but slowly; may need an increased `LobMaxSize`
- Oracle-specific SQL in application code: needs application-level changes
- Sequence gaps: Oracle sequences may not map 1:1 to PostgreSQL sequences”
Scenario 4: “What Does the Central Infra Team Need to Have Ready Before App Teams Can Migrate?”
Strong Answer:
“The landing zone must be fully operational before any application team starts migration. Here is the readiness checklist:
Networking (critical path): Direct Connect with redundant connections (primary + backup VPN). Transit Gateway with route tables for workload account VPCs. DNS forwarding so cloud apps can resolve on-prem hostnames and vice versa. Network Firewall in the inspection VPC for all egress traffic.
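The DNS forwarding requirement can be sketched with Route 53 Resolver: an outbound endpoint plus a FORWARD rule sends queries for the corporate domain to on-prem DNS servers. The subnets, security group, domain, and resolver IP below are assumptions:

```hcl
# Outbound resolver endpoint in two subnets (required minimum of two).
resource "aws_route53_resolver_endpoint" "outbound" {
  name               = "to-onprem"
  direction          = "OUTBOUND"
  security_group_ids = [aws_security_group.resolver.id]

  ip_address { subnet_id = var.resolver_subnet_a }
  ip_address { subnet_id = var.resolver_subnet_b }
}

# Forward queries for the corporate domain to the on-prem DNS server.
resource "aws_route53_resolver_rule" "corp_forward" {
  name                 = "corp-domain"
  domain_name          = "corp.example.com" # hypothetical on-prem zone
  rule_type            = "FORWARD"
  resolver_endpoint_id = aws_route53_resolver_endpoint.outbound.id

  target_ip {
    ip = "10.0.0.53" # hypothetical on-prem DNS resolver IP
  }
}
```

The rule still needs an `aws_route53_resolver_rule_association` per VPC before queries in that VPC are forwarded; on-prem DNS needs the mirror-image conditional forwarder pointing at inbound resolver endpoints.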
Identity: IAM Identity Center connected to corporate Active Directory via SAML. Permission sets defined for developer, admin, and read-only access. Break-glass roles tested. Service accounts for CI/CD (OIDC federation for GitHub Actions).
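The OIDC federation for GitHub Actions mentioned above can be sketched as an IAM OIDC provider plus a role that trusts it. The org name is a placeholder, and the thumbprint should be verified against GitHub's published value rather than copied:

```hcl
# GitHub Actions OIDC identity provider.
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d98bab03faadb97b34396831e3780aea1"] # verify current value
}

# Trust policy: only workflows in the named org may assume the role.
data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    condition {
      test     = "StringLike"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:my-org/*"] # hypothetical GitHub org
    }
  }
}

resource "aws_iam_role" "ci_deploy" {
  name               = "ci-deploy"
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

This removes long-lived access keys from CI entirely; each workflow run exchanges a short-lived OIDC token for temporary credentials.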
Compute: EKS/GKE clusters provisioned with node groups sized for initial waves. ArgoCD deployed for GitOps. ECR/Artifact Registry with vulnerability scanning enabled. Namespace vending automation ready (team requests namespace, Terraform creates it with RBAC, quotas, and ESO).
Security: GuardDuty/SCC enabled across all accounts. Security Hub aggregating findings. SCPs applied (deny public S3, restrict regions, require encryption). KMS keys for data encryption.
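One of the SCP guardrails above, region restriction, as a Terraform sketch. The allowed region and the list of exempted global services are assumptions to adapt:

```hcl
# SCP: deny all actions outside the approved region, exempting global services.
resource "aws_organizations_policy" "region_guardrail" {
  name = "deny-outside-approved-regions"
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyOutsideApprovedRegions"
      Effect    = "Deny"
      NotAction = ["iam:*", "organizations:*", "sts:*", "route53:*", "support:*"]
      Resource  = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["me-south-1"] # hypothetical approved region
        }
      }
    }]
  })
}
```

The policy takes effect only once attached to an OU or account with `aws_organizations_policy_attachment`; attaching at the workloads OU keeps management accounts unaffected.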
Observability: Prometheus + Grafana for metrics. Loki or CloudWatch for logs. Alerting rules for infrastructure (node health, pod restarts, disk usage). PagerDuty integration for on-call.
Data: DMS replication instances pre-provisioned. Target databases created. Backup policies configured. Data validation tooling ready.
If any of these are missing, the migration stalls. I have seen migrations fail because Direct Connect was not ready, and teams tried to migrate over VPN — latency was unacceptable.”
Scenario 5: “How Do You Handle the Hybrid State Where Some Apps Are On-Prem and Some Are in Cloud?”
Strong Answer:
“The hybrid state is the most complex phase of any migration. Four critical areas:
Networking: Direct Connect is the backbone. On-prem apps calling cloud APIs and vice versa must have sub-10ms latency. DNS is split — Route 53/Cloud DNS for cloud-hosted services, on-prem DNS for remaining services. We use conditional forwarders so each side can resolve the other. Traffic between environments goes through the Network Hub inspection VPC.
Identity: Single identity source (Active Directory) federated to both on-prem LDAP and cloud IAM (Identity Center / BeyondCorp). Users log in once (SSO), access both environments. Service accounts for cross-environment API calls use short-lived tokens (OAuth2/OIDC).
Data consistency: DMS runs CDC continuously for databases that are shared between on-prem and cloud apps. The source of truth remains on-prem until the final cutover. Application teams must agree on which system is the writer — dual-write is dangerous and should be avoided.
Monitoring: A single pane of glass. We forward on-prem metrics (Prometheus remote write or Telegraf) to the cloud-based Grafana instance. Dashboards show both environments side by side. Alert rules cover both. The goal is to make the hybrid state visible and measurable — how much traffic is still on-prem vs cloud?”
Scenario 6: “A Critical App Migration Failed and You Need to Rollback. What Is Your Plan?”
Strong Answer:
“Rollback planning must be defined BEFORE the migration, not during the incident. Here is the structured rollback:
Immediate (within minutes):
- Revert DNS — point the application’s DNS record back to the on-prem IP. If using Route 53 weighted routing, shift 100% back to on-prem. TTL should have been lowered to 60 seconds before cutover.
- Revert load balancer — if using F5 or ALB, switch the backend pool back to on-prem servers.
- The application is now serving from on-prem again. Users see a brief blip (DNS TTL propagation).
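An alternative to the manual weight flip is pre-wired failover routing with a health check, so Route 53 itself reverts traffic to on-prem if the cloud endpoint fails. Hostnames, health-check path, and zone are hypothetical:

```hcl
# Health check against the cloud deployment.
resource "aws_route53_health_check" "cloud_app" {
  fqdn              = "app-cloud.corp.example.com" # hypothetical cloud hostname
  port              = 443
  type              = "HTTPS"
  resource_path     = "/healthz"
  failure_threshold = 3
  request_interval  = 30
}

# Primary record points at cloud; Route 53 fails over when the check fails.
resource "aws_route53_record" "app_primary" {
  zone_id         = aws_route53_zone.internal.zone_id
  name            = "app.corp.example.com"
  type            = "CNAME"
  ttl             = 60
  set_identifier  = "cloud-primary"
  health_check_id = aws_route53_health_check.cloud_app.id
  records         = ["app-cloud.corp.example.com"]

  failover_routing_policy {
    type = "PRIMARY"
  }
}

resource "aws_route53_record" "app_secondary" {
  zone_id        = aws_route53_zone.internal.zone_id
  name           = "app.corp.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "onprem-secondary"
  records        = ["app-onprem.corp.example.com"]

  failover_routing_policy {
    type = "SECONDARY"
  }
}
```

Automatic failover shortens the "within minutes" window, but it only covers failures the health check can see; a data-integrity failure still requires the manual rollback steps.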
Data rollback (within hours):
4. If DMS CDC was running in both directions (forward and reverse replication), the on-prem database has been kept in sync with cloud writes. No data loss.
5. If DMS was one-directional, any data written to the cloud database during the failed migration must be replayed to on-prem. This is why we keep the cutover window short and retain transaction logs.
Post-mortem (within 24 hours):
6. Root cause analysis: was it a performance issue, network latency, a missing dependency, or an application bug in the new environment?
7. Fix the issue in a staging environment.
8. Re-test the migration end-to-end.
9. Schedule the next attempt with the fix applied.
Key lesson: Never migrate without a tested rollback plan. During the migration rehearsal (which we run 1-2 times before the real cutover), we also rehearse the rollback. If the rollback does not work in rehearsal, we do not proceed with the real migration.”
References
- AWS Cloud Adoption Framework (CAF) — six-perspective framework for planning cloud migrations
- AWS Migration Hub — centralized tracking for application migrations
- AWS Database Migration Service (DMS) — heterogeneous and homogeneous database migration
- AWS Schema Conversion Tool — convert database schemas between engines
- AWS Prescriptive Guidance: Migration Strategy — detailed migration patterns and best practices
- Google Cloud Adoption Framework — four-theme framework with maturity model
- Migration Center — discover, assess, and plan cloud migrations
- Database Migration Service — managed migration for MySQL, PostgreSQL, SQL Server, and Oracle
- Migrate to Virtual Machines — VM migration from on-prem or other clouds
- Migrate to Containers — convert VMs to containers for GKE
Tools & Frameworks
- 6Rs of Cloud Migration (AWS Blog) — Stephen Orban’s original 6Rs blog post
- Strangler Fig Pattern (Martin Fowler) — incremental migration pattern for monoliths
- Terraform AWS DMS Resources — Terraform provider docs for DMS