
Compute Services

Enterprise compute options across AWS and GCP

As the central infra team, you define which compute platforms are approved for each workload type. Most enterprise workloads run on Kubernetes (EKS/GKE), but some use cases are better served by managed containers (ECS Fargate/Cloud Run) or serverless functions (Lambda/Cloud Functions).


Figure: Compute decision framework for workload types


ECS Fargate architecture

  • Task Definition: Blueprint — container images, CPU/memory, ports, env vars, IAM task role
  • Task: Running instance of a task definition (like a pod in K8s)
  • Service: Maintains desired count, integrates with ALB, handles rolling updates
  • Fargate: Serverless compute — no EC2 instances to manage
  • Service Connect: Built-in service mesh (no Istio needed)

Figure: Cloud Run service architecture with revisions

| Feature | Cloud Run Service | Cloud Run Job |
| --- | --- | --- |
| Trigger | HTTP requests | Manual, scheduled, or event |
| Scaling | 0 to N instances | Parallel task execution |
| Duration | Request timeout (up to 60 min) | Up to 24 hours |
| Use case | APIs, web apps | Batch processing, migrations |
| Billing | Per request + CPU time | Per task execution time |
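To make the service model concrete, here is a minimal Terraform sketch of a Cloud Run service with scaling bounds and revision-based traffic; the project path, region, and image are placeholder values, not from this guide's modules.

```hcl
# Minimal Cloud Run service: scales to zero when idle, up to 100
# instances under load. Region and image path are placeholders.
resource "google_cloud_run_v2_service" "api" {
  name     = "web-api"
  location = "us-central1"

  template {
    scaling {
      min_instance_count = 0 # scale to zero when idle
      max_instance_count = 100
    }
    containers {
      image = "us-docker.pkg.dev/my-project/web/api:latest" # placeholder
      resources {
        limits = {
          cpu    = "2"
          memory = "1Gi"
        }
      }
    }
  }

  # 100% to the latest revision; splitting percent across revisions
  # gives the native canary behavior discussed later in this page.
  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}
```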
```hcl
# ECS Cluster with Fargate
resource "aws_ecs_cluster" "main" {
  name = "web-platform"

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  configuration {
    execute_command_configuration {
      logging = "OVERRIDE"
      log_configuration {
        cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs.name
      }
    }
  }
}

# Task Definition
resource "aws_ecs_task_definition" "api" {
  family                   = "web-api"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "1024" # 1 vCPU
  memory                   = "2048" # 2 GB
  execution_role_arn       = aws_iam_role.ecs_execution.arn
  task_role_arn            = aws_iam_role.ecs_task.arn

  container_definitions = jsonencode([
    {
      name      = "api"
      image     = "${aws_ecr_repository.api.repository_url}:latest"
      essential = true
      portMappings = [{
        containerPort = 8080
        protocol      = "tcp"
      }]
      environment = [
        { name = "DB_HOST", value = aws_rds_cluster.main.endpoint }
      ]
      secrets = [
        {
          name      = "DB_PASSWORD"
          valueFrom = aws_secretsmanager_secret.db_password.arn
        }
      ]
      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.api.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "api"
        }
      }
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])
}

# ECS Service
resource "aws_ecs_service" "api" {
  name            = "web-api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.private_subnet_ids
    security_groups  = [aws_security_group.ecs_tasks.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.api.arn
    container_name   = "api"
    container_port   = 8080
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  # Top-level arguments in the Terraform AWS provider (a nested
  # deployment_configuration block is CloudFormation syntax, not Terraform).
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}

# Auto Scaling
resource "aws_appautoscaling_target" "api" {
  max_capacity       = 20
  min_capacity       = 3
  resource_id        = "service/${aws_ecs_cluster.main.name}/${aws_ecs_service.api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "api_cpu" {
  name               = "api-cpu-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value       = 70.0
    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}
```

AWS: Lambda Key Characteristics

  • Runtime: Up to 15 minutes execution
  • Memory: 128 MB to 10,240 MB (CPU scales proportionally)
  • Concurrency: 1,000 default (can request increase)
  • Cold starts: 100ms-1s (provisioned concurrency eliminates this)
  • Triggers: API Gateway, S3, SQS, EventBridge, DynamoDB Streams, Kinesis

Figure: Lambda function architecture with event sources and destinations

GCP: Cloud Functions Key Characteristics (2nd Gen)

  • Runtime: Up to 9 minutes (60 min for HTTP-triggered)
  • Memory: 128 MB to 32 GB
  • Concurrency: Up to 1,000 concurrent requests per instance (unlike Lambda’s 1:1)
  • Cold starts: Similar to Lambda, mitigate with min instances
  • Triggers: HTTP, Pub/Sub, Cloud Storage, Firestore, Eventarc
  • Built on Cloud Run: 2nd gen Cloud Functions are Cloud Run services under the hood
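Since the Terraform below only covers Lambda, here is a hedged sketch of the GCP counterpart — a 2nd-gen function with per-instance concurrency and a Pub/Sub trigger. The source bucket, object, and topic names are placeholders for illustration.

```hcl
resource "google_pubsub_topic" "orders" {
  name = "orders"
}

# 2nd-gen Cloud Function triggered by Pub/Sub.
resource "google_cloudfunctions2_function" "processor" {
  name     = "order-processor"
  location = "us-central1"

  build_config {
    runtime     = "nodejs20"
    entry_point = "handler"
    source {
      storage_source {
        bucket = "my-build-artifacts"  # placeholder
        object = "order-processor.zip" # placeholder
      }
    }
  }

  service_config {
    available_memory   = "512Mi"
    available_cpu      = "1" # concurrency > 1 needs at least 1 vCPU
    timeout_seconds    = 60
    min_instance_count = 0
    max_instance_count = 100
    # Many requests per instance, unlike Lambda's 1:1 model.
    max_instance_request_concurrency = 80
  }

  event_trigger {
    event_type   = "google.cloud.pubsub.topic.v1.messagePublished"
    pubsub_topic = google_pubsub_topic.orders.id
    retry_policy = "RETRY_POLICY_RETRY"
  }
}
```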
```hcl
# Lambda function with SQS trigger
resource "aws_lambda_function" "processor" {
  function_name    = "order-processor"
  role             = aws_iam_role.lambda_exec.arn
  handler          = "index.handler"
  runtime          = "nodejs20.x"
  timeout          = 60
  memory_size      = 512
  filename         = data.archive_file.lambda.output_path
  source_code_hash = data.archive_file.lambda.output_base64sha256

  environment {
    variables = {
      TABLE_NAME = aws_dynamodb_table.orders.name
      STAGE      = var.environment
    }
  }

  vpc_config {
    subnet_ids         = var.private_subnet_ids
    security_group_ids = [aws_security_group.lambda.id]
  }

  tracing_config {
    mode = "Active" # X-Ray tracing
  }

  dead_letter_config {
    target_arn = aws_sqs_queue.dlq.arn
  }

  reserved_concurrent_executions = 100
  tags                           = local.common_tags
}

# SQS Event Source Mapping
resource "aws_lambda_event_source_mapping" "sqs" {
  event_source_arn                   = aws_sqs_queue.orders.arn
  function_name                      = aws_lambda_function.processor.arn
  batch_size                         = 10
  maximum_batching_window_in_seconds = 5
  function_response_types            = ["ReportBatchItemFailures"]

  scaling_config {
    maximum_concurrency = 50
  }
}
```

| Feature | ECS Fargate | Cloud Run |
| --- | --- | --- |
| Scale to zero | No (no request-driven scale-from-zero) | Yes |
| Max instances | Service-level limits | 100 per service by default (adjustable) |
| CPU/Memory | Up to 16 vCPU / 120 GB | Up to 8 vCPU / 32 GB |
| Sidecar containers | Yes (multi-container tasks) | Yes (2nd gen) |
| GPU | No | Yes (NVIDIA L4) |
| Service mesh | ECS Service Connect | Cloud Service Mesh integration |
| Traffic splitting | Via ALB weighted target groups | Native revision-based |
| VPC integration | Native (awsvpc mode) | VPC connector or Direct VPC egress |
| Pricing model | Per-second (vCPU + memory) | Per-request + per-second |
| Startup probe | Health check in task def | Native startup probe |
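On the Fargate side, the traffic-splitting row means a weighted forward action on the ALB listener rather than anything ECS-native. A minimal sketch, assuming blue/green target groups and an HTTPS listener already exist:

```hcl
# Weighted forwarding between two existing target groups (blue/green).
# aws_lb_target_group.blue/green and the listener are assumed to exist.
resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.https.arn
  priority     = 10

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.blue.arn
        weight = 90
      }
      target_group {
        arn    = aws_lb_target_group.green.arn
        weight = 10
      }
      stickiness {
        enabled  = true
        duration = 600 # keep a client on one version for 10 minutes
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}
```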


| Strategy | How It Works | Best For | Latency |
| --- | --- | --- | --- |
| Target Tracking | Set target metric (e.g., CPU 70%); ASG adjusts automatically | 80% of use cases, simplest | 1-3 min |
| Step Scaling | Define steps: CPU > 70% add 2, > 90% add 5 | Fine-grained control | 1-3 min |
| Predictive Scaling | ML forecasts traffic patterns, pre-scales before demand | Periodic workloads (daily/weekly cycles) | Pre-emptive |
| Scheduled Scaling | Time-based: scale up at 9 AM, down at 6 PM | Known events, business hours | Immediate |

Target Tracking is the default starting point. You set a target metric value (e.g., average CPU utilization at 70%), and the ASG continuously adjusts capacity to maintain that target. It automatically creates and manages the CloudWatch alarms for you. Scale-out happens when the metric is sustained above the target; scale-in happens when it drops below. Always configure cooldown periods (scale-out cooldown: 60s for fast response, scale-in cooldown: 300s to prevent thrashing) to avoid the ASG oscillating between adding and removing instances every minute.

Step Scaling gives you fine-grained control over scaling actions at different alarm thresholds. You create CloudWatch alarms at multiple breakpoints — for example, CPU > 70% add 2 instances, CPU > 80% add 3 instances, CPU > 90% add 5 instances. This is useful when you need aggressive scaling at high utilization but gentle scaling at moderate utilization. Each step can define a different adjustment type (exact count, percentage change, or fixed increment).
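Step scaling is the one strategy not covered by the Terraform examples further down, so here is a hedged sketch reusing the web ASG defined below; the exact thresholds and adjustments are illustrative.

```hcl
# Step scaling: aggressive at high CPU, gentle at moderate CPU.
# Bounds are offsets from the alarm threshold (70% here).
resource "aws_autoscaling_policy" "cpu_steps" {
  name                   = "cpu-step-scaling"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "StepScaling"
  adjustment_type        = "ChangeInCapacity"

  step_adjustment { # 70-80% CPU: add 2
    metric_interval_lower_bound = 0
    metric_interval_upper_bound = 10
    scaling_adjustment          = 2
  }
  step_adjustment { # 80-90% CPU: add 3
    metric_interval_lower_bound = 10
    metric_interval_upper_bound = 20
    scaling_adjustment          = 3
  }
  step_adjustment { # >90% CPU: add 5
    metric_interval_lower_bound = 20
    scaling_adjustment          = 5
  }
}

# Alarm that fires the step policy when average CPU exceeds 70%.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "web-asg-cpu-high"
  namespace           = "AWS/EC2"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  comparison_operator = "GreaterThanThreshold"
  threshold           = 70
  period              = 60
  evaluation_periods  = 3
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.web.name
  }
  alarm_actions = [aws_autoscaling_policy.cpu_steps.arn]
}
```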

Predictive Scaling uses 14 days of historical CloudWatch data to forecast future traffic patterns using machine learning. It pre-provisions capacity before demand arrives, eliminating the 1-3 minute reactive lag of target tracking. This is ideal for workloads with predictable daily or weekly patterns — for example, a retail application that sees traffic spikes every day at noon or a banking app with high volume on the first of each month. Predictive scaling works alongside target tracking — predictive handles the expected load, target tracking handles unexpected spikes.

Scheduled Scaling is for known events. You define time-based actions: scale to 50 instances at 8:55 AM before business hours, scale down to 10 at 6:05 PM. Use this for Black Friday preparation (scale up 2 hours before the event), marketing campaign launches, or weekly batch processing windows. Combine with target tracking so the ASG can scale beyond the scheduled count if demand exceeds expectations.

Mixed Instances Policy allows a single ASG to use multiple instance types and purchase options (on-demand + spot). Configure a capacity-optimized allocation strategy for spot instances to minimize interruptions. Set an on-demand base capacity (e.g., 10 instances) for guaranteed availability, then use spot for burst capacity above the base. This can reduce compute costs by 60-70% for fault-tolerant workloads.

GCP: Managed Instance Group (MIG) Autoscaler


GCP autoscaling is configured on Managed Instance Groups (MIGs). The autoscaler supports multiple signal types that can be combined (a hedged Terraform sketch follows the list):

  • CPU utilization target: Set a target (e.g., 70%) and MIG adjusts instance count to maintain it. Functionally equivalent to AWS target tracking.
  • Load balancer utilization: Scale based on backend service utilization as reported by the HTTP(S) load balancer. Useful when CPU does not accurately reflect load (e.g., I/O-bound applications).
  • Custom metrics: Use any Cloud Monitoring metric as a scaling signal — Pub/Sub subscription backlog (queue depth), custom application metrics exported via OpenTelemetry, or external metrics. This is the equivalent of custom CloudWatch metrics in AWS.
  • Predictive autoscaling (preview): Similar to AWS predictive scaling, uses historical data to forecast demand and pre-provision capacity.
  • Scale-in controls: Define a stabilization window (e.g., do not scale below the maximum of the last 60 minutes) to prevent aggressive scale-down during brief traffic dips. Also set max_scaled_in_replicas to limit how many instances can be removed in a single scale-down event.
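A minimal sketch of a regional MIG autoscaler with a CPU target and scale-in controls; the instance group manager web is assumed to be defined elsewhere.

```hcl
# Regional MIG autoscaler: CPU target plus scale-in stabilization.
# google_compute_region_instance_group_manager.web is assumed to exist.
resource "google_compute_region_autoscaler" "web" {
  name   = "web-autoscaler"
  region = "us-central1"
  target = google_compute_region_instance_group_manager.web.id

  autoscaling_policy {
    min_replicas    = 6
    max_replicas    = 100
    cooldown_period = 60 # seconds of initialization to ignore after boot

    cpu_utilization {
      target = 0.7 # equivalent of AWS target tracking at 70% CPU
    }

    # Never remove more than 10 instances within any 10-minute window,
    # preventing aggressive scale-down during brief traffic dips.
    scale_in_control {
      time_window_sec = 600
      max_scaled_in_replicas {
        fixed = 10
      }
    }
  }
}
```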
```hcl
# ASG with Target Tracking + Mixed Instances (On-Demand Base + Spot Scaling)
resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  vpc_zone_identifier = var.private_subnet_ids
  min_size            = 6
  max_size            = 100
  desired_capacity    = 10

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 6  # 6 on-demand always
      on_demand_percentage_above_base_capacity = 0  # Everything above = spot
      spot_allocation_strategy                 = "capacity-optimized"
      spot_max_price                           = "" # Use on-demand price cap
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web.id
        version            = "$Latest"
      }
      override {
        instance_type     = "m6g.xlarge" # Graviton (primary)
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m7g.xlarge" # Graviton gen 3
        weighted_capacity = "1"
      }
      override {
        instance_type     = "m6i.xlarge" # Intel fallback
        weighted_capacity = "1"
      }
      override {
        instance_type     = "c6g.xlarge" # Compute-optimized Graviton
        weighted_capacity = "1"
      }
    }
  }

  # Health check
  health_check_type         = "ELB"
  health_check_grace_period = 120

  # Instance refresh for rolling deployments
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 75
      instance_warmup        = 120
    }
  }

  tag {
    key                 = "Name"
    value               = "web-asg"
    propagate_at_launch = true
  }
}

# Target Tracking — CPU at 70%
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value     = 70.0
    disable_scale_in = false
  }
}

# Target Tracking — ALB Request Count per Target
resource "aws_autoscaling_policy" "request_count" {
  name                   = "request-count-tracking"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "${aws_lb.web.arn_suffix}/${aws_lb_target_group.web.arn_suffix}"
    }
    target_value = 1000 # 1000 requests per target per minute
  }
}

# Predictive Scaling
resource "aws_autoscaling_policy" "predictive" {
  name                   = "predictive-scaling"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "PredictiveScaling"

  predictive_scaling_configuration {
    mode                         = "ForecastAndScale"
    scheduling_buffer_time       = 300 # Pre-scale 5 min before predicted demand
    max_capacity_breach_behavior = "HonorMaxCapacity"

    metric_specification {
      target_value = 70.0
      predefined_scaling_metric_specification {
        predefined_metric_type = "ASGAverageCPUUtilization"
        resource_label         = ""
      }
      predefined_load_metric_specification {
        predefined_metric_type = "ASGTotalCPUUtilization"
        resource_label         = ""
      }
    }
  }
}

# Scheduled Scaling — Black Friday prep
resource "aws_autoscaling_schedule" "black_friday_scale_up" {
  scheduled_action_name  = "black-friday-scale-up"
  autoscaling_group_name = aws_autoscaling_group.web.name
  min_size               = 50
  max_size               = 200
  desired_capacity       = 80
  recurrence             = "0 7 25 11 *" # Nov 25 at 7 AM UTC
}

resource "aws_autoscaling_schedule" "black_friday_scale_down" {
  scheduled_action_name  = "black-friday-scale-down"
  autoscaling_group_name = aws_autoscaling_group.web.name
  min_size               = 6
  max_size               = 100
  desired_capacity       = 10
  recurrence             = "0 6 27 11 *" # Nov 27 at 6 AM UTC
}
```

Understanding instance families is critical for right-sizing recommendations and cost optimization conversations in interviews.

| Family | AWS | GCP | Use Case |
| --- | --- | --- | --- |
| General purpose | m6g, m7g (Graviton) | e2, n2 | Web servers, app servers, small databases |
| Compute optimized | c6g, c7g (Graviton) | c2, c2d | Batch processing, ML inference, gaming servers |
| Memory optimized | r6g, r7g, x2gd | m2, m3 | In-memory databases (Redis, SAP HANA), caching, analytics |
| GPU / Accelerator | p4d, p5 (A100/H100) | a2, a3 (A100/H100) | ML training, HPC, video processing, rendering |
| Storage optimized | i3, d3, i4i | — (use local SSD on n2/c3) | HDFS, Cassandra, Elasticsearch, data-intensive workloads |

“How do you handle Black Friday traffic? Your app normally handles 1K RPS but expects 50K RPS for 4 hours.”

Strong Answer:

“I would use a layered scaling strategy that combines proactive and reactive mechanisms:

1. Predictive Scaling (weeks before): Enable AWS predictive scaling so the baseline daily/weekly pattern is already handled. Note that it forecasts from roughly the previous two weeks of CloudWatch history, so it cannot anticipate a one-off annual spike — that is what the scheduled action below is for.

2. Scheduled Scaling (2 hours before): Create a scheduled action to scale the ASG to 80% of expected peak capacity 2 hours before the sale starts. This gives instances time to warm up, register with the ALB health check, and populate caches. Set min_size = 40 (assuming each instance handles ~1,250 RPS at 70% CPU).

3. Target Tracking (during event): Keep target tracking active to handle unexpected spikes beyond the scheduled capacity. If actual traffic exceeds 50K RPS, the ASG automatically adds more instances within minutes.

4. Mixed Instances Policy: Use on-demand instances for the base capacity (guarantees availability) and spot instances for burst capacity above the base. With a capacity-optimized spot allocation strategy and 4+ instance type overrides, spot interruption risk is minimal for a 4-hour window.

5. Pre-warm the ALB: For massive scale-up, contact AWS support to pre-warm the Application Load Balancer. ALBs scale gradually and might not handle a sudden jump from 1K to 50K RPS without pre-warming. On GCP, Cloud Load Balancing pre-warms automatically.

6. Cooldown configuration: Set scale-in cooldown to 600 seconds (10 minutes) to prevent premature scale-down during brief traffic dips mid-event. Set scale-out cooldown to 30 seconds for fast response to spikes.

7. Downstream protection: Auto-scaling the web tier is not enough. Ensure the database (Aurora Serverless v2 with sufficient ACU headroom), caching layer (ElastiCache with cluster mode), and any downstream services can also handle 50x load. Rate limit APIs at the API Gateway level as a safety valve.”
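For point 7, the API Gateway safety valve can be as simple as stage-level throttling. A minimal sketch, assuming an existing REST API with a prod stage; the limits are illustrative.

```hcl
# Stage-wide throttle as a safety valve for downstream services.
# aws_api_gateway_rest_api.main and its "prod" stage are assumed to exist.
resource "aws_api_gateway_method_settings" "throttle" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = "prod"
  method_path = "*/*" # apply to every method on the stage

  settings {
    throttling_rate_limit  = 55000 # steady-state RPS ceiling, just above the 50K peak
    throttling_burst_limit = 10000 # short-burst allowance
  }
}
```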


Scenario 1: Choosing Between EKS, ECS Fargate, and Lambda


“A team wants to deploy a new REST API. They currently have no Kubernetes experience. What compute platform do you recommend?”

Strong Answer:

“It depends on the workload characteristics:

If the API has consistent traffic (>100 RPS sustained): I’d recommend ECS Fargate. It gives them containers without Kubernetes complexity. They define a task definition (container image, CPU/memory, IAM role), create a service with an ALB, and they’re done. Auto-scaling is straightforward with target tracking on CPU/request count.

If the API has bursty traffic or is internal/low-traffic: Cloud Run (GCP) or Lambda + API Gateway (AWS). Cloud Run scales to zero, so they pay nothing during idle periods. Cloud Run also handles containers, so they can migrate to GKE later if needed.

If they’re joining our platform (most likely in an enterprise): They should deploy to our existing EKS/GKE clusters. The platform team provides namespace provisioning, CI/CD templates, observability, and service mesh. The team just writes a Dockerfile and a Kubernetes manifest — we provide the golden path. This is the most cost-effective at scale because we share cluster overhead across all tenants.”


Scenario 2: Fixing Lambda Cold Starts

“Your Lambda-based API has P99 latency spikes of 3 seconds due to cold starts. How do you fix this?”

Strong Answer:

“Three approaches, in order of impact:

  1. Provisioned Concurrency: Keep N Lambda instances warm at all times. Set to your baseline traffic level (e.g., 50 concurrent). Eliminates cold starts for those instances. Cost: you pay for provisioned capacity even when idle. Use Application Auto Scaling to adjust provisioned concurrency by schedule (high during business hours, low at night) — see the sketch after this list.

  2. Optimize the function: Reduce package size (tree-shake dependencies, use layers for shared code). Use a compiled language (Go, Rust) instead of Java/Python — Go cold starts are 50-100ms vs Java’s 1-3s. Move initialization outside the handler (database connections, SDK clients).

  3. Architecture change: If cold starts are unacceptable (financial APIs), move to ECS Fargate or Cloud Run with min_instance_count = 1. You lose scale-to-zero but guarantee consistent latency. For GCP, Cloud Run with min instances is the easiest migration — same container, zero cold starts.”
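A hedged sketch of option 1, scheduling provisioned concurrency via Application Auto Scaling against the order-processor function from earlier; the alias name and cron windows are illustrative.

```hcl
# Provisioned concurrency targets a published version or alias.
resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.processor.function_name
  function_version = aws_lambda_function.processor.version
}

resource "aws_appautoscaling_target" "pc" {
  service_namespace  = "lambda"
  resource_id        = "function:${aws_lambda_function.processor.function_name}:${aws_lambda_alias.live.name}"
  scalable_dimension = "lambda:function:ProvisionedConcurrency"
  min_capacity       = 5
  max_capacity       = 100
}

# Warm 50 instances at the start of business hours...
resource "aws_appautoscaling_scheduled_action" "business_hours" {
  name               = "pc-business-hours"
  service_namespace  = aws_appautoscaling_target.pc.service_namespace
  resource_id        = aws_appautoscaling_target.pc.resource_id
  scalable_dimension = aws_appautoscaling_target.pc.scalable_dimension
  schedule           = "cron(0 8 ? * MON-FRI *)"
  scalable_target_action {
    min_capacity = 50
    max_capacity = 100
  }
}

# ...and drop back to 5 overnight to cut idle cost.
resource "aws_appautoscaling_scheduled_action" "overnight" {
  name               = "pc-overnight"
  service_namespace  = aws_appautoscaling_target.pc.service_namespace
  resource_id        = aws_appautoscaling_target.pc.resource_id
  scalable_dimension = aws_appautoscaling_target.pc.scalable_dimension
  schedule           = "cron(0 19 ? * MON-FRI *)"
  scalable_target_action {
    min_capacity = 5
    max_capacity = 100
  }
}
```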


Scenario 3: Nightly Batch Image Processing

“Design a system to process 1 million images nightly — resize, apply watermark, and store results.”

Strong Answer:

“I’d use a fan-out pattern:

On AWS:

  1. S3 event notification → SQS queue (batch source images daily)
  2. Lambda reads from SQS (batch size 10, concurrency 500)
  3. Lambda resizes/watermarks using Sharp library
  4. Results stored to output S3 bucket
  5. DLQ captures failures for retry
  6. CloudWatch dashboard tracks: processed count, error rate, queue depth

On GCP:

  1. Cloud Storage notification → Pub/Sub topic
  2. Cloud Run Job with parallelism: 100, task_count: 10000
  3. Each task processes a batch of 100 images
  4. Results stored to output GCS bucket
  5. Dead-letter topic for failures
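A hedged sketch of the Cloud Run Job from step 2. Image and bucket names are placeholders; Cloud Run injects CLOUD_RUN_TASK_INDEX, which each task would use to select its batch of 100 images.

```hcl
# Cloud Run Job fan-out: 10,000 tasks total, 100 running at a time.
resource "google_cloud_run_v2_job" "resize" {
  name     = "image-resize-nightly"
  location = "us-central1"

  template {
    task_count  = 10000
    parallelism = 100

    template {
      max_retries = 3
      timeout     = "600s"
      containers {
        image = "us-docker.pkg.dev/my-project/batch/resize:latest" # placeholder
        env {
          name  = "OUTPUT_BUCKET"
          value = "my-processed-images" # placeholder
        }
      }
    }
  }
}
```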

Why not Step Functions / Workflows? For simple image processing, the SQS/Lambda or Pub/Sub/Cloud Run pattern is simpler and cheaper. Step Functions add value when you need complex orchestration (branching, human approval, error handling per step).

Cost estimate (AWS): 1M Lambda invocations at 512 MB, ~0.5 s each ≈ 250,000 GB-seconds ≈ $4.20 per nightly run (~$125/month of compute, plus ~$0.20 per million requests). S3 storage for 1M images (~1 TB at $0.023/GB-month) ≈ $23/month. Total: on the order of $150/month.”


Scenario 4: Migration from ECS to Kubernetes


“We have 15 services on ECS Fargate. Leadership wants to move to Kubernetes for standardization. How do you approach this?”

Strong Answer:

“This is a common migration. Here’s my phased approach:

Phase 1: Platform readiness (2-4 weeks)

  • Stand up EKS/GKE cluster in the workload account using our platform Terraform modules
  • Deploy shared infrastructure: ingress controller, cert-manager, external-dns, external-secrets-operator
  • Set up namespaces, RBAC, and resource quotas for each team
  • Configure CI/CD pipeline templates (GitHub Actions → ArgoCD)

Phase 2: Translate ECS constructs to K8s (1 week per service)

| ECS Concept | Kubernetes Equivalent |
| --- | --- |
| Task Definition | Pod spec in Deployment |
| ECS Service | Deployment + Service + HPA |
| ALB target group | Ingress / Gateway API |
| Task IAM role | IRSA / Workload Identity |
| Service Connect | Istio / Linkerd |
| CloudWatch Logs | Loki + Grafana Alloy |
| Parameter Store secrets | External Secrets Operator |
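To make the first two rows concrete, here is a hedged sketch of the Deployment + Service pair expressed with the Terraform kubernetes provider (kept in HCL for consistency with the rest of this page; in practice this would be the golden-path Helm chart). Names and the image path are illustrative.

```hcl
# The ECS "task definition + service" pair as a Deployment + Service.
resource "kubernetes_deployment_v1" "api" {
  metadata {
    name      = "web-api"
    namespace = "web-platform"
  }
  spec {
    replicas = 3 # desired_count equivalent
    selector {
      match_labels = { app = "web-api" }
    }
    template {
      metadata {
        labels = { app = "web-api" }
      }
      spec {
        container {
          name  = "api"
          image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-api:latest" # placeholder
          port {
            container_port = 8080
          }
          resources {
            requests = { cpu = "1", memory = "2Gi" } # task-size equivalent
          }
        }
      }
    }
  }
}

resource "kubernetes_service_v1" "api" {
  metadata {
    name      = "web-api"
    namespace = "web-platform"
  }
  spec {
    selector = { app = "web-api" }
    port {
      port        = 80
      target_port = 8080
    }
  }
}
```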

Phase 3: Migrate service by service

  • Start with lowest-risk internal service
  • Run in parallel (ECS + K8s) with weighted DNS routing (sketched after this list)
  • Validate metrics match (latency, error rate, throughput)
  • Cut over DNS, decommission ECS service
  • Repeat for all 15 services
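The parallel-run cutover is typically weighted DNS. A minimal Route 53 sketch, assuming the hosted zone and both load balancer DNS names exist; the record name and weights are illustrative.

```hcl
# Weighted DNS for parallel running: 90% of traffic to the existing
# ECS ALB, 10% to the new Kubernetes ingress. Shift weights gradually.
resource "aws_route53_record" "api_ecs" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "ecs"
  records        = [aws_lb.ecs.dns_name]

  weighted_routing_policy {
    weight = 90
  }
}

resource "aws_route53_record" "api_k8s" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "k8s"
  records        = [aws_lb.k8s_ingress.dns_name]

  weighted_routing_policy {
    weight = 10
  }
}
```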

Key risk: Teams lose ECS simplicity. Mitigate with golden path templates — a single Helm chart that takes image, port, replicas, and env vars, abstracting K8s complexity.”