API Gateway & Management

API Gateway sits between the Network Hub (WAF / CloudFront / Cloud Armor) and Workload Accounts (EKS / GKE / Cloud Run). The central infra team manages the gateway infrastructure, rate-limiting policies, authentication, and security. Tenant teams register their APIs and receive per-tenant throttling, keys, and usage plans.

[Figure: API gateway architecture overview]


An API gateway is the single entry point for all client requests. It decouples clients from backend microservices, handling cross-cutting concerns centrally rather than in each service.

| Responsibility | What It Does | Why It Matters |
| --- | --- | --- |
| Routing | Maps URL paths to backend services | Clients use one domain, services evolve independently |
| Authentication | Validates JWT / OAuth2 tokens, API keys | Centralized auth instead of per-service implementation |
| Rate Limiting | Throttles requests per client, per API, per plan | Protects backends from abuse and noisy neighbors |
| Request Transform | Modifies headers, body, query params | Adapts client format to backend format |
| Response Caching | Caches responses at the gateway | Reduces backend load for read-heavy APIs |
| Observability | Access logs, latency metrics, error rates | Single point of visibility for all API traffic |
| Versioning | Routes to different backend versions | Allows API evolution without breaking clients |

For an enterprise bank, APIs are products. The central infra team enforces:

  1. Design-first: OpenAPI spec before any code
  2. Consistent naming: /v1/accounts/{id}/transactions, not /getTransactionsByAccountId
  3. Standard error format: RFC 7807 Problem Details (superseded by RFC 9457) across all APIs
  4. Pagination: cursor-based for large datasets, not offset-based
  5. Idempotency: Idempotency-Key header for payment APIs (critical for banking)
  6. Correlation IDs: X-Request-ID propagated through the entire chain
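
The idempotency rule is the one most often misunderstood, so a minimal sketch may help. It assumes an in-memory dictionary standing in for a shared store; `IdempotencyStore`, `execute`, and the handler signature are hypothetical names for illustration, not part of any gateway product.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in for a shared store (hypothetical).

    A real service would use a database or cache with a TTL instead.
    """

    def __init__(self):
        self._responses = {}

    def execute(self, idempotency_key, payload, handler):
        # Fingerprint the payload so a reused key with a different body is rejected.
        fingerprint = hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode("utf-8")
        ).hexdigest()

        cached = self._responses.get(idempotency_key)
        if cached is not None:
            if cached["fingerprint"] != fingerprint:
                raise ValueError("Idempotency-Key reused with a different payload")
            # Replay the stored response; the handler is not called again.
            return cached["response"]

        response = handler(payload)
        self._responses[idempotency_key] = {
            "fingerprint": fingerprint,
            "response": response,
        }
        return response
```

A client that retries a payment with the same Idempotency-Key gets the original response back, and the payment handler runs only once.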

[Figure: OAuth2 JWT authentication flow through API gateway]

| Consumer Type | Auth Method | Implementation |
| --- | --- | --- |
| Mobile / SPA | OAuth2 + PKCE | Authorization code flow, short-lived tokens, refresh tokens |
| Third-party partners | OAuth2 Client Credentials + API Key | Usage plan for rate limiting, API key for identification |
| Internal services | Mutual TLS or IAM Auth | Service mesh (mTLS) or cloud-native IAM |
| B2B integrations | OAuth2 Client Credentials | Scoped to specific API operations |

[Figure: API gateway rate limiting strategy layers]

Most API gateways use token bucket for rate limiting:

  • Bucket capacity = burst limit (e.g., 500 requests)
  • Refill rate = steady-state limit (e.g., 100 requests/second)
  • Each request consumes one token
  • When bucket is empty, requests get HTTP 429 Too Many Requests
  • Bucket refills at the steady rate
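
The bullets above can be sketched as a small class. This is an illustrative model only, not how any particular gateway implements throttling; the `TokenBucket` name and the choice to pass the clock in as `now` (so the behavior is deterministic) are assumptions.

```python
class TokenBucket:
    """Token bucket: capacity is the burst limit, refill_rate the steady-state limit."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity          # burst limit, e.g. 500 requests
        self.refill_rate = refill_rate    # tokens added per second, e.g. 100
        self.tokens = float(capacity)     # the bucket starts full
        self.last_refill = now

    def allow(self, now):
        """Return the HTTP status a request arriving at time `now` (seconds) would get."""
        # Refill for the time elapsed since the last request, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)

        if self.tokens >= 1.0:
            self.tokens -= 1.0            # each request consumes one token
            return 200
        return 429                        # bucket empty: Too Many Requests
```

With `TokenBucket(capacity=500, refill_rate=100)`, a burst of 500 requests at the same instant succeeds, the next is rejected, and one second later 100 more tokens are available.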

AWS API Gateway throttling is three-tiered:

| Level | Default | Configurable |
| --- | --- | --- |
| Account level | 10,000 req/sec across all APIs | Yes, via support ticket |
| Stage level | Inherits account | Yes, per stage |
| Method level | Inherits stage | Yes, per method + resource |

Usage Plans + API Keys for per-tenant throttling:

[Figure: AWS API Gateway usage plans for partner throttling]

Apigee rate limiting uses policies:

| Policy | Purpose | Scope |
| --- | --- | --- |
| SpikeArrest | Smooths traffic bursts | Prevents sudden spikes (e.g., 10ps = ~1 every 100ms) |
| Quota | Enforces usage limits over time | Per-developer app, per API product (e.g., 10K/day) |
| ConcurrentRateLimit | Limits concurrent connections | Protects slow backends |
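
SpikeArrest rates are written with a `ps` (per second) or `pm` (per minute) suffix, and the spacing they imply is easy to get wrong. The sketch below shows the arithmetic under the simplified assumption that the interval is divided evenly; the function name is hypothetical, and the real policy's smoothing behavior may differ in detail.

```python
def spike_arrest_interval_ms(rate):
    """Convert a rate such as '10ps' or '30pm' into the implied spacing in milliseconds."""
    unit_ms = {"ps": 1_000, "pm": 60_000}   # per second, per minute
    count, unit = int(rate[:-2]), rate[-2:]
    if unit not in unit_ms or count <= 0:
        raise ValueError(f"unsupported rate: {rate!r}")
    # Evenly divide the interval: 10 per second means one request every 100 ms.
    return unit_ms[unit] / count
```

Under this model `10ps` works out to one request every 100 ms, while `10pm` works out to one request every 6 seconds.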

Apigee API Products for per-tenant management:

[Figure: Apigee API products for partner management]

Developers register apps, subscribe to API Products, and receive client credentials. Apigee tracks usage per app against the product quotas.


| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| URI path | /v1/accounts, /v2/accounts | Simple, explicit, cacheable | URL changes, pollutes resource path |
| Query param | /accounts?version=2 | Easy to add | Easy to forget, not RESTful |
| Header | Accept: application/vnd.bank.v2+json | Clean URLs | Hidden, harder to test in browser |
| Content negotiation | Accept: application/json; version=2 | Standard HTTP | Complex to implement |

[Figure: API version lifecycle management]


Pattern 1: External API Gateway (North-South Traffic)

For traffic entering from the internet to backend services.

[Figure: External API gateway pattern for north-south traffic]

Pattern 2: Internal API Gateway (East-West Traffic)

For service-to-service communication within the enterprise.

[Figure: Internal gateway patterns: service mesh vs API gateway]

| Criterion | Service Mesh | Internal API Gateway |
| --- | --- | --- |
| Environment | Kubernetes (EKS/GKE) | Mixed (ECS, Lambda, VMs, cross-account) |
| Latency | Sub-millisecond (in-pod sidecar) | 5-15ms (extra network hop) |
| Auth | mTLS automatic | IAM auth or JWT |
| Observability | Built-in (Envoy metrics) | CloudWatch / Cloud Logging |
| Rate limiting | Per-service policies | Usage plans |
| Complexity | High (Istio control plane) | Low (managed service) |

| Feature | REST API | HTTP API | WebSocket API |
| --- | --- | --- | --- |
| Protocol | REST | HTTP | WebSocket |
| Auth | IAM, Lambda, Cognito | JWT, IAM, Lambda | IAM, Lambda |
| Usage Plans | Yes | No | No |
| API Keys | Yes | No | No |
| Request Validation | Yes | No | No |
| Request Transform | VTL templates | Parameter mapping | Route selection |
| Caching | Yes (per stage) | No | No |
| WAF | Yes | No | No |
| Price | $3.50/million | $1.00/million | $1.00/million + connection |
| Latency | ~29ms overhead | ~10ms overhead | Persistent connection |

Connect API Gateway to resources in private subnets:

[Figure: Apigee API proxy architecture]

Apigee is Google’s enterprise API management platform with analytics, monetization, and developer portal.

| Feature | Apigee X (Standard) | Apigee X (Enterprise) | Apigee Hybrid |
| --- | --- | --- | --- |
| Hosting | Google-managed | Google-managed | Runtime in your GKE |
| Environments | 2 | 4+ | Unlimited |
| SLA | 99.9% | 99.99% | Your infra SLA |
| Use case | Mid-size | Enterprise | Data residency / hybrid |
| Networking | Peering to your VPC | Peering + PSC | Runs in your VPC |

The central infra team operates a shared API gateway platform that multiple tenant teams consume.

[Figure: External partner API architecture]

Key design decisions:

  1. Single API Gateway with path-based routing to backend ALBs — not one gateway per service. Simplifies partner integration (one base URL, one API key).

  2. REST API (not HTTP API) because we need usage plans, API keys, and WAF integration for partner-facing APIs.

  3. Usage Plans by partner tier — Gold (1000 req/s), Silver (100 req/s), Bronze (10 req/s). Each partner gets an API key mapped to their usage plan.

  4. Lambda Authorizer that validates the partner’s OAuth2 token AND checks the API key, returning an IAM policy that scopes access to only their permitted API paths.

  5. Custom domain (api.bank.com) with ACM certificate, Route 53 alias record.

  6. Versioning via URI path (/v1/, /v2/). When v2 launches, v1 continues working with a Sunset header. Partners get 6 months to migrate.
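
To make design decision 4 concrete, here is a rough sketch of the response a Lambda authorizer could return once the token and API key have been validated. The function name, partner IDs, and ARNs are hypothetical, and the use of `usageIdentifierKey` assumes the API's key source is set to AUTHORIZER; token validation itself is out of scope here.

```python
def build_authorizer_response(partner_id, api_key, allowed_arns, method_arn):
    """Build a Lambda authorizer response scoped to a partner's permitted paths.

    allowed_arns: execute-api ARNs the partner may invoke (hypothetical values).
    method_arn:   the ARN of the method being called, used for an explicit Deny.
    """
    if allowed_arns:
        effect, resources = "Allow", list(allowed_arns)
    else:
        # Nothing permitted: deny the method that was actually requested.
        effect, resources = "Deny", [method_arn]

    return {
        "principalId": partner_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": resources,
                }
            ],
        },
        # Lets API Gateway map the request to the partner's usage plan
        # (assumes the API key source is AUTHORIZER).
        "usageIdentifierKey": api_key,
    }
```

Returning the allowed paths as ARN patterns keeps the per-partner scoping in one place instead of spreading it across backend services.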

On GCP, I would use Apigee X instead: it provides the equivalent of usage plans (API Products with quotas), a developer portal, and analytics out of the box. Partners self-register through the developer portal.


Q: “How do you implement per-tenant rate limiting so one partner can’t impact another?”

A: This is the noisy neighbor problem applied to APIs.

AWS approach — Usage Plans:

[Figure: Per-tenant rate limiting with usage plans]

Each usage plan is an independent throttle bucket. Partner A hitting their 1,000 req/s limit does NOT affect Partner B’s 1,000 req/s allocation.

Beyond API Gateway throttling, I would add:

  • WAF rate rules as an outer layer (per-IP, catches distributed attacks)
  • Backend circuit breaker (Istio or app-level) to protect databases
  • DynamoDB/Redis counter for custom business logic limits (e.g., max 100 payment initiations per partner per hour)
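
The custom business-logic limit in the last bullet could look roughly like the fixed-window counter below. It is a sketch only: a dictionary stands in for DynamoDB or Redis, and `FixedWindowLimiter` and its method names are hypothetical.

```python
class FixedWindowLimiter:
    """Per-partner fixed-window counter, e.g. 100 payment initiations per hour."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # (partner_id, window index) -> count. A real store would expire old
        # windows with a TTL; this sketch never evicts them.
        self._counts = {}

    def allow(self, partner_id, now):
        """Return True if the partner may perform one more operation at time `now` (seconds)."""
        window = int(now // self.window_seconds)
        key = (partner_id, window)
        count = self._counts.get(key, 0)
        if count >= self.limit:
            return False
        self._counts[key] = count + 1
        return True
```

Because the counter is keyed by partner, one partner exhausting its hourly allowance leaves every other partner's count untouched.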

GCP approach — Apigee:

Use Quota policies per API Product. Each developer app subscribes to a product and gets its own quota counter. Apigee tracks this automatically and returns 429 with a Retry-After header when exceeded.


Q: “Internal APIs: should you use an API Gateway or service mesh for service-to-service communication?”

A: It depends on the environment.

Use service mesh (Istio) when:

  • All services run in Kubernetes (EKS/GKE)
  • You need mutual TLS without application changes
  • You want circuit breaking, retries, and timeouts as infrastructure
  • Latency sensitivity is high (sidecar adds less than 1ms, gateway adds 5-15ms)

Use internal API Gateway when:

  • Services span multiple compute platforms (EKS + ECS + Lambda)
  • You need cross-account API access via VPC Link
  • You want usage tracking and throttling between internal teams
  • You need request/response transformation (different internal formats)

Our enterprise recommendation:

  • Within a K8s cluster: service mesh (Istio). No API Gateway needed.
  • Between K8s clusters: service mesh with multi-cluster Istio or internal API Gateway.
  • K8s to non-K8s (Lambda, ECS): internal API Gateway with IAM auth via VPC Link.
  • Cross-account: internal API Gateway is the cleanest option — no VPC peering needed for the API path, just VPC Link.

Q: “Design API authentication for a mobile banking app, third-party partners, and internal services.”

A: Three distinct auth flows for three consumer types:

[Figure: API authentication design for different consumer types]

Lambda Authorizer handles the complexity — it inspects the request and determines which auth flow to apply:

  1. Authorization: Bearer <JWT> — validate JWT signature, claims, scopes
  2. x-api-key header present — look up usage plan, validate client credentials
  3. AWS SigV4 signed request — IAM auth for internal services
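
The dispatch step can be sketched as a header inspection. The flow labels and function name are hypothetical, and the check order is an assumption: the API key is tested before the bearer token because partners send both, whereas the mobile app sends only a token.

```python
def classify_auth_flow(headers):
    """Decide which auth flow applies, based only on the request headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    authorization = normalized.get("authorization", "")

    # SigV4-signed requests carry this algorithm prefix in Authorization.
    if authorization.startswith("AWS4-HMAC-SHA256"):
        return "internal_iam"
    # Partners present an API key (alongside their OAuth2 token).
    if "x-api-key" in normalized:
        return "partner"
    # A bare bearer token is treated as an end-user JWT.
    if authorization.lower().startswith("bearer "):
        return "user_jwt"
    return "deny"
```

Each label would then route to its own validation logic (JWT signature and scopes, usage plan lookup, or IAM), which is deliberately left out of this sketch.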

Q: “How do you handle API versioning without breaking existing clients? You have 50 partner integrations.”

A: With 50 partners, breaking changes are expensive — each partner has their own development cycle.

Strategy: URI path versioning with parallel deployment

[Figure: API versioning timeline for partner migration]

Implementation:

  1. Parallel deployment: v1 and v2 run simultaneously as separate target groups behind the same API Gateway
  2. Sunset header: send Sunset: Mon, 15 Mar 2027 00:00:00 GMT on all v1 responses starting 6 months before removal
  3. Deprecation notice: send a Deprecation: true header plus a Link header pointing to the migration guide
  4. Usage tracking: monitor which partners still use v1 via API key analytics
  5. Partner communication: automated email when a partner’s v1 usage exceeds a threshold after deprecation
  6. Graceful sunset: after the sunset date, v1 returns 410 Gone with a response body containing the v2 equivalent endpoint
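
The Sunset value must be a valid HTTP-date, including the correct weekday, so it is safer to generate it than to type it by hand. The helper below is a sketch; the function name and the migration guide URL are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset_at, migration_guide_url):
    """Build the headers added to every v1 response during the deprecation window.

    sunset_at must be a timezone-aware datetime.
    """
    return {
        # format_datetime with usegmt=True emits an HTTP-date ending in "GMT".
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
        "Deprecation": "true",
        "Link": f'<{migration_guide_url}>; rel="deprecation"',
    }


# Hypothetical sunset date and documentation URL.
headers = deprecation_headers(
    datetime(2027, 3, 15, tzinfo=timezone.utc),
    "https://developer.bank.example/migrate-to-v2",
)
```

These headers would typically be attached by a gateway response mapping or by the v1 backend itself, whichever is easier to roll out consistently.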

API Gateway routing:

# v1 routes to the old backend.
# Note: aws_apigatewayv2_* resources manage HTTP APIs; a REST API would use
# aws_api_gateway_resource and aws_api_gateway_method instead.
resource "aws_apigatewayv2_route" "accounts_v1" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /v1/accounts/{id}"
  target    = "integrations/${aws_apigatewayv2_integration.accounts_v1.id}"
}

# v2 routes to the new backend.
resource "aws_apigatewayv2_route" "accounts_v2" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /v2/accounts/{id}"
  target    = "integrations/${aws_apigatewayv2_integration.accounts_v2.id}"
}

Q: “Your API is being hammered by a single client — 10x their normal traffic. Backend latency is spiking for all clients. How do you respond?”

A: This is an incident with clear containment, mitigation, and prevention phases.

Immediate (0-5 minutes):

  1. Identify the client — check API Gateway access logs for the API key or source IP generating the spike
  2. Throttle the specific client — if they are on a usage plan, reduce their rate limit immediately. If not, add a WAF rate-based rule for their IP
  3. Verify it is not an attack — is the traffic from a known partner (misconfigured retry loop) or an unknown source (DDoS/scraping)?

Containment (5-30 minutes):

If known partner (API key identified):
→ Reduce usage plan rate limit temporarily
→ Contact partner to fix their client
→ Check for retry storms (are they retrying 5xx responses in a tight loop?)
If unknown source:
→ WAF rate-based rule: block IP if > 2000 req/5min
→ If distributed IPs: enable AWS Shield Advanced / Cloud Armor adaptive protection
→ Geo-blocking if traffic from unexpected region

Backend protection (parallel):

  1. Enable API Gateway response caching for read endpoints — serve cached responses instead of hitting the backend
  2. Circuit breaker — if using Istio, configure outlier detection to eject unhealthy backends
  3. Scale backends — trigger HPA if pods are at capacity, but this treats the symptom not the cause

Prevention (post-incident):

  1. Mandatory usage plans for all API consumers — no unthrottled access
  2. WAF rate-based rules as a secondary throttle layer (catches traffic before it hits API Gateway)
  3. Alerting on per-client request rate anomalies (CloudWatch alarm on usage plan throttle count)
  4. Retry guidance in API docs — exponential backoff with jitter, not immediate retry
  5. Circuit breaker at client side — require partners to implement client-side circuit breakers (include in API contract)
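
The retry guidance in item 4 can be illustrated with the "full jitter" variant of exponential backoff. This is one common variant, offered as a sketch; the function name and the default base and cap values are assumptions, not values from any API contract.

```python
import random


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based), with full jitter.

    The ceiling doubles on each attempt up to `cap`; the actual delay is drawn
    uniformly from [0, ceiling] so that clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A partner client would sleep for `backoff_delay(attempt)` after each 429 or 5xx response and give up after a fixed number of attempts, rather than retrying immediately.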

The key insight is defense in depth: WAF rate rules (L7 edge) + API Gateway usage plan throttling (per-client) + backend circuit breaker (per-service). No single layer handles everything.


Quick Reference: API Gateway Decision Matrix

[Figure: API gateway decision matrix]