API Gateway & Management

Where This Fits

API Gateway sits between the Network Hub (WAF / CloudFront / Cloud Armor) and Workload Accounts (EKS / GKE / Cloud Run). The central infra team manages the gateway infrastructure, rate-limiting policies, authentication, and security. Tenant teams register their APIs and receive per-tenant throttling, keys, and usage plans.

API gateway architecture overview

API Gateway Fundamentals

An API gateway is the single entry point for all client requests. It decouples clients from backend microservices, handling cross-cutting concerns centrally rather than in each service.

Core Responsibilities

Responsibility	What It Does	Why It Matters
Routing	Maps URL paths to backend services	Clients use one domain, services evolve independently
Authentication	Validates JWT / OAuth2 tokens, API keys	Centralized auth instead of per-service implementation
Rate Limiting	Throttles requests per client, per API, per plan	Protects backends from abuse and noisy neighbors
Request Transform	Modifies headers, body, query params	Adapts client format to backend format
Response Caching	Caches responses at the gateway	Reduces backend load for read-heavy APIs
Observability	Access logs, latency metrics, error rates	Single point of visibility for all API traffic
Versioning	Routes to different backend versions	Allows API evolution without breaking clients

API-First Design Principles

For an enterprise bank, APIs are products. The central infra team enforces:

Design-first: write the OpenAPI spec before any code
Consistent naming: /v1/accounts/{id}/transactions, not /getTransactionsByAccountId
Standard error format: RFC 9457 Problem Details (the successor to RFC 7807) across all APIs
Pagination: cursor-based for large datasets, not offset-based
Idempotency: Idempotency-Key header for payment APIs (critical for banking, where a retried request must not move money twice)
Correlation IDs: X-Request-ID propagated through the entire chain
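To make the idempotency principle concrete, here is a minimal sketch of how a payment endpoint might honor an Idempotency-Key header. The IdempotencyStore class, the in-memory dictionary (standing in for Redis or DynamoDB), the 422 response for a reused key, and the create_payment handler are all assumptions for illustration, not part of any gateway product.

```python
import hashlib
import json


class IdempotencyStore:
    """In-memory stand-in (assumption) for a shared store such as Redis or DynamoDB."""

    def __init__(self):
        self._records = {}  # key -> (request fingerprint, stored response)

    @staticmethod
    def _fingerprint(payload):
        # Hash the canonical JSON body so a reused key with a different body is detectable.
        canonical = json.dumps(payload, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def execute(self, key, payload, handler):
        """Run handler(payload) once per key and replay the stored response on retries."""
        fingerprint = self._fingerprint(payload)
        if key in self._records:
            stored_fingerprint, response = self._records[key]
            if stored_fingerprint != fingerprint:
                # Same key, different body: reject rather than guess the client's intent.
                return {"status": 422, "title": "Idempotency-Key reused with a different payload"}
            return response
        response = handler(payload)
        self._records[key] = (fingerprint, response)
        return response


calls = []


def create_payment(payload):
    # Hypothetical backend call; records each real execution.
    calls.append(payload)
    return {"status": 201, "payment_id": f"pay-{len(calls)}"}


store = IdempotencyStore()
first = store.execute("key-123", {"amount": 100, "currency": "EUR"}, create_payment)
retry = store.execute("key-123", {"amount": 100, "currency": "EUR"}, create_payment)
conflict = store.execute("key-123", {"amount": 999, "currency": "EUR"}, create_payment)
```

The retry returns the stored response without calling the backend a second time. A production version would also need key expiry and atomic writes, which this sketch omits.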

Authentication Patterns

OAuth2 / JWT Flow (External Clients)

OAuth2 JWT authentication flow through API gateway

Authentication by Consumer Type

Consumer Type	Auth Method	Implementation
Mobile / SPA	OAuth2 + PKCE	Authorization code flow, short-lived tokens, refresh tokens
Third-party partners	OAuth2 Client Credentials + API Key	Usage plan for rate limiting, API key for identification
Internal services	Mutual TLS or IAM Auth	Service mesh (mTLS) or cloud-native IAM
B2B integrations	OAuth2 Client Credentials	Scoped to specific API operations

Rate Limiting & Throttling

Strategy Layers

API gateway rate limiting strategy layers

Token Bucket Algorithm

Most API gateways use token bucket for rate limiting:

Bucket capacity = burst limit (e.g., 500 requests)
Refill rate = steady-state limit (e.g., 100 requests/second)
Each request consumes one token
When bucket is empty, requests get HTTP 429 Too Many Requests
Bucket refills at the steady rate
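The steps above can be sketched in a few lines. This is an illustrative implementation using the example numbers from the list (capacity 500, refill 100 per second), not the code any particular gateway runs. The TokenBucket class and its explicit now argument (used to keep the example deterministic) are assumptions.

```python
class TokenBucket:
    """Token bucket: capacity is the burst limit, refill_rate the steady-state limit."""

    def __init__(self, capacity, refill_rate, now=0.0):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # bucket starts full
        self.last = now

    def allow(self, now):
        """Return the HTTP status for a request arriving at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        # Refill at the steady rate, never above the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1  # each request consumes one token
            return 200
        return 429  # bucket empty: Too Many Requests


bucket = TokenBucket(capacity=500, refill_rate=100)
burst = [bucket.allow(0.0) for _ in range(501)]         # 501 requests at the same instant
after_refill = [bucket.allow(1.0) for _ in range(101)]  # one second later
```

In the burst, the first 500 requests pass and request 501 is rejected. One second later, 100 tokens have been refilled, so 100 of the next 101 requests pass.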

AWS API Gateway Throttling

AWS API Gateway throttling is three-tiered:

Level	Default	Configurable
Account level	10,000 req/sec steady-state (5,000 burst) per Region, across all APIs	Yes, via a Service Quotas increase request
Stage level	Inherits account	Yes, per stage
Method level	Inherits stage	Yes, per method + resource

Usage Plans + API Keys for per-tenant throttling:

AWS API Gateway usage plans for partner throttling

Apigee Rate Limiting Policies

Apigee rate limiting uses policies:

Policy	Purpose	Scope
SpikeArrest	Smooths traffic bursts	Prevents sudden spikes (e.g., 10ps = ~1 request every 100ms)
Quota	Enforces usage limits over time	Per-developer app, per API product (e.g., 10K/day)
ConcurrentRatelimit	Limits concurrent connections to a backend	Protects slow backends (an Apigee Edge policy; check availability before relying on it in Apigee X)
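The SpikeArrest smoothing arithmetic can be worked through with a small helper. This is only a sketch of the conversion from a rate string to a spacing between requests. The function name is hypothetical, and the way Apigee actually enforces the rate differs between product versions, so treat the result as an approximation of the intent.

```python
def spike_arrest_interval_ms(rate):
    """Convert a rate such as '10ps' or '30pm' to milliseconds between allowed requests."""
    count, unit = int(rate[:-2]), rate[-2:]
    window_ms = {"ps": 1_000, "pm": 60_000}[unit]  # per second / per minute
    return window_ms / count


per_second = spike_arrest_interval_ms("10ps")  # 1000 ms / 10
per_minute = spike_arrest_interval_ms("30pm")  # 60000 ms / 30
```

A rate of 10 per second works out to one request every 100 ms, and 30 per minute to one request every 2 seconds, which is why SpikeArrest suits burst smoothing while Quota suits longer-term limits.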

Apigee API Products for per-tenant management:

Apigee API products for partner management

Developers register apps, subscribe to API Products, and receive client credentials. Apigee tracks usage per app against the product quotas.

API Versioning Strategies

Strategy	Example	Pros	Cons
URI path	`/v1/accounts`, `/v2/accounts`	Simple, explicit, cacheable	URL changes, pollutes resource path
Query param	`/accounts?version=2`	Easy to add	Easy to forget, not RESTful
Header	`Accept: application/vnd.bank.v2+json`	Clean URLs	Hidden, harder to test in browser
Content negotiation	`Accept: application/json; version=2`	Standard HTTP	Complex to implement

Version Lifecycle Management

API version lifecycle management

Internal vs External Gateway Patterns

Pattern 1: External API Gateway (North-South Traffic)

For traffic entering from the internet to backend services.

External API gateway pattern for north-south traffic

Pattern 2: Internal API Gateway (East-West Traffic)

For service-to-service communication within the enterprise.

Internal gateway patterns: service mesh vs API gateway

When to Use Which

Criterion	Service Mesh	Internal API Gateway
Environment	Kubernetes (EKS/GKE)	Mixed (ECS, Lambda, VMs, cross-account)
Latency	Sub-millisecond (in-pod sidecar)	5-15ms (extra network hop)
Auth	mTLS automatic	IAM auth or JWT
Observability	Built-in (Envoy metrics)	CloudWatch / Cloud Logging
Rate limiting	Per-service policies	Usage plans
Complexity	High (Istio control plane)	Low (managed service)

API Gateway Services

AWS API Gateway Types

Feature	REST API	HTTP API	WebSocket API
Protocol	REST	HTTP	WebSocket
Auth	IAM, Lambda, Cognito	JWT, IAM, Lambda	IAM, Lambda
Usage Plans	Yes	No	No
API Keys	Yes	No	No
Request Validation	Yes	No	No
Request Transform	VTL templates	Parameter mapping	Route selection
Caching	Yes (per stage)	No	No
WAF	Yes	No	No
Price	$3.50/million	$1.00/million	$1.00/million + connection
Latency	~29ms overhead	~10ms overhead	Persistent connection

VPC Link (Private Integration)

Connect API Gateway to resources in private subnets:

Apigee API proxy architecture

GCP API Management — Apigee

Apigee is Google’s enterprise API management platform with analytics, monetization, and developer portal.

Apigee Editions

Feature	Apigee X (Standard)	Apigee X (Enterprise)	Apigee Hybrid
Hosting	Google-managed	Google-managed	Runtime in your GKE
Environments	2	4+	Unlimited
SLA	99.9%	99.99%	Your infra SLA
Use case	Mid-size	Enterprise	Data residency / hybrid
Networking	Peering to your VPC	Peering + PSC	Runs in your VPC

Multi-Tenant API Platform Architecture

The central infra team operates a shared API gateway platform that multiple tenant teams consume.

External partner API architecture

Key design decisions:

Single API Gateway with path-based routing to backend ALBs — not one gateway per service. Simplifies partner integration (one base URL, one API key).
REST API (not HTTP API) because we need usage plans, API keys, and WAF integration for partner-facing APIs.
Usage Plans by partner tier — Gold (1000 req/s), Silver (100 req/s), Bronze (10 req/s). Each partner gets an API key mapped to their usage plan.
Lambda Authorizer that validates the partner’s OAuth2 token AND checks the API key, returning an IAM policy that scopes access to only their permitted API paths.
Custom domain (api.bank.com) with ACM certificate, Route 53 alias record.
Versioning via URI path (/v1/, /v2/). When v2 launches, v1 continues working with a Sunset header. Partners get 6 months to migrate.
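As a rough illustration of the Lambda Authorizer decision above, the sketch below builds the response a REST API authorizer returns: an IAM policy limited to the partner's permitted paths, plus usageIdentifierKey so API Gateway can apply the partner's usage plan (this requires the API key source to be set to AUTHORIZER). The function name, partner ID, account ID, API ID, and paths are placeholders, and token validation is deliberately left out.

```python
def build_authorizer_response(partner_id, api_key, allowed_paths,
                              region, account_id, api_id, stage):
    """Build a Lambda authorizer response that allows only the given resource paths."""
    # execute-api ARN format: arn:aws:execute-api:region:account:api-id/stage/METHOD/path
    resources = [
        f"arn:aws:execute-api:{region}:{account_id}:{api_id}/{stage}/*/{path.lstrip('/')}"
        for path in allowed_paths
    ]
    return {
        "principalId": partner_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": "Allow", "Resource": resources}
            ],
        },
        # Lets API Gateway map the request to the partner's usage plan.
        "usageIdentifierKey": api_key,
    }


response = build_authorizer_response(
    partner_id="partner-acme",           # hypothetical partner
    api_key="example-api-key",           # placeholder value
    allowed_paths=["/v1/accounts/*"],
    region="eu-west-1",
    account_id="123456789012",
    api_id="a1b2c3d4e5",
    stage="prod",
)
```

A real authorizer would call this only after verifying the OAuth2 token's signature, expiry, and scopes, and after confirming that the API key belongs to the same partner.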

On GCP, I would use Apigee X instead. It provides the equivalent of usage plans (API products with quotas), a developer portal, and analytics out of the box, and partners self-register through the developer portal.

Scenario 2: Per-Tenant Rate Limiting

Q: “How do you implement per-tenant rate limiting so one partner can’t impact another?”

A: This is the noisy neighbor problem applied to APIs.

AWS approach — Usage Plans:

Per-tenant rate limiting with usage plans

Each usage plan is an independent throttle bucket. Partner A hitting their 1,000 req/s limit does NOT affect Partner B’s 1,000 req/s allocation.
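One way to provision these plans is with boto3. The sketch below only builds the request parameters so it runs without AWS access, and the actual calls are shown as comments. The tier rates come from the Gold/Silver/Bronze design above. The burst multiplier, the helper name, and the API ID are assumptions.

```python
# Steady-state rates per tier (req/s), taken from the partner tier design.
TIER_RATES = {"gold": 1000, "silver": 100, "bronze": 10}


def usage_plan_params(tier, api_id, stage, burst_multiplier=2):
    """Build keyword arguments for apigateway.create_usage_plan for one partner tier."""
    rate = TIER_RATES[tier]
    return {
        "name": f"partner-{tier}",
        "throttle": {
            "rateLimit": float(rate),
            "burstLimit": rate * burst_multiplier,  # assumption: burst is twice the rate
        },
        "apiStages": [{"apiId": api_id, "stage": stage}],
    }


gold = usage_plan_params("gold", api_id="a1b2c3d4e5", stage="prod")
bronze = usage_plan_params("bronze", api_id="a1b2c3d4e5", stage="prod")

# With AWS credentials available, the plan and key association would look like:
#   import boto3
#   client = boto3.client("apigateway")
#   plan = client.create_usage_plan(**gold)
#   client.create_usage_plan_key(usagePlanId=plan["id"], keyId=key_id, keyType="API_KEY")
```

Throttle limits in a usage plan are applied per API key, so two partners on the same tier still draw from separate allowances.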

Beyond API Gateway throttling, I would add:

WAF rate rules as an outer layer (per-IP, catches distributed attacks)
Backend circuit breaker (Istio or app-level) to protect databases
DynamoDB/Redis counter for custom business logic limits (e.g., max 100 payment initiations per partner per hour)

GCP approach — Apigee:

Use Quota policies per API Product. Each developer app subscribes to a product and gets its own quota counter. Apigee tracks this automatically and returns 429 with a Retry-After header when exceeded.

Scenario 3: Internal vs Service Mesh

Q: “Internal APIs: should you use an API Gateway or service mesh for service-to-service communication?”

A: It depends on the environment.

Use service mesh (Istio) when:

All services run in Kubernetes (EKS/GKE)
You need mutual TLS without application changes
You want circuit breaking, retries, and timeouts as infrastructure
Latency sensitivity is high (sidecar adds less than 1ms, gateway adds 5-15ms)

Use internal API Gateway when:

Services span multiple compute platforms (EKS + ECS + Lambda)
You need cross-account API access via VPC Link
You want usage tracking and throttling between internal teams
You need request/response transformation (different internal formats)

Our enterprise recommendation:

Within a K8s cluster: service mesh (Istio). No API Gateway needed.
Between K8s clusters: service mesh with multi-cluster Istio or internal API Gateway.
K8s to non-K8s (Lambda, ECS): internal API Gateway with IAM auth via VPC Link.
Cross-account: internal API Gateway is the cleanest option — no VPC peering needed for the API path, just VPC Link.

Scenario 4: API Authentication Design

Q: “Design API authentication for a mobile banking app, third-party partners, and internal services.”

A: Three distinct auth flows for three consumer types:

API authentication design for different consumer types

Lambda Authorizer handles the complexity — it inspects the request and determines which auth flow to apply:

Authorization: Bearer <JWT> — validate JWT signature, claims, scopes
x-api-key header present — look up usage plan, validate client credentials
AWS SigV4 signed request — IAM auth for internal services
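The dispatch step might look like the sketch below. It only classifies the request by its headers. The consumer labels and the rule that a partner sends both a bearer token and an API key are assumptions drawn from the consumer-type table, and the actual token, key, and signature verification is out of scope here.

```python
def classify_consumer(headers):
    """Pick an auth flow from request headers; returns a consumer type or 'deny'."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    authorization = normalized.get("authorization", "")

    if authorization.startswith("AWS4-HMAC-SHA256 "):
        return "internal"   # SigV4-signed request: IAM auth
    if authorization.startswith("Bearer "):
        if "x-api-key" in normalized:
            return "partner"  # OAuth2 token plus API key for the usage plan
        return "mobile"       # JWT only: mobile app or SPA
    return "deny"             # no recognized credentials


mobile = classify_consumer({"Authorization": "Bearer eyJ.example.token"})
partner = classify_consumer({"authorization": "Bearer eyJ.example.token", "X-Api-Key": "k"})
internal = classify_consumer({"Authorization": "AWS4-HMAC-SHA256 Credential=EXAMPLE"})
anonymous = classify_consumer({"Accept": "application/json"})
```

Defaulting to deny means a request with an API key but no token is rejected instead of falling through to a weaker check.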

Scenario 5: API Versioning Strategy

Q: “How do you handle API versioning without breaking existing clients? You have 50 partner integrations.”

A: With 50 partners, breaking changes are expensive — each partner has their own development cycle.

Strategy: URI path versioning with parallel deployment

API versioning timeline for partner migration

Implementation:

Parallel deployment: v1 and v2 run simultaneously as separate target groups behind the same API Gateway
Sunset header: send Sunset: Mon, 15 Mar 2027 00:00:00 GMT (RFC 8594) on all v1 responses, starting 6 months before removal
Deprecation notice: Deprecation header plus a Link header pointing to the migration guide. Early drafts of the Deprecation header used the value true; RFC 9745 specifies a date instead, e.g. Deprecation: @1688169599
Usage tracking: monitor which partners still use v1 via API key analytics
Partner communication: automated email when a partner's v1 usage exceeds a threshold after deprecation
Graceful sunset: after the sunset date, v1 returns 410 Gone with a response body containing the v2 equivalent endpoint
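Generating the headers from a date avoids hand-typed HTTP dates, where the weekday is easy to get wrong. The sketch below assumes a hypothetical migration guide URL and a deprecation date six months before the sunset. The Deprecation value uses the date form from RFC 9745 (an @ followed by a Unix timestamp), not the literal true used in earlier drafts.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(deprecated_at, sunset_at, migration_guide_url):
    """Build response headers announcing deprecation and the planned removal date."""
    return {
        # RFC 9745: structured-field date, written as @<unix seconds>.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # RFC 8594: HTTP-date in GMT; format_datetime computes the weekday.
        "Sunset": format_datetime(sunset_at, usegmt=True),
        "Link": f'<{migration_guide_url}>; rel="deprecation"',
    }


headers = deprecation_headers(
    deprecated_at=datetime(2026, 9, 15, tzinfo=timezone.utc),  # assumed date
    sunset_at=datetime(2027, 3, 15, tzinfo=timezone.utc),
    migration_guide_url="https://api.bank.com/docs/v2-migration",  # hypothetical URL
)
```

The same function can be attached to every v1 response at the gateway or in a shared middleware, so the dates are defined in one place.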

API Gateway routing:

# Note: aws_apigatewayv2_* resources define an HTTP API. A REST API would use
# aws_api_gateway_resource, aws_api_gateway_method, and aws_api_gateway_integration.

# v1 routes to old backend
resource "aws_apigatewayv2_route" "accounts_v1" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /v1/accounts/{id}"
  target    = "integrations/${aws_apigatewayv2_integration.accounts_v1.id}"
}

# v2 routes to new backend
resource "aws_apigatewayv2_route" "accounts_v2" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /v2/accounts/{id}"
  target    = "integrations/${aws_apigatewayv2_integration.accounts_v2.id}"
}

Scenario 6: API Under Attack

Q: “Your API is being hammered by a single client — 10x their normal traffic. Backend latency is spiking for all clients. How do you respond?”

A: This is an incident with clear containment, mitigation, and prevention phases.

Immediate (0-5 minutes):

Identify the client — check API Gateway access logs for the API key or source IP generating the spike
Throttle the specific client — if they are on a usage plan, reduce their rate limit immediately. If not, add a WAF rate-based rule for their IP
Verify it is not an attack — is the traffic from a known partner (misconfigured retry loop) or an unknown source (DDoS/scraping)?

Containment (5-30 minutes):

If known partner (API key identified):
  → Reduce usage plan rate limit temporarily
  → Contact partner to fix their client
  → Check for retry storms (are they retrying 5xx responses in a tight loop?)

If unknown source:
  → WAF rate-based rule: block IP if > 2000 req/5min
  → If distributed IPs: enable AWS Shield Advanced / Cloud Armor adaptive protection
  → Geo-blocking if traffic from unexpected region
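To show what a rate-based rule such as "block IP if > 2000 req/5min" means, here is a simplified sliding-window counter. It is a conceptual model only: the managed WAF services evaluate rates on their own schedule, and the RateBasedRule class, its in-memory storage, and the small limit used in the example are assumptions.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Block a source IP once it exceeds `limit` requests within `window_seconds`."""

    def __init__(self, limit=2000, window_seconds=300):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def should_block(self, ip, now):
        hits = self._hits[ip]
        hits.append(now)
        # Drop requests that have fallen out of the sliding window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return len(hits) > self.limit


rule = RateBasedRule(limit=3, window_seconds=300)  # small limit to keep the example short
decisions = [rule.should_block("203.0.113.7", t) for t in (0, 1, 2, 3)]
after_quiet_period = rule.should_block("203.0.113.7", 400)
other_client = rule.should_block("198.51.100.9", 3)
```

The fourth request inside the window is blocked, the same IP is allowed again once its earlier requests age out, and a different IP is unaffected throughout.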

Backend protection (parallel):

Enable API Gateway response caching for read endpoints — serve cached responses instead of hitting the backend
Circuit breaker — if using Istio, configure outlier detection to eject unhealthy backends
Scale backends — trigger HPA if pods are at capacity, but this treats the symptom not the cause

Prevention (post-incident):

Mandatory usage plans for all API consumers — no unthrottled access
WAF rate-based rules as a secondary throttle layer (catches traffic before it hits API Gateway)
Alerting on per-client request rate anomalies (CloudWatch alarm on usage plan throttle count)
Retry guidance in API docs — exponential backoff with jitter, not immediate retry
Circuit breaker at client side — require partners to implement client-side circuit breakers (include in API contract)
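The retry guidance could be illustrated in the API docs with something like the following. It uses the "full jitter" variant of exponential backoff. The base delay, the cap, and the function name are assumptions that a partner would tune to their own latency budget.

```python
import random


def backoff_delays(max_retries, base=0.5, cap=30.0, rng=random.random):
    """Full jitter: each delay is random in [0, min(cap, base * 2**attempt)) seconds."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


# Upper bounds per attempt, obtained by forcing the random factor to 1.0.
ceilings = backoff_delays(8, rng=lambda: 1.0)

# Actual delays a client would sleep between retries (random on each run).
delays = backoff_delays(8)
```

The ceilings double from 0.5 s up to the 30 s cap, and the random factor spreads retries from many clients so they do not all hit the backend at the same instant.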

The key insight is defense in depth: WAF rate rules (L7 edge) + API Gateway usage plan throttling (per-client) + backend circuit breaker (per-service). No single layer handles everything.