
Secrets Management

Secrets infrastructure is managed centrally by the platform team in the Shared Services Account/Project. Secrets for databases, API keys, and certificates live in AWS Secrets Manager or GCP Secret Manager. Kubernetes workloads in tenant namespaces consume secrets via External Secrets Operator (ESO) — tenants create an ExternalSecret CR, and ESO syncs the value into a native Kubernetes Secret.

Secrets Management — Where This Fits


The evolution of secrets management:

Level 0: Secrets in code (hardcoded) ← NEVER
Level 1: Secrets in environment variables ← better but still exposed in process listing
Level 2: Secrets in CI/CD pipeline variables ← better but CI/CD is now a target
Level 3: Secrets in cloud secrets manager ← GOOD (encrypted, audited, rotatable)
Level 4: Dynamic secrets (Vault generates unique DB creds per app instance) ← BEST (short-lived, unique per consumer)

AWS Secrets Manager vs SSM Parameter Store

| Feature | Secrets Manager | SSM Parameter Store |
|---|---|---|
| Pricing | $0.40/secret/month + $0.05 per 10K API calls | Free (standard), $0.05/advanced param/month |
| Automatic rotation | Yes, Lambda-based rotation | No built-in rotation |
| Cross-account sharing | Yes, via resource-based policy | Yes, but requires more IAM setup |
| Versioning | Yes (staging labels: AWSCURRENT, AWSPREVIOUS) | Yes (labels) |
| Binary secrets | Yes (up to 64 KB) | No native binary type; base64-encode into a string parameter (advanced tier, up to 8 KB) |
| Encryption | KMS (default or CMEK) | KMS (default or CMEK) |
| CloudFormation/Terraform | Dynamic reference support | Dynamic reference support |
| Best for | Database credentials, API keys, certificates | Configuration values, feature flags, non-sensitive params |
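To make the pricing row concrete, here is a rough cost sketch that uses only the Secrets Manager list prices from the table. The workload figures (150 secrets, three clusters polling on a fixed interval) are illustrative assumptions, and treating each refresh as one API call per secret per cluster is a simplification of how a sync controller actually behaves.

```python
# List prices taken from the comparison table above.
SECRET_PRICE_PER_MONTH = 0.40    # USD per secret per month
API_PRICE_PER_10K_CALLS = 0.05   # USD per 10,000 API calls


def polling_calls(num_secrets: int, clusters: int, refreshes_per_hour: float,
                  hours_per_month: int = 730) -> int:
    """API calls per month if every cluster re-reads every secret on each refresh."""
    return int(num_secrets * clusters * refreshes_per_hour * hours_per_month)


def monthly_cost(num_secrets: int, api_calls: int) -> float:
    """Estimated monthly Secrets Manager bill in USD (storage plus API calls)."""
    storage = num_secrets * SECRET_PRICE_PER_MONTH
    calls = (api_calls / 10_000) * API_PRICE_PER_10K_CALLS
    return round(storage + calls, 2)


# Hypothetical workload: 150 secrets synced hourly into 3 clusters.
hourly_calls = polling_calls(num_secrets=150, clusters=3, refreshes_per_hour=1)
hourly_bill = monthly_cost(150, hourly_calls)
```

Under these assumptions the per-secret storage charge dominates the bill; shortening the refresh interval raises the API-call component roughly in proportion.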

AWS Secrets Manager Rotation Flow — 4 steps
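The four steps are createSecret, setSecret, testSecret, and finishSecret; Secrets Manager invokes the rotation Lambda once per step, passing `SecretId`, `ClientRequestToken`, and `Step` in the event. The sketch below walks through that sequence against in-memory stand-ins. `FakeSecretStore` and `FakeDatabase` are hypothetical helpers for illustration, not the boto3 API, and a real handler takes `(event, context)` and talks to Secrets Manager and the database instead.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class FakeSecretStore:
    """In-memory stand-in for a secret's versions and staging labels."""
    versions: dict = field(default_factory=dict)  # version token -> secret value
    labels: dict = field(default_factory=dict)    # staging label -> version token


@dataclass
class FakeDatabase:
    """In-memory stand-in for the database's user/password table."""
    passwords: dict = field(default_factory=dict)

    def login(self, username: str, password: str) -> bool:
        return self.passwords.get(username) == password


def handler(event: dict, store: FakeSecretStore, db: FakeDatabase) -> None:
    """Dispatch one rotation step, mirroring the four-step flow."""
    token, step = event["ClientRequestToken"], event["Step"]
    if step == "createSecret":
        # Generate a new password and stage it as AWSPENDING.
        current = store.versions[store.labels["AWSCURRENT"]]
        store.versions[token] = {**current, "password": secrets.token_urlsafe(24)}
        store.labels["AWSPENDING"] = token
    elif step == "setSecret":
        # Apply the pending password to the database user.
        pending = store.versions[store.labels["AWSPENDING"]]
        db.passwords[pending["username"]] = pending["password"]
    elif step == "testSecret":
        # Verify the pending credentials actually work.
        pending = store.versions[store.labels["AWSPENDING"]]
        if not db.login(pending["username"], pending["password"]):
            raise RuntimeError("pending credentials rejected by database")
    elif step == "finishSecret":
        # Promote AWSPENDING to AWSCURRENT; the old version becomes AWSPREVIOUS.
        store.labels["AWSPREVIOUS"] = store.labels["AWSCURRENT"]
        store.labels["AWSCURRENT"] = store.labels.pop("AWSPENDING")
    else:
        raise ValueError(f"unknown rotation step: {step}")


def rotate(store: FakeSecretStore, db: FakeDatabase, token: str) -> None:
    """Run all four steps in order, as Secrets Manager would."""
    for step in ("createSecret", "setSecret", "testSecret", "finishSecret"):
        event = {"SecretId": "/prod/team-a/rds-password",
                 "ClientRequestToken": token, "Step": step}
        handler(event, store, db)
```

Note that this single-user variant overwrites the password that running clients still hold, which is the problem the dual-secret pattern later in this page addresses.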

GCP Secret Manager Features

Rotation via Cloud Scheduler + Cloud Functions (GCP)


GCP Secret Manager does not rotate secret values itself. It can publish rotation notifications to a Pub/Sub topic, and in this pattern Cloud Scheduler triggers a Cloud Function on a cron schedule to perform the actual rotation and add the new secret version.

# In Shared Services Account
resource "aws_secretsmanager_secret" "db_password" {
  name        = "/prod/team-a/rds-password"
  description = "RDS password for team-a production database"
  kms_key_id  = aws_kms_key.secrets_encryption.arn

  # Cross-account access policy. Because the secret is encrypted with a
  # customer-managed key, the KMS key policy must also allow the workload
  # account to decrypt.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowWorkloadAccountAccess"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::${var.team_a_prod_account_id}:root"
        }
        Action = [
          "secretsmanager:GetSecretValue",
          "secretsmanager:DescribeSecret"
        ]
        # In a resource policy attached to a secret, "*" refers to this secret only
        Resource = "*"
        Condition = {
          StringEquals = {
            "aws:PrincipalTag/team" = "team-a"
          }
        }
      }
    ]
  })
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id = aws_secretsmanager_secret.db_password.id
  secret_string = jsonencode({
    username = "app_user"
    password = random_password.db.result
    engine   = "postgres"
    host     = aws_db_instance.team_a.address
    port     = 5432
    dbname   = "production"
  })
}

resource "aws_secretsmanager_secret_rotation" "db_password" {
  secret_id           = aws_secretsmanager_secret.db_password.id
  rotation_lambda_arn = aws_lambda_function.secret_rotation.arn

  rotation_rules {
    automatically_after_days = 30
    duration                 = "2h" # Rotation window
  }
}

# Secrets Manager also needs an aws_lambda_permission allowing
# secretsmanager.amazonaws.com to invoke this function (not shown).
resource "aws_lambda_function" "secret_rotation" {
  function_name = "secret-rotation-rds"
  # "role" is a required argument; the execution role is assumed to be defined elsewhere
  role     = aws_iam_role.rotation_lambda.arn
  handler  = "rotation.handler"
  runtime  = "python3.12"
  timeout  = 60
  filename = data.archive_file.rotation_lambda.output_path

  environment {
    variables = {
      SECRETS_MANAGER_ENDPOINT = "https://secretsmanager.${var.region}.amazonaws.com"
    }
  }

  # Runs inside the VPC so it can reach the database
  vpc_config {
    subnet_ids         = var.private_subnets
    security_group_ids = [aws_security_group.rotation_lambda.id]
  }
}

ESO is the bridge between cloud secrets managers and Kubernetes Secrets. The platform team deploys ESO and creates the ClusterSecretStore. Tenant teams create ExternalSecret resources in their namespaces.

External Secrets Operator Architecture

# ClusterSecretStore: platform team deploys this (one per cluster)
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: me-central-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa
            namespace: external-secrets
---
# ServiceAccount with IRSA for cross-account Secrets Manager access
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-secrets-sa
  namespace: external-secrets
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::SHARED_SERVICES_ACCOUNT:role/ExternalSecretsRole

IAM Role for ESO (in Shared Services Account):

resource "aws_iam_role" "external_secrets" {
  name = "ExternalSecretsRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          # A trust policy can only reference an OIDC provider in the role's own
          # account, so the workload cluster's issuer must be registered as an
          # IAM OIDC provider in the Shared Services Account.
          Federated = "arn:aws:iam::${var.shared_services_account_id}:oidc-provider/${var.eks_oidc_provider}"
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            # Only the ESO service account may assume this role
            "${var.eks_oidc_provider}:sub" = "system:serviceaccount:external-secrets:external-secrets-sa"
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy" "external_secrets" {
  role = aws_iam_role.external_secrets.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "secretsmanager:GetSecretValue",
          "secretsmanager:DescribeSecret",
          "secretsmanager:ListSecrets"
        ]
        Resource = "arn:aws:secretsmanager:me-central-1:${var.shared_services_account_id}:secret:/prod/*"
      },
      {
        # Needed because the secrets are encrypted with a customer-managed key
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:DescribeKey"
        ]
        Resource = var.secrets_kms_key_arn
      }
    ]
  })
}

ExternalSecret (tenant namespace):

# Tenant creates this in their namespace; ESO syncs the secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: team-a
spec:
  refreshInterval: 1h # How often to sync from Secrets Manager
  secretStoreRef:
    name: aws-secrets-manager # References the ClusterSecretStore
    kind: ClusterSecretStore
  target:
    name: db-credentials # Name of the K8s Secret to create
    creationPolicy: Owner # ESO owns this Secret (deletes if ExternalSecret deleted)
  data:
    - secretKey: username
      remoteRef:
        key: /prod/team-a/rds-password
        property: username # JSON key extraction
    - secretKey: password
      remoteRef:
        key: /prod/team-a/rds-password
        property: password
    - secretKey: host
      remoteRef:
        key: /prod/team-a/rds-password
        property: host
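
As a rough illustration of what the `data` entries above ask for, the sketch below maps JSON properties of one remote secret onto the keys of a Kubernetes Secret, whose `data` values are base64-encoded. `build_k8s_secret_data` is a hypothetical helper written for this example; it is not ESO's implementation, and the sample values are made up.

```python
import base64
import json


def build_k8s_secret_data(remote_value: str, mappings: list) -> dict:
    """Map JSON properties of a remote secret onto Kubernetes Secret data keys."""
    parsed = json.loads(remote_value)
    data = {}
    for mapping in mappings:
        # remoteRef.property selects one key from the JSON document
        value = str(parsed[mapping["property"]])
        # Kubernetes stores Secret data values base64-encoded
        data[mapping["secretKey"]] = base64.b64encode(value.encode()).decode()
    return data


# Sample secret shaped like the JSON written by the Terraform secret version.
remote = json.dumps({
    "username": "app_user",
    "password": "example-password",
    "engine": "postgres",
    "host": "db.example.internal",
    "port": 5432,
    "dbname": "production",
})
mappings = [
    {"secretKey": "username", "property": "username"},
    {"secretKey": "password", "property": "password"},
    {"secretKey": "host", "property": "host"},
]
secret_data = build_k8s_secret_data(remote, mappings)
```

Only the mapped properties end up in the Kubernetes Secret, so a tenant can expose `host` and the credentials to a pod without also exposing fields it does not need.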

Dual-Secret Rotation Pattern (Zero-Downtime)


The standard rotation problem: you rotate the password in Secrets Manager, but running pods still hold the old password. Once the database password changes, those pods cannot open new database connections until they restart and pick up the new secret.

Solution: Dual-secret (alternating user) rotation.

Dual-Secret Rotation Flow:

Time T0 (normal):
  Secrets Manager: user_a / password_a (CURRENT)
  Database: user_a active, user_b active
  App pods: using user_a / password_a ✓

Time T1 (rotation starts):
  Lambda creates new password for user_b
  Secrets Manager: user_b / password_b (PENDING)
  Database: user_b gets new password
  Secrets Manager: user_b / password_b → CURRENT
  ESO syncs new secret to K8s (within refreshInterval)

Time T2 (pods pick up new secret):
  Old pods still using user_a / password_a ← still works (user_a not changed)
  New pods (restarted or new replicas) using user_b / password_b ✓
  Both work simultaneously → zero downtime

Time T3 (next rotation, 30 days later):
  Rotate user_a with new password
  Cycle continues: a → b → a → b
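
The timeline above can be sketched as a small in-memory simulation. `AlternatingUserRotation` is a hypothetical class for illustration: the dictionary stands in for the database's user table and `current` stands in for the CURRENT version in Secrets Manager.

```python
import secrets


class AlternatingUserRotation:
    """Simulates rotating whichever of two database users is currently inactive."""

    def __init__(self, users=("user_a", "user_b")):
        self.users = list(users)
        # Stand-in for the database: username -> password
        self.db = {user: secrets.token_urlsafe(16) for user in self.users}
        # Stand-in for the CURRENT secret version
        self.current = {"username": self.users[0], "password": self.db[self.users[0]]}

    def db_login(self, cred: dict) -> bool:
        return self.db.get(cred["username"]) == cred["password"]

    def rotate(self) -> dict:
        # Pick the user that running pods are NOT using.
        idx = self.users.index(self.current["username"])
        inactive = self.users[(idx + 1) % len(self.users)]
        # Give the inactive user a new password and stage it as pending.
        pending = {"username": inactive, "password": secrets.token_urlsafe(16)}
        self.db[inactive] = pending["password"]
        # Test before promoting, so a broken credential never becomes current.
        if not self.db_login(pending):
            raise RuntimeError("pending credential failed verification")
        self.current = pending
        return dict(self.current)


rotation = AlternatingUserRotation()
at_t0 = dict(rotation.current)   # what running pods hold at T0
at_t1 = rotation.rotate()        # T1: user_b becomes current
both_valid = rotation.db_login(at_t0) and rotation.db_login(at_t1)
```

One consequence worth noting: a credential survives exactly one rotation. A pod still holding the T0 credential when user_a is rotated again at T3 will fail, so pods need to pick up the new secret within one rotation period.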

Prevent secrets from ever reaching the Git repository.

Pre-Commit Secret Scanning Pipeline

.pre-commit-config.yaml:

repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
        name: Detect hardcoded secrets
        # Scan only staged changes
        entry: gitleaks protect --staged --verbose
        language: golang
        pass_filenames: false
  - repo: https://github.com/trufflesecurity/trufflehog
    rev: v3.63.0
    hooks:
      - id: trufflehog
        name: TruffleHog secret scan
        # --since-commit HEAD limits the scan to new changes instead of the full history
        entry: trufflehog git file://. --since-commit HEAD --only-verified --fail
        language: golang
        pass_filenames: false

gitleaks.toml (custom rules for enterprise patterns):

[extend]
useDefault = true

[[rules]]
id = "aws-account-id"
description = "AWS Account ID"
regex = '''(?i)(?:account.?id|aws.?account)\s*[:=]\s*['\"]?(\d{12})['\"]?'''
tags = ["aws", "account"]

[[rules]]
id = "database-connection-string"
description = "Database connection string"
regex = '''(?i)(?:postgres|mysql|mongodb|redis):\/\/[^\s'"]+:[^\s'"]+@[^\s'"]+'''
tags = ["database", "connection"]

[allowlist]
paths = [
  '''\.md$''',
  '''\.txt$''',
  '''testdata/''',
  '''test_fixtures/'''
]
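
Custom rules are easy to get subtly wrong, so it can help to sanity-check the patterns against known samples before rolling them out. The sketch below copies the two regexes from the config and runs them with Python's `re` module. This is only an approximation: gitleaks uses Go's regular expression engine, and the sketch ignores the default rules and the allowlist. The `scan` helper and the sample lines are hypothetical.

```python
import re

# Patterns copied verbatim from the custom gitleaks rules above.
RULES = {
    "aws-account-id": re.compile(
        r'''(?i)(?:account.?id|aws.?account)\s*[:=]\s*['\"]?(\d{12})['\"]?'''
    ),
    "database-connection-string": re.compile(
        r'''(?i)(?:postgres|mysql|mongodb|redis):\/\/[^\s'"]+:[^\s'"]+@[^\s'"]+'''
    ),
}


def scan(line: str) -> list:
    """Return the ids of every custom rule that matches the given line."""
    return [rule_id for rule_id, pattern in RULES.items() if pattern.search(line)]


# Lines that should be flagged.
flagged_account = scan('aws_account_id = "123456789012"')
flagged_dsn = scan("DATABASE_URL=postgres://app_user:hunter2@db.internal:5432/production")

# Lines that should pass: no credentials in the URL, no 12-digit ID assigned.
clean_dsn = scan("DATABASE_URL=postgres://db.internal:5432/production")
clean_text = scan("the account id is documented elsewhere")
```

Keeping a handful of must-match and must-not-match samples next to the config makes it cheaper to tune a rule later without silently widening or narrowing what it catches.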

HashiCorp Vault — When Over Cloud-Native

| Factor | Cloud-Native (Secrets Manager / Secret Manager) | HashiCorp Vault |
|---|---|---|
| Multi-cloud | AWS-only or GCP-only | Single pane across AWS, GCP, Azure, on-prem |
| Dynamic secrets | Not supported (static secrets with rotation) | Yes, generates unique, short-lived DB creds per app |
| PKI / Certificate Authority | ACM (AWS), CAS (GCP); limited | Full PKI engine with custom CA hierarchy |
| Transit encryption | KMS Encrypt/Decrypt API | Transit engine: encrypt data without storing it |
| OIDC/SAML auth | Cloud IAM only | Multiple auth backends (LDAP, OIDC, K8s, AWS IAM, GCP) |
| Secrets versioning | Basic (current/previous) | Full versioning with soft delete and recovery |
| Audit | CloudTrail / Cloud Audit Logs | Built-in audit log with every access recorded |
| Operational cost | Managed service (zero ops) | Self-hosted (HA cluster, unseal ceremony, upgrades) |
| Enterprise license | Pay-per-use | Vault Enterprise or HCP Vault (significant cost) |

HashiCorp Vault Enterprise HA Architecture

# Vault database secrets engine configuration
resource "vault_database_secret_backend_connection" "postgres" {
  backend       = "database"
  name          = "team-a-prod"
  allowed_roles = ["team-a-readonly", "team-a-readwrite"]

  postgresql {
    # {{username}} and {{password}} are filled in by Vault from the fields below
    connection_url = "postgres://{{username}}:{{password}}@${var.rds_endpoint}:5432/production"
    username       = var.vault_admin_user
    password       = var.vault_admin_password
  }
}

resource "vault_database_secret_backend_role" "team_a_readonly" {
  backend = "database"
  name    = "team-a-readonly"
  db_name = vault_database_secret_backend_connection.postgres.name

  # Run by Vault each time a consumer requests credentials
  creation_statements = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"
  ]

  default_ttl = 3600  # 1 hour
  max_ttl     = 86400 # 24 hours
}

When a pod requests credentials, Vault creates a unique database user with a 1-hour TTL. When the TTL expires, Vault revokes the user. No shared, long-lived database passwords.
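
To show what a single credential request amounts to, the sketch below fills the role's creation statements with a generated username, password, and expiry. It is an illustration only, not Vault's implementation: the `v-<role>-<random>` username format and the `issue_credentials` helper are assumptions, and `now` is expected to be a UTC timestamp.

```python
import secrets
from datetime import datetime, timedelta, timezone

# Statements and TTLs copied from the Terraform role above.
CREATION_STATEMENTS = [
    "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
    "GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";",
]
DEFAULT_TTL = 3600   # 1 hour
MAX_TTL = 86400      # 24 hours


def issue_credentials(role: str, now: datetime, requested_ttl: int = DEFAULT_TTL) -> dict:
    """Render the SQL for one short-lived database user."""
    ttl = min(requested_ttl, MAX_TTL)  # a lease can never outlive max_ttl
    expires_at = now + timedelta(seconds=ttl)
    values = {
        "name": f"v-{role}-{secrets.token_hex(4)}",  # hypothetical naming scheme
        "password": secrets.token_urlsafe(24),
        "expiration": expires_at.strftime("%Y-%m-%d %H:%M:%S+00"),
    }
    sql = []
    for statement in CREATION_STATEMENTS:
        for key, value in values.items():
            statement = statement.replace("{{" + key + "}}", value)
        sql.append(statement)
    return {
        "username": values["name"],
        "password": values["password"],
        "expires_at": expires_at,
        "sql": sql,
    }


request_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
lease = issue_credentials("team-a-readonly", request_time)
```

Because every request yields a different username, database audit logs can attribute each query to the specific consumer that asked for the credential.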


Scenario 1: “Design Secrets Management for 50 Microservices Across 3 K8s Clusters”


Strong Answer:

“I would build a centralized secrets architecture with tenant self-service:

Central infrastructure:

  • All secrets stored in AWS Secrets Manager (or GCP Secret Manager) in the Shared Services Account
  • Naming convention: /{env}/{team}/{secret-name} — this enables IAM scoping per team
  • CMEK encryption with a dedicated KMS key for secrets
  • Rotation Lambdas for database credentials (30-day cycle, dual-secret pattern)

Kubernetes integration:

  • External Secrets Operator (ESO) deployed by the platform team in every cluster
  • One ClusterSecretStore per cluster, pointing to the central Secrets Manager
  • ESO uses IRSA (AWS) or Workload Identity (GCP) for authentication — no static credentials
  • Refresh interval: 1 hour for non-critical secrets, 5 minutes for database credentials

Tenant self-service:

  • Teams create ExternalSecret resources in their namespace
  • OPA/Gatekeeper policy: teams can only reference secrets matching their prefix (/prod/team-a/*)
  • The platform team never sees secret values — they manage infrastructure, not data

Security controls:

  • IAM policies scope each team to their secret prefix
  • CloudTrail/Cloud Audit Logs track every GetSecretValue call
  • Alert on unusual access patterns (new IP, unusual time, burst reads)
  • Pre-commit scanning with gitleaks to prevent secrets in Git”
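
The prefix guardrail in this answer can be sketched as a plain validation function. In a cluster this logic would live in an OPA/Gatekeeper policy written in Rego; the Python below only illustrates the check. It assumes the namespace name equals the team name (as with `team-a` in the examples on this page), and `KEY_PATTERN` and `violations` are hypothetical names.

```python
import re

# Expected key layout: /{env}/{team}/{secret-name}
KEY_PATTERN = re.compile(
    r"^/(?P<env>[a-z0-9-]+)/(?P<team>[a-z0-9-]+)/(?P<name>[A-Za-z0-9/_+=.@-]+)$"
)


def violations(external_secret: dict) -> list:
    """Return one message per remote key the tenant is not allowed to reference."""
    namespace = external_secret["metadata"]["namespace"]
    problems = []
    for item in external_secret["spec"].get("data", []):
        key = item["remoteRef"]["key"]
        match = KEY_PATTERN.match(key)
        if match is None:
            problems.append(f"{key}: does not follow /{{env}}/{{team}}/{{secret-name}}")
        elif match["team"] != namespace:
            problems.append(f"{key}: belongs to team {match['team']}, not {namespace}")
    return problems


own_secret = {
    "metadata": {"name": "db-credentials", "namespace": "team-a"},
    "spec": {"data": [
        {"secretKey": "password",
         "remoteRef": {"key": "/prod/team-a/rds-password", "property": "password"}},
    ]},
}
cross_team = {
    "metadata": {"name": "sneaky", "namespace": "team-a"},
    "spec": {"data": [
        {"secretKey": "password",
         "remoteRef": {"key": "/prod/team-b/rds-password", "property": "password"}},
    ]},
}
```

An admission check like this matters because a single ClusterSecretStore role can typically read every team's prefix, so the per-team boundary has to be enforced where the ExternalSecret is created.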

Scenario 2: “A Secret Was Committed to Git. What Is Your Incident Response?”


Strong Answer:

“This is a security incident. I follow a structured response:

Immediate (within minutes):

  1. Rotate the secret immediately — generate a new password/key in Secrets Manager, update the consumer application
  2. Revoke the old credential — disable the API key, change the database password, invalidate the token
  3. Do NOT just delete the commit — Git history preserves it. The secret is compromised regardless of whether you force-push

Investigation (within hours):

  4. Determine the blast radius: what does this secret access? Was it a database password, an API key, a cloud credential?
  5. Check CloudTrail/audit logs for any unauthorized use of the credential since the commit was made
  6. Identify how the secret was committed: was pre-commit scanning disabled? Was the developer unaware?

Remediation (within days):

  7. Run trufflehog or gitleaks across the entire repository history to find any other leaked secrets
  8. If using GitHub Enterprise, enable Secret Scanning alerts (GitHub automatically detects known secret patterns)
  9. Enforce pre-commit hooks organization-wide: make gitleaks a required CI check, not just a local hook
  10. Rotate all related secrets proactively (if a DB password was leaked, rotate the master password too)

Post-incident:

  11. Blameless post-mortem: focus on process gaps, not the individual
  12. Update onboarding documentation for new developers
  13. Consider using ESO/Vault so developers never have secrets locally at all”


Scenario 3: “How Do You Rotate Database Credentials Without Downtime?”


Strong Answer:

“The key technique is the dual-secret (alternating user) rotation pattern:

Setup: Create two database users: app_user_a and app_user_b. Both have identical permissions. The application connects with whichever is marked AWSCURRENT in Secrets Manager.

Rotation cycle:

  1. Currently active: app_user_a with password_a (AWSCURRENT)
  2. Rotation Lambda generates a new password for app_user_b and stores app_user_b / new_password_b as AWSPENDING in Secrets Manager
  3. Lambda sets the new password for app_user_b in the database
  4. Lambda tests the new credentials (connects to DB, runs a query)
  5. Lambda promotes AWSPENDING to AWSCURRENT

Application side:

  • Running pods still use app_user_a / password_a — this credential is still valid
  • ESO syncs the new secret within its refresh interval
  • New pods (or restarted pods) pick up app_user_b / new_password_b
  • At no point is any running pod unable to connect

Next rotation (30 days later): Rotate app_user_a with a new password. The cycle alternates: a, b, a, b.

Why this works: You never invalidate the credential that running pods are using. You rotate the inactive user, and new pods transition naturally.”


Scenario 4: “As the Platform Team, How Do Tenants Request and Access Secrets?”


Strong Answer:

“Self-service with guardrails:

Requesting a new secret:

  1. Tenant opens a PR to their team’s infrastructure repo, adding a Terraform resource for aws_secretsmanager_secret under their namespace prefix (/prod/team-a/new-api-key)
  2. Platform team reviews the PR — checks naming convention, encryption key, rotation config
  3. PR is merged, Terraform pipeline creates the secret
  4. Tenant (or on-call) sets the initial secret value via AWS Console or CLI

Accessing a secret in Kubernetes:

  1. Tenant creates an ExternalSecret resource in their namespace (this is just YAML in their app repo)
  2. ESO syncs the value into a native K8s Secret
  3. Tenant mounts the K8s Secret as an environment variable or volume in their pod spec

Guardrails:

  • OPA/Gatekeeper policy: ExternalSecret resources can only reference keys matching /{env}/{team-name}/*
  • IAM policy: the ESO service account can only read secrets matching the team’s prefix
  • The platform team cannot read secret values: they manage the infrastructure, while IAM scoping restricts access to the data
  • Audit logs capture who accessed which secret and when

What the platform team provides:

  • ESO controller (deployed, upgraded, monitored)
  • ClusterSecretStore (configured, IRSA/WI authenticated)
  • Rotation Lambda templates (for database secrets)
  • Documentation and examples (how to create ExternalSecret, naming conventions)”

Scenario 5: “Secrets Manager vs Vault — When Would You Choose Vault?”


Strong Answer:

“I default to cloud-native Secrets Manager because it is a managed service with zero operational overhead. I choose Vault in specific situations:

Choose Vault when:

  1. Multi-cloud: You have workloads in AWS, GCP, and on-prem. Vault provides a single secrets API across all environments. Going cloud-native means running a separate secrets system per provider, plus something else for on-prem.
  2. Dynamic secrets: Vault generates unique, short-lived database credentials per application instance. No shared passwords, no rotation needed — credentials expire automatically. Cloud Secrets Manager only stores static secrets that you rotate.
  3. PKI requirements: Vault’s PKI engine can act as an intermediate CA, issuing TLS certificates for internal services. Cloud-native requires ACM Private CA (AWS) or CAS (GCP), which are more limited.
  4. Complex policy requirements: Vault policies can be incredibly granular — restrict by path, time window, CIDR, number of uses. Cloud IAM is powerful but less flexible for secret-specific policies.
  5. Compliance mandates: Some regulated industries require that secrets never leave a specific boundary. Vault Enterprise with HSM-backed unseal and namespaces provides this.

Stick with cloud-native when:

  • Single cloud provider
  • Standard secrets (DB passwords, API keys) with periodic rotation
  • Team does not want to operate a stateful, distributed system (Vault HA is non-trivial)
  • Cost is a concern (Vault Enterprise license is significant; HCP Vault is simpler but still adds cost)

For our enterprise bank on AWS, I would start with Secrets Manager + ESO. If we later need dynamic secrets for database access at scale, we would layer Vault on top for just the database engine, keeping static secrets in Secrets Manager.”