Secrets Management
Where This Fits
Secrets infrastructure is managed centrally by the platform team in the Shared Services Account/Project. Secrets for databases, API keys, and certificates live in AWS Secrets Manager or GCP Secret Manager. Kubernetes workloads in tenant namespaces consume secrets via External Secrets Operator (ESO): tenants create an ExternalSecret CR, and ESO syncs the value into a native Kubernetes Secret.
Why Hardcoded Secrets Kill Enterprises
The evolution of secrets management:

    Level 0: Secrets in code (hardcoded)          ← NEVER
    Level 1: Secrets in environment variables     ← better but still exposed in process listing
    Level 2: Secrets in CI/CD pipeline variables  ← better but CI/CD is now a target
    Level 3: Secrets in cloud secrets manager     ← GOOD (encrypted, audited, rotatable)
    Level 4: Dynamic secrets (Vault generates     ← BEST (short-lived, unique per consumer)
             unique DB creds per app instance)

Cloud-Native Secrets Services
AWS Secrets Manager vs SSM Parameter Store

| Feature | Secrets Manager | SSM Parameter Store |
|---|---|---|
| Pricing | $0.40/secret/month + $0.05 per 10K API calls | Free (standard), $0.05/advanced param/month |
| Automatic rotation | Yes — Lambda-based rotation | No built-in rotation |
| Cross-account sharing | Yes, via a resource-based policy on the secret | Advanced-tier parameters only, shared through AWS RAM |
| Versioning | Yes (AWSCURRENT, AWSPREVIOUS) | Yes (labels) |
| Binary secrets | Yes (up to 64 KB) | No binary type; string values up to 4 KB (standard) or 8 KB (advanced) |
| Encryption | KMS (default or CMEK) | KMS (default or CMEK) |
| CloudFormation/Terraform | Dynamic reference support | Dynamic reference support |
| Best for | Database credentials, API keys, certificates | Configuration values, feature flags, non-sensitive params |
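To make the pricing row concrete, here is a minimal sketch that estimates a monthly bill from the figures in the table. The function names are hypothetical, the prices are copied from the table and may be out of date, and the Parameter Store estimate ignores any API throughput charges.

```python
# Prices copied from the comparison table above (USD); verify against current AWS pricing.
SECRETS_MANAGER_PER_SECRET = 0.40      # per secret per month
SECRETS_MANAGER_PER_10K_CALLS = 0.05   # per 10,000 API calls
PARAM_STORE_ADVANCED_PER_PARAM = 0.05  # per advanced parameter per month


def secrets_manager_monthly_cost(num_secrets: int, api_calls: int) -> float:
    """Storage charge plus API-call charge for one month."""
    storage = num_secrets * SECRETS_MANAGER_PER_SECRET
    calls = (api_calls / 10_000) * SECRETS_MANAGER_PER_10K_CALLS
    return round(storage + calls, 2)


def parameter_store_monthly_cost(num_advanced: int) -> float:
    """Standard parameters are free, so only advanced parameters are billed here."""
    return round(num_advanced * PARAM_STORE_ADVANCED_PER_PARAM, 2)


if __name__ == "__main__":
    # 200 secrets read one million times in a month
    print(secrets_manager_monthly_cost(200, 1_000_000))
    print(parameter_store_monthly_cost(200))
```

At this scale the storage charge dominates, which is one reason to keep non-sensitive configuration in Parameter Store and reserve Secrets Manager for values that need rotation.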
Automatic Rotation with Lambda (AWS)

    # In Shared Services Account
    resource "aws_secretsmanager_secret" "db_password" {
      name        = "/prod/team-a/rds-password"
      description = "RDS password for team-a production database"
      kms_key_id  = aws_kms_key.secrets_encryption.arn

      # Cross-account access policy
      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          {
            Sid    = "AllowWorkloadAccountAccess"
            Effect = "Allow"
            Principal = {
              AWS = "arn:aws:iam::${var.team_a_prod_account_id}:root"
            }
            Action = [
              "secretsmanager:GetSecretValue",
              "secretsmanager:DescribeSecret"
            ]
            Resource = "*"
            Condition = {
              StringEquals = {
                "aws:PrincipalTag/team" = "team-a"
              }
            }
          }
        ]
      })
    }

    resource "aws_secretsmanager_secret_version" "db_password" {
      secret_id = aws_secretsmanager_secret.db_password.id
      secret_string = jsonencode({
        username = "app_user"
        password = random_password.db.result
        engine   = "postgres"
        host     = aws_db_instance.team_a.address
        port     = 5432
        dbname   = "production"
      })
    }

    resource "aws_secretsmanager_secret_rotation" "db_password" {
      secret_id           = aws_secretsmanager_secret.db_password.id
      rotation_lambda_arn = aws_lambda_function.secret_rotation.arn

      rotation_rules {
        automatically_after_days = 30
        duration                 = "2h" # Rotation window
      }
    }

    resource "aws_lambda_function" "secret_rotation" {
      function_name = "secret-rotation-rds"
      handler       = "rotation.handler"
      runtime       = "python3.12"
      timeout       = 60

      # `role` is a required argument; this execution role is assumed to be defined elsewhere
      role = aws_iam_role.rotation_lambda.arn

      filename = data.archive_file.rotation_lambda.output_path

      environment {
        variables = {
          SECRETS_MANAGER_ENDPOINT = "https://secretsmanager.${var.region}.amazonaws.com"
        }
      }

      vpc_config {
        subnet_ids         = var.private_subnets
        security_group_ids = [aws_security_group.rotation_lambda.id]
      }
    }

GCP Secret Manager

    resource "google_secret_manager_secret" "db_password" {
      secret_id = "team-a-rds-password"
      project   = var.shared_services_project_id

      # Note: a CMEK key must be in the same location as the replica it encrypts,
      # so in practice each replica normally needs its own regional key.
      replication {
        user_managed {
          replicas {
            location = "me-central1" # Doha, Qatar (not a UAE region; confirm residency requirements)
            customer_managed_encryption {
              kms_key_name = google_kms_crypto_key.secrets_key.id
            }
          }
          replicas {
            location = "me-central2" # Dammam, Saudi Arabia (DR)
            customer_managed_encryption {
              kms_key_name = google_kms_crypto_key.secrets_key.id
            }
          }
        }
      }

      # Rotation notification
      topics {
        name = google_pubsub_topic.secret_rotation.id
      }

      rotation {
        rotation_period    = "2592000s" # 30 days
        next_rotation_time = "2026-04-15T00:00:00Z"
      }
    }

    resource "google_secret_manager_secret_version" "db_password" {
      secret = google_secret_manager_secret.db_password.id
      secret_data = jsonencode({
        username = "app_user"
        password = random_password.db.result
        host     = google_sql_database_instance.team_a.private_ip_address
        port     = 5432
        dbname   = "production"
      })
    }

    # Grant workload project's service account access
    resource "google_secret_manager_secret_iam_member" "team_a_accessor" {
      secret_id = google_secret_manager_secret.db_password.secret_id
      project   = var.shared_services_project_id
      role      = "roles/secretmanager.secretAccessor"
      member    = "serviceAccount:${var.team_a_workload_sa}"
    }

Rotation via Cloud Scheduler + Cloud Functions (GCP)

GCP Secret Manager does not rotate secrets itself; it publishes rotation notifications to Pub/Sub topics. Cloud Scheduler triggers a Cloud Function on a cron schedule to perform the actual rotation.

    resource "google_cloud_scheduler_job" "rotate_secret" {
      name     = "rotate-team-a-db-password"
      schedule = "0 2 1 * *" # 1st of every month at 2 AM
      project  = var.shared_services_project_id
      region   = "me-central1"

      http_target {
        http_method = "POST"
        uri         = google_cloudfunctions2_function.rotate_secret.url
        body = base64encode(jsonencode({
          secret_id   = "team-a-rds-password"
          db_instance = "team-a-prod"
        }))

        oidc_token {
          service_account_email = var.rotation_sa_email
        }
      }
    }

External Secrets Operator (ESO)
ESO is the bridge between cloud secrets managers and Kubernetes Secrets. The platform team deploys ESO and creates the ClusterSecretStore. Tenant teams create ExternalSecret resources in their namespaces.
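Conceptually, the bridging step takes a JSON-valued remote secret, picks out named properties, and writes them as base64-encoded entries in a Kubernetes Secret. The sketch below illustrates only that idea; it is not ESO's implementation, and the function name and mapping shape are assumptions made for the example.

```python
import base64
import json


def build_k8s_secret_data(remote_value: str, mappings: dict) -> dict:
    """Map JSON properties of a remote secret to base64-encoded Secret data.

    mappings has the shape {key_in_k8s_secret: property_in_remote_json}.
    """
    parsed = json.loads(remote_value)
    data = {}
    for secret_key, prop in mappings.items():
        if prop not in parsed:
            raise KeyError(f"property {prop!r} not found in remote secret")
        # Kubernetes stores Secret values base64-encoded under .data
        value = str(parsed[prop])
        data[secret_key] = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return data


if __name__ == "__main__":
    remote = json.dumps({"username": "app_user", "password": "example-only", "port": 5432})
    print(build_k8s_secret_data(remote, {"username": "username", "db_port": "port"}))
```

A missing property raises an error rather than producing an empty value, which mirrors the general expectation that a sync should fail loudly instead of creating a half-populated Secret.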
Platform Team: ClusterSecretStore
    # ClusterSecretStore: platform team deploys this (one per cluster)
    apiVersion: external-secrets.io/v1beta1
    kind: ClusterSecretStore
    metadata:
      name: aws-secrets-manager
    spec:
      provider:
        aws:
          service: SecretsManager
          region: me-central-1
          auth:
            jwt:
              serviceAccountRef:
                name: external-secrets-sa
                namespace: external-secrets
    ---
    # ServiceAccount with IRSA for cross-account Secrets Manager access
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: external-secrets-sa
      namespace: external-secrets
      annotations:
        eks.amazonaws.com/role-arn: arn:aws:iam::SHARED_SERVICES_ACCOUNT:role/ExternalSecretsRole

IAM Role for ESO (in Shared Services Account):

    resource "aws_iam_role" "external_secrets" {
      name = "ExternalSecretsRole"

      assume_role_policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          {
            Effect = "Allow"
            Principal = {
              # The OIDC provider must exist in the same account as the role, so the
              # workload cluster's issuer is registered as an IAM identity provider
              # in the Shared Services Account.
              Federated = "arn:aws:iam::${var.shared_services_account_id}:oidc-provider/${var.eks_oidc_provider}"
            }
            Action = "sts:AssumeRoleWithWebIdentity"
            Condition = {
              StringEquals = {
                "${var.eks_oidc_provider}:sub" = "system:serviceaccount:external-secrets:external-secrets-sa"
              }
            }
          }
        ]
      })
    }

    resource "aws_iam_role_policy" "external_secrets" {
      role = aws_iam_role.external_secrets.id

      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          {
            Effect = "Allow"
            # Note: ListSecrets has no resource-level scoping; it only takes effect with Resource = "*"
            Action = [
              "secretsmanager:GetSecretValue",
              "secretsmanager:DescribeSecret",
              "secretsmanager:ListSecrets"
            ]
            Resource = "arn:aws:secretsmanager:me-central-1:${var.shared_services_account_id}:secret:/prod/*"
          },
          {
            Effect = "Allow"
            Action = [
              "kms:Decrypt",
              "kms:DescribeKey"
            ]
            Resource = var.secrets_kms_key_arn
          }
        ]
      })
    }

    # ClusterSecretStore for GCP Secret Manager
    apiVersion: external-secrets.io/v1beta1
    kind: ClusterSecretStore
    metadata:
      name: gcp-secret-manager
    spec:
      provider:
        gcpsm:
          projectID: shared-services-prod # Central secrets project
          auth:
            workloadIdentity:
              clusterLocation: me-central1
              clusterName: prod-cluster
              clusterProjectID: workload-project
              serviceAccountRef:
                name: external-secrets-sa
                namespace: external-secrets

Workload Identity binding for ESO:

    # Allow the K8s SA to act as the GCP SA
    resource "google_service_account_iam_member" "eso_workload_identity" {
      service_account_id = google_service_account.external_secrets.name
      role               = "roles/iam.workloadIdentityUser"
      member             = "serviceAccount:${var.workload_project_id}.svc.id.goog[external-secrets/external-secrets-sa]"
    }

    # Grant the GCP SA access to secrets
    resource "google_secret_manager_secret_iam_member" "eso_accessor" {
      for_each  = toset(var.secret_ids)
      secret_id = each.value
      project   = var.shared_services_project_id
      role      = "roles/secretmanager.secretAccessor"
      member    = "serviceAccount:${google_service_account.external_secrets.email}"
    }

Tenant Team: ExternalSecret
    # Tenant creates this in their namespace; ESO syncs the secret
    apiVersion: external-secrets.io/v1beta1
    kind: ExternalSecret
    metadata:
      name: db-credentials
      namespace: team-a
    spec:
      refreshInterval: 1h # How often to sync from Secrets Manager
      secretStoreRef:
        name: aws-secrets-manager # References the ClusterSecretStore
        kind: ClusterSecretStore

      target:
        name: db-credentials # Name of the K8s Secret to create
        creationPolicy: Owner # ESO owns this Secret (deletes if ExternalSecret deleted)

      data:
        - secretKey: username
          remoteRef:
            key: /prod/team-a/rds-password
            property: username # JSON key extraction

        - secretKey: password
          remoteRef:
            key: /prod/team-a/rds-password
            property: password

        - secretKey: host
          remoteRef:
            key: /prod/team-a/rds-password
            property: host

Dual-Secret Rotation Pattern (Zero-Downtime)
The standard rotation problem: you rotate the password in Secrets Manager, but running pods still have the old password cached. Those pods cannot open new database connections until they restart and pick up the new secret.
Solution: Dual-secret (alternating user) rotation.
Dual-Secret Rotation Flow:
    Time T0 (normal):
      Secrets Manager: user_a / password_a (CURRENT)
      Database:        user_a active, user_b active
      App pods:        using user_a / password_a ✓

    Time T1 (rotation starts):
      Lambda creates new password for user_b
      Secrets Manager: user_b / password_b (PENDING)
      Database:        user_b gets new password
      Secrets Manager: user_b / password_b → CURRENT
      ESO syncs new secret to K8s (within refreshInterval)

    Time T2 (pods pick up new secret):
      Old pods still using user_a / password_a  ← still works (user_a not changed)
      New pods (restarted or new replicas) using user_b / password_b ✓
      Both work simultaneously → zero downtime

    Time T3 (next rotation, 30 days later):
      Rotate user_a with new password
      Cycle continues: a → b → a → b

Pre-Commit Secret Scanning
Prevent secrets from ever reaching the Git repository.
.pre-commit-config.yaml:
    repos:
      - repo: https://github.com/gitleaks/gitleaks
        rev: v8.18.0
        hooks:
          - id: gitleaks
            name: Detect hardcoded secrets
            entry: gitleaks protect --staged --verbose
            language: golang
            pass_filenames: false

      - repo: https://github.com/trufflesecurity/trufflehog
        rev: v3.63.0
        hooks:
          - id: trufflehog
            name: TruffleHog secret scan
            entry: trufflehog git file://. --only-verified --fail
            language: golang
            pass_filenames: false

gitleaks.toml (custom rules for enterprise patterns):

    [extend]
    useDefault = true

    [[rules]]
    id = "aws-account-id"
    description = "AWS Account ID"
    regex = '''(?i)(?:account.?id|aws.?account)\s*[:=]\s*['\"]?(\d{12})['\"]?'''
    tags = ["aws", "account"]

    [[rules]]
    id = "database-connection-string"
    description = "Database connection string"
    regex = '''(?i)(?:postgres|mysql|mongodb|redis):\/\/[^\s'"]+:[^\s'"]+@[^\s'"]+'''
    tags = ["database", "connection"]

    [allowlist]
    paths = [
      '''\.md$''',
      '''\.txt$''',
      '''testdata/''',
      '''test_fixtures/'''
    ]

HashiCorp Vault: When to Choose It Over Cloud-Native
| Factor | Cloud-Native (Secrets Manager / Secret Manager) | HashiCorp Vault |
|---|---|---|
| Multi-cloud | AWS-only or GCP-only | Single pane across AWS, GCP, Azure, on-prem |
| Dynamic secrets | Not supported (static secrets with rotation) | Yes — generates unique, short-lived DB creds per app |
| PKI / Certificate Authority | ACM (AWS), CAS (GCP) — limited | Full PKI engine with custom CA hierarchy |
| Transit encryption | KMS Encrypt/Decrypt API | Transit engine — encrypt data without storing it |
| OIDC/SAML auth | Cloud IAM only | Multiple auth backends (LDAP, OIDC, K8s, AWS IAM, GCP) |
| Secrets versioning | Basic (current/previous) | Full versioning with soft delete and recovery |
| Audit | CloudTrail / Cloud Audit Logs | Built-in audit log with every access recorded |
| Operational cost | Managed service (zero ops) | Self-hosted (HA cluster, unseal ceremony, upgrades) |
| Enterprise license | Pay-per-use | Vault Enterprise or HCP Vault (significant cost) |
Vault Architecture (Enterprise HA)
Vault Dynamic Secrets Example (Database)

    # Vault database secrets engine configuration
    resource "vault_database_secret_backend_connection" "postgres" {
      backend       = "database"
      name          = "team-a-prod"
      allowed_roles = ["team-a-readonly", "team-a-readwrite"]

      postgresql {
        connection_url = "postgres://{{username}}:{{password}}@${var.rds_endpoint}:5432/production"
        username       = var.vault_admin_user
        password       = var.vault_admin_password
      }
    }

    resource "vault_database_secret_backend_role" "team_a_readonly" {
      backend = "database"
      name    = "team-a-readonly"
      db_name = vault_database_secret_backend_connection.postgres.name
      creation_statements = [
        "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}';",
        "GRANT SELECT ON ALL TABLES IN SCHEMA public TO \"{{name}}\";"
      ]
      default_ttl = 3600  # 1 hour
      max_ttl     = 86400 # 24 hours
    }

When a pod requests credentials, Vault creates a unique database user with a 1-hour TTL. When the TTL expires, Vault revokes the user. No shared, long-lived database passwords.
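The lease behaviour described above can be sketched in a few lines of Python. This is a toy model, not Vault's code: the `Lease` class, the `issue_lease` function, and the username format are hypothetical, and the only values taken from the configuration are the two TTLs.

```python
import secrets
from dataclasses import dataclass

DEFAULT_TTL = 3600  # seconds, matches default_ttl in the role above
MAX_TTL = 86400     # seconds, matches max_ttl in the role above


@dataclass(frozen=True)
class Lease:
    username: str
    password: str
    issued_at: int  # seconds since an arbitrary epoch
    ttl: int        # seconds

    def is_valid(self, now: int) -> bool:
        """A lease is usable until issued_at + ttl, after which the user is revoked."""
        return now < self.issued_at + self.ttl


def effective_ttl(requested=None) -> int:
    """Fall back to the default TTL and never exceed the maximum."""
    ttl = DEFAULT_TTL if requested is None else requested
    return min(ttl, MAX_TTL)


def issue_lease(role: str, now: int, requested_ttl=None) -> Lease:
    """Create a unique, short-lived credential for one consumer."""
    suffix = secrets.token_hex(4)  # random suffix so every consumer gets its own user
    return Lease(
        username=f"v-{role}-{suffix}",  # hypothetical naming scheme
        password=secrets.token_urlsafe(24),
        issued_at=now,
        ttl=effective_ttl(requested_ttl),
    )
```

Two pods requesting credentials for the same role receive different usernames, so revoking one lease never affects the other consumer.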
Interview Scenarios
Scenario 1: “Design Secrets Management for 50 Microservices Across 3 K8s Clusters”
Strong Answer:
“I would build a centralized secrets architecture with tenant self-service:
Central infrastructure:
- All secrets stored in AWS Secrets Manager (or GCP Secret Manager) in the Shared Services Account
- Naming convention: `/{env}/{team}/{secret-name}`, which enables IAM scoping per team
- CMEK encryption with a dedicated KMS key for secrets
- Rotation Lambdas for database credentials (30-day cycle, dual-secret pattern)
Kubernetes integration:
- External Secrets Operator (ESO) deployed by the platform team in every cluster
- One `ClusterSecretStore` per cluster, pointing to the central Secrets Manager
- ESO uses IRSA (AWS) or Workload Identity (GCP) for authentication, with no static credentials
- Refresh interval: 1 hour for non-critical secrets, 5 minutes for database credentials
Tenant self-service:
- Teams create `ExternalSecret` resources in their namespace
- OPA/Gatekeeper policy: teams can only reference secrets matching their prefix (`/prod/team-a/*`)
- The platform team never sees secret values: they manage infrastructure, not data
Security controls:
- IAM policies scope each team to their secret prefix
- CloudTrail/Cloud Audit Logs track every `GetSecretValue` call
- Alert on unusual access patterns (new IP, unusual time, burst reads)
- Pre-commit scanning with gitleaks to prevent secrets in Git”
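The link between the naming convention and per-team IAM scoping can be shown with a small helper that builds a secret name and the matching IAM resource pattern. The function names and the segment rule (lowercase letters, digits, hyphens) are assumptions for illustration; the ARN layout follows the Secrets Manager ARNs used in the policies earlier in this document.

```python
import re

# Assumed rule for each path segment: lowercase letters, digits, and hyphens.
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def secret_name(env: str, team: str, name: str) -> str:
    """Build a secret name following /{env}/{team}/{secret-name}."""
    for segment in (env, team, name):
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid segment: {segment!r}")
    return f"/{env}/{team}/{name}"


def team_resource_pattern(region: str, account_id: str, env: str, team: str) -> str:
    """IAM Resource pattern that covers every secret under one team's prefix."""
    return f"arn:aws:secretsmanager:{region}:{account_id}:secret:/{env}/{team}/*"


if __name__ == "__main__":
    print(secret_name("prod", "team-a", "rds-password"))
    print(team_resource_pattern("me-central-1", "111122223333", "prod", "team-a"))
```

Because the team name is a fixed path segment, one wildcard pattern per team is enough, and rejecting segments that contain `/` prevents a secret from being created outside its owner's prefix.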
Scenario 2: “A Secret Was Committed to Git. What Is Your Incident Response?”
Strong Answer:
“This is a security incident. I follow a structured response:
Immediate (within minutes):

1. Rotate the secret immediately: generate a new password/key in Secrets Manager and update the consumer application
2. Revoke the old credential: disable the API key, change the database password, invalidate the token
3. Do NOT just delete the commit. Git history preserves it, and the secret is compromised regardless of whether you force-push

Investigation (within hours):

4. Determine the blast radius: what does this secret access? Was it a database password, an API key, a cloud credential?
5. Check CloudTrail/audit logs for any unauthorized use of the credential since the commit was made
6. Identify how the secret was committed: was pre-commit scanning disabled? Was the developer unaware?

Remediation (within days):

7. Run trufflehog or gitleaks across the entire repository history to find any other leaked secrets
8. If using GitHub Enterprise, enable Secret Scanning alerts (GitHub automatically detects known secret patterns)
9. Enforce pre-commit hooks organization-wide, and make gitleaks a required CI check, not just a local hook
10. Rotate all related secrets proactively (if a DB password was leaked, rotate the master password too)

Post-incident:

11. Blameless post-mortem: focus on process gaps, not the individual
12. Update onboarding documentation for new developers
13. Consider using ESO/Vault so developers never have secrets locally at all”
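Pattern-based scanners such as the ones named in the remediation steps work by matching rules against file contents. As a minimal illustration, the sketch below applies the `database-connection-string` expression from the gitleaks.toml shown earlier using Python's `re` module; the function name is hypothetical, and real scanners add entropy checks, allowlists, and history traversal that are not modelled here.

```python
import re

# Same expression as the database-connection-string rule in gitleaks.toml.
CONNECTION_STRING = re.compile(
    r"""(?i)(?:postgres|mysql|mongodb|redis):\/\/[^\s'"]+:[^\s'"]+@[^\s'"]+"""
)


def find_connection_strings(text: str) -> list:
    """Return (line_number, match) pairs for every connection string found."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in CONNECTION_STRING.finditer(line):
            findings.append((line_number, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = 'DB_URL = "postgres://app_user:example-only@db.internal:5432/production"\n'
    for line_number, value in find_connection_strings(sample):
        print(f"line {line_number}: {value}")
```

A URL without embedded credentials, such as `postgres://localhost:5432/db`, does not match because the rule requires a `user:password@` section.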
Scenario 3: “How Do You Rotate Database Credentials Without Downtime?”
Strong Answer:
“The key technique is the dual-secret (alternating user) rotation pattern:
Setup: Create two database users: app_user_a and app_user_b. Both have identical permissions. The application connects with whichever is marked CURRENT in Secrets Manager.
Rotation cycle:
- Currently active: `app_user_a` with `password_a` (AWSCURRENT)
- Rotation Lambda generates a new password for `app_user_b` and stores `app_user_b / new_password_b` as AWSPENDING in Secrets Manager
- Lambda sets the new password for `app_user_b` in the database
- Lambda tests the new credentials (connects to DB, runs a query)
- Lambda promotes AWSPENDING to AWSCURRENT
Application side:
- Running pods still use `app_user_a / password_a`, and this credential is still valid
- ESO syncs the new secret within its refresh interval
- New pods (or restarted pods) pick up `app_user_b / new_password_b`
- At no point is any running pod unable to connect
Next rotation (30 days later): Rotate app_user_a with a new password. The cycle alternates: a, b, a, b.
Why this works: You never invalidate the credential that running pods are using. You rotate the inactive user, and new pods transition naturally.”
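The alternating-user idea can be checked with an in-memory simulation. The sketch below is not a real rotation Lambda: `FakeDatabase` and `FakeSecretStore` are hypothetical stand-ins, and the comments map each stage to the step names AWS uses for rotation functions (createSecret, setSecret, testSecret, finishSecret).

```python
import secrets

USERS = ("app_user_a", "app_user_b")


class FakeDatabase:
    """In-memory stand-in for the database's user table."""

    def __init__(self):
        self.passwords = {}

    def set_password(self, user, password):
        self.passwords[user] = password

    def can_login(self, user, password):
        return self.passwords.get(user) == password


class FakeSecretStore:
    """In-memory stand-in for one secret with staging labels."""

    def __init__(self, user, password):
        self.stages = {"AWSCURRENT": {"username": user, "password": password}}

    def get(self, stage="AWSCURRENT"):
        return self.stages[stage]


def rotate(db, store):
    """Rotate the user that running pods are NOT using, then promote it."""
    current = store.get("AWSCURRENT")
    inactive = USERS[1] if current["username"] == USERS[0] else USERS[0]

    # createSecret: stage new credentials for the inactive user
    pending = {"username": inactive, "password": secrets.token_urlsafe(24)}
    store.stages["AWSPENDING"] = pending

    # setSecret: apply the new password in the database
    db.set_password(pending["username"], pending["password"])

    # testSecret: verify the staged credentials before promoting them
    if not db.can_login(pending["username"], pending["password"]):
        raise RuntimeError("pending credentials failed verification")

    # finishSecret: move the labels; the old version stays valid in the database
    store.stages["AWSPREVIOUS"] = current
    store.stages["AWSCURRENT"] = store.stages.pop("AWSPENDING")
    return store.get("AWSCURRENT")
```

After one call to `rotate`, a pod that cached the old `app_user_a` credentials can still log in, while a pod that reads the secret afresh gets `app_user_b`. Only the second rotation changes the password of `app_user_a`, by which time pods are expected to have picked up the newer secret.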
Scenario 4: “As the Platform Team, How Do Tenants Request and Access Secrets?”
Strong Answer:
“Self-service with guardrails:
Requesting a new secret:
- Tenant opens a PR to their team’s infrastructure repo, adding a Terraform resource for `aws_secretsmanager_secret` under their namespace prefix (`/prod/team-a/new-api-key`)
- Platform team reviews the PR: checks naming convention, encryption key, rotation config
- PR is merged, Terraform pipeline creates the secret
- Tenant (or on-call) sets the initial secret value via AWS Console or CLI
Accessing a secret in Kubernetes:
- Tenant creates an `ExternalSecret` resource in their namespace (this is just YAML in their app repo)
- ESO syncs the value into a native K8s Secret
- Tenant mounts the K8s Secret as an environment variable or volume in their pod spec
Guardrails:
- OPA/Gatekeeper policy: `ExternalSecret` resources can only reference keys matching `/{env}/{team-name}/*`
- IAM policy: the ESO service account can only read secrets matching the team’s prefix
- The platform team cannot read secret values: they manage the infrastructure, and IAM scoping restricts data access
- Audit logs capture who accessed which secret and when
What the platform team provides:
- ESO controller (deployed, upgraded, monitored)
- `ClusterSecretStore` (configured, IRSA/WI authenticated)
- Rotation Lambda templates (for database secrets)
- Documentation and examples (how to create ExternalSecret, naming conventions)”
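The prefix guardrail is normally written as a Gatekeeper constraint in Rego; the Python sketch below only illustrates the rule's logic. It assumes the namespace name equals the team name and checks only `spec.data` entries, and the function names are hypothetical.

```python
def allowed_prefix(env: str, namespace: str) -> str:
    """Assumption: the namespace name is the team name used in secret paths."""
    return f"/{env}/{namespace}/"


def disallowed_keys(external_secret: dict, env: str) -> list:
    """Return every remoteRef key that falls outside the namespace's prefix."""
    namespace = external_secret["metadata"]["namespace"]
    prefix = allowed_prefix(env, namespace)
    bad = []
    for entry in external_secret.get("spec", {}).get("data", []):
        key = entry["remoteRef"]["key"]
        if not key.startswith(prefix):
            bad.append(key)
    return bad


if __name__ == "__main__":
    manifest = {
        "metadata": {"name": "db-credentials", "namespace": "team-a"},
        "spec": {
            "data": [
                {"secretKey": "password",
                 "remoteRef": {"key": "/prod/team-a/rds-password", "property": "password"}},
                {"secretKey": "other",
                 "remoteRef": {"key": "/prod/team-b/api-key"}},
            ]
        },
    }
    print(disallowed_keys(manifest, "prod"))
```

An admission policy would reject the manifest whenever this list is non-empty. A production rule would also need to cover bulk references such as `dataFrom`, which this sketch leaves out.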
Scenario 5: “Secrets Manager vs Vault — When Would You Choose Vault?”
Strong Answer:
“I default to cloud-native Secrets Manager because it is a managed service with zero operational overhead. I choose Vault in specific situations:
Choose Vault when:
- Multi-cloud: You have workloads in AWS, GCP, and on-prem. Vault provides a single secrets API across all environments. Cloud-native means two separate systems.
- Dynamic secrets: Vault generates unique, short-lived database credentials per application instance. No shared passwords, no rotation needed — credentials expire automatically. Cloud Secrets Manager only stores static secrets that you rotate.
- PKI requirements: Vault’s PKI engine can act as an intermediate CA, issuing TLS certificates for internal services. Cloud-native requires ACM Private CA (AWS) or CAS (GCP), which are more limited.
- Complex policy requirements: Vault policies can be incredibly granular — restrict by path, time window, CIDR, number of uses. Cloud IAM is powerful but less flexible for secret-specific policies.
- Compliance mandates: Some regulated industries require that secrets never leave a specific boundary. Vault Enterprise with HSM-backed unseal and namespaces provides this.
Stick with cloud-native when:
- Single cloud provider
- Standard secrets (DB passwords, API keys) with periodic rotation
- Team does not want to operate a stateful, distributed system (Vault HA is non-trivial)
- Cost is a concern (Vault Enterprise license is significant; HCP Vault is simpler but still adds cost)
For our enterprise bank on AWS, I would start with Secrets Manager + ESO. If we later need dynamic secrets for database access at scale, we would layer Vault on top for just the database engine, keeping static secrets in Secrets Manager.”