Multi-Tenancy & RBAC
Where This Fits
You are the central platform team at an enterprise bank. 20 development teams share
3 EKS/GKE clusters. Each team gets one or more namespaces with resource quotas, RBAC,
network isolation, and policy enforcement. This page covers how to design, implement,
and secure multi-tenant Kubernetes clusters.
1. Multi-Tenancy Models
Soft Tenancy (Namespace-Level Isolation)
When to use: Internal teams within the same trust boundary. This is the standard
for enterprise Kubernetes — 80% of use cases.
Hard Tenancy (Cluster-Level Isolation)
When to use: Regulatory requirements (PCI-DSS cardholder data), different trust
boundaries (external vendors), or workloads that need dedicated resources (ML training).
The Enterprise Sweet Spot
2. Namespace Design
Naming Convention
Standard Namespace Bundle
Every team namespace gets the same set of resources, deployed via a Terraform module or GitOps.
```yaml
# namespace-bundle.yaml — applied for every team namespace
# Deployed via ArgoCD ApplicationSet or Terraform

# 1. Namespace with labels
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    team: payments
    cost-center: "CC-1234"
    environment: production
    # Pod Security Standards enforcement
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
    # Gateway API access
    gateway-access: "true"
    # For NetworkPolicy namespaceSelector
    name: payments
---
# 2. ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: payments
spec:
  hard:
    requests.cpu: "8"
    requests.memory: "32Gi"
    limits.cpu: "16"
    limits.memory: "64Gi"
    pods: "50"
    services: "20"
    services.loadbalancers: "2"
    persistentvolumeclaims: "10"
    secrets: "30"
    configmaps: "30"
---
# 3. LimitRange — defaults for pods that don't set requests/limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: payments
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "4"
        memory: "8Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
    - type: PersistentVolumeClaim
      max:
        storage: "50Gi"
      min:
        storage: "1Gi"
---
# 4. Default deny-all NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# 5. Allow DNS egress (always needed)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
# 6. Allow intra-namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}
---
# 7. Allow Prometheus scraping
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 9090
        - protocol: TCP
          port: 8080
```

Terraform Module for Namespace Provisioning
```hcl
variable "team_name" {}
variable "cpu_request_quota" { default = "8" }
variable "memory_request_quota" { default = "32Gi" }
variable "max_pods" { default = 50 }

resource "kubernetes_namespace" "team" {
  metadata {
    name = var.team_name
    labels = {
      team                                         = var.team_name
      "pod-security.kubernetes.io/enforce"         = "restricted"
      "pod-security.kubernetes.io/enforce-version" = "latest"
      "pod-security.kubernetes.io/warn"            = "restricted"
      name                                         = var.team_name
      "gateway-access"                             = "true"
    }
  }
}

resource "kubernetes_resource_quota" "team" {
  metadata {
    name      = "team-quota"
    namespace = kubernetes_namespace.team.metadata[0].name
  }
  spec {
    hard = {
      "requests.cpu"    = var.cpu_request_quota
      "requests.memory" = var.memory_request_quota
      "pods"            = var.max_pods
    }
  }
}

resource "kubernetes_limit_range" "team" {
  metadata {
    name      = "default-limits"
    namespace = kubernetes_namespace.team.metadata[0].name
  }
  spec {
    limit {
      type = "Container"
      default = {
        cpu    = "500m"
        memory = "512Mi"
      }
      default_request = {
        cpu    = "100m"
        memory = "128Mi"
      }
    }
  }
}

# Usage:
# module "team_payments" {
#   source               = "./modules/team-namespace"
#   team_name            = "payments"
#   cpu_request_quota    = "16"
#   memory_request_quota = "64Gi"
#   max_pods             = 100
# }
```

3. RBAC — Role-Based Access Control
RBAC Building Blocks
Team Admin Role
Section titled “Team Admin Role”# ClusterRole — reusable across namespacesapiVersion: rbac.authorization.k8s.io/v1kind: ClusterRolemetadata: name: namespace-adminrules: # Workload management - apiGroups: ["apps"] resources: ["deployments", "statefulsets", "daemonsets", "replicasets"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] - apiGroups: [""] resources: ["pods", "pods/log", "pods/exec", "pods/portforward"] verbs: ["get", "list", "watch", "create", "delete"] - apiGroups: ["batch"] resources: ["jobs", "cronjobs"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Networking - apiGroups: [""] resources: ["services"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] - apiGroups: ["networking.k8s.io"] resources: ["ingresses", "networkpolicies"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] - apiGroups: ["gateway.networking.k8s.io"] resources: ["httproutes", "grpcroutes"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# Config - apiGroups: [""] resources: ["configmaps", "secrets", "serviceaccounts"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"] - apiGroups: [""] resources: ["persistentvolumeclaims"] verbs: ["get", "list", "watch", "create", "delete"]
# Autoscaling - apiGroups: ["autoscaling"] resources: ["horizontalpodautoscalers"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# External Secrets - apiGroups: ["external-secrets.io"] resources: ["externalsecrets"] verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# View-only for quotas and events - apiGroups: [""] resources: ["resourcequotas", "limitranges", "events"] verbs: ["get", "list", "watch"]
# CANNOT: modify namespaces, RBAC, cluster-wide resources, node access---# Read-only role for auditors / complianceapiVersion: rbac.authorization.k8s.io/v1kind: ClusterRolemetadata: name: namespace-viewerrules: - apiGroups: ["", "apps", "batch", "networking.k8s.io", "autoscaling"] resources: ["*"] verbs: ["get", "list", "watch"] # Explicitly deny secrets read for some teams # (handled via separate role if needed)RoleBinding — Mapping Identity to Roles
```yaml
# Bind the payments team's IAM group to namespace-admin in their namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-admin-binding
  namespace: payments
subjects:
  # EKS: map to IAM role/group (via aws-auth or access entries)
  - kind: Group
    name: "payments-admins"  # K8s group, mapped from IAM
    apiGroup: rbac.authorization.k8s.io
  # GKE: map to Google Group
  # - kind: Group
  #   name: "payments-admins@bank.com"
  #   apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
---
# Platform team gets cluster-admin (but only specific people)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-admin-binding
subjects:
  - kind: Group
    name: "platform-admins"
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```

Aggregated ClusterRoles
```yaml
# Aggregated ClusterRole — automatically combines rules from labelled ClusterRoles
# Use case: extensible permission sets that grow with CRDs
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-admin-aggregate
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        rbac.bank.com/aggregate-to-team-admin: "true"
rules: []  # Rules are auto-populated from matching ClusterRoles
---
# Base permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-admin-base
  labels:
    rbac.bank.com/aggregate-to-team-admin: "true"
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "statefulsets"]
    verbs: ["*"]
---
# When you add a new CRD (e.g., ExternalSecret), just add another labelled role
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: team-admin-eso
  labels:
    rbac.bank.com/aggregate-to-team-admin: "true"
rules:
  - apiGroups: ["external-secrets.io"]
    resources: ["externalsecrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
# team-admin-aggregate now automatically includes ESO permissions
```

4. Pod Security Standards (PSS)
Pod Security Standards replaced PodSecurityPolicy, which was removed in Kubernetes 1.25. They define three security profiles for pods: privileged, baseline, and restricted.
Enforcement Modes
```yaml
# Applied via namespace labels
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    # enforce: reject pods that violate
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.31
    # warn: allow but show warning to user
    pod-security.kubernetes.io/warn: restricted
    # audit: log violations to audit log
    pod-security.kubernetes.io/audit: restricted
```

What “Restricted” Requires
```yaml
# A pod that passes "restricted" PSS:
apiVersion: v1
kind: Pod
metadata:
  name: compliant-pod
spec:
  securityContext:
    runAsNonRoot: true          # Must not run as root
    seccompProfile:
      type: RuntimeDefault      # Seccomp profile required
  containers:
    - name: app
      image: bank-app:v1
      securityContext:
        allowPrivilegeEscalation: false  # Cannot gain more privileges
        readOnlyRootFilesystem: true     # Cannot write to /
        runAsNonRoot: true
        capabilities:
          drop:
            - ALL               # Drop all Linux capabilities
      volumeMounts:
        - name: tmp
          mountPath: /tmp       # Writable temp dir
  volumes:
    - name: tmp
      emptyDir: {}
```

5. Policy Enforcement — OPA Gatekeeper & Kyverno
Pod Security Standards cover pod-level security. For organizational policies (image registries, required labels, naming conventions), you need OPA Gatekeeper or Kyverno.
OPA Gatekeeper
```yaml
# ConstraintTemplate — define the policy (Rego language)
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sallowedrepos

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not startswith(container.image, input.parameters.repos[_])
          msg := sprintf(
            "Container <%v> image <%v> not from allowed repo. Allowed: %v",
            [container.name, container.image, input.parameters.repos]
          )
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.initContainers[_]
          not startswith(container.image, input.parameters.repos[_])
          msg := sprintf(
            "Init container <%v> image <%v> not from allowed repo. Allowed: %v",
            [container.name, container.image, input.parameters.repos]
          )
        }
---
# Constraint — apply the policy with parameters
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: require-private-registry
spec:
  enforcementAction: deny  # deny, dryrun, or warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
      - apiGroups: ["apps"]
        kinds: ["Deployment", "StatefulSet", "DaemonSet"]
    excludedNamespaces:
      - kube-system
      - gatekeeper-system
  parameters:
    repos:
      - "111111111111.dkr.ecr.us-east-1.amazonaws.com/"  # ECR
      - "us-central1-docker.pkg.dev/bank-prod/"          # Artifact Registry
---
# More Gatekeeper policies for enterprise

# Require specific labels on all deployments
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("Missing required labels: %v", [missing])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-team-labels
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
    excludedNamespaces: ["kube-system"]
  parameters:
    labels:
      - "app"
      - "team"
      - "version"
---
# Block privileged containers (defense in depth with PSS)
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8spsprivilegedcontainer
spec:
  crd:
    spec:
      names:
        kind: K8sPSPPrivilegedContainer
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8spsprivilegedcontainer

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged == true
          msg := sprintf("Privileged container <%v> not allowed", [container.name])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: block-privileged
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]
```

Kyverno (Alternative to Gatekeeper)
Section titled “Kyverno (Alternative to Gatekeeper)”# Kyverno uses native YAML instead of Rego — easier to read and write
# Require private registryapiVersion: kyverno.io/v1kind: ClusterPolicymetadata: name: require-private-registryspec: validationFailureAction: Enforce # Enforce or Audit background: true rules: - name: validate-image-registry match: any: - resources: kinds: - Pod exclude: any: - resources: namespaces: - kube-system - kyverno validate: message: "Images must come from the bank's private registry." pattern: spec: containers: - image: "111111111111.dkr.ecr.us-east-1.amazonaws.com/*"---# Auto-add labels (mutating policy)apiVersion: kyverno.io/v1kind: ClusterPolicymetadata: name: add-default-labelsspec: rules: - name: add-team-label-from-namespace match: any: - resources: kinds: - Deployment - StatefulSet mutate: patchStrategicMerge: metadata: labels: +(managed-by): "platform-team" # + means add only if missing---# Generate NetworkPolicy for every new namespaceapiVersion: kyverno.io/v1kind: ClusterPolicymetadata: name: generate-default-denyspec: rules: - name: default-deny match: any: - resources: kinds: - Namespace exclude: any: - resources: names: - kube-system - kube-public - default generate: apiVersion: networking.k8s.io/v1 kind: NetworkPolicy name: default-deny-all namespace: "{{request.object.metadata.name}}" data: spec: podSelector: {} policyTypes: - Ingress - Egress6. Cloud-Specific Identity Mapping
aws-auth ConfigMap (EKS — Legacy)

```yaml
# The aws-auth ConfigMap maps IAM principals to K8s users/groups
# Located in kube-system namespace
# WARNING: Misconfiguring this can lock you out of the cluster
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    # Node role — required for nodes to join
    - rolearn: arn:aws:iam::111111111111:role/eks-node-role
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
    # Platform admins — cluster-admin
    - rolearn: arn:aws:iam::111111111111:role/PlatformAdminRole
      username: platform-admin
      groups:
        - system:masters  # Full cluster admin
    # Payment team — mapped to payments-admins K8s group
    - rolearn: arn:aws:iam::111111111111:role/PaymentsTeamRole
      username: "payments-{{SessionName}}"
      groups:
        - payments-admins  # Maps to RoleBinding in payments namespace
    # Lending team
    - rolearn: arn:aws:iam::111111111111:role/LendingTeamRole
      username: "lending-{{SessionName}}"
      groups:
        - lending-admins
  mapUsers: |
    # Break-glass admin (emergency access)
    - userarn: arn:aws:iam::111111111111:user/break-glass-admin
      username: break-glass
      groups:
        - system:masters
```

EKS Access Entries (New — Recommended)

Key advantages of Access Entries over aws-auth:

| Aspect | aws-auth | Access Entries |
|---|---|---|
| Management | kubectl (ConfigMap) | AWS API / Terraform |
| Validation | None (silent failures) | API validates role ARN |
| Audit | K8s audit logs only | CloudTrail + K8s audit |
| Recovery | Must have kubectl access | Can fix from AWS console |
| Scope | Cluster-wide groups | Namespace-scoped policies |
| Versioning | Manual (GitOps) | AWS API (Terraform state) |

```hcl
# Terraform — EKS access entries replace aws-auth
# Available since EKS platform version eks.12+ (late 2023)

# Enable the access entry API on the cluster
resource "aws_eks_cluster" "main" {
  name     = "prod-eks-01"
  role_arn = aws_iam_role.cluster.arn

  access_config {
    authentication_mode = "API_AND_CONFIG_MAP"  # Transitional mode
    # Final state: "API" (after migrating all entries)
  }
  # ...
}

# Platform admins — cluster-admin
resource "aws_eks_access_entry" "platform_admins" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = "arn:aws:iam::111111111111:role/PlatformAdminRole"
  type          = "STANDARD"
}

resource "aws_eks_access_policy_association" "platform_admins" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = "arn:aws:iam::111111111111:role/PlatformAdminRole"
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"

  access_scope {
    type = "cluster"  # Cluster-wide
  }
}

# Payment team — namespace-scoped access
resource "aws_eks_access_entry" "payments_team" {
  cluster_name      = aws_eks_cluster.main.name
  principal_arn     = "arn:aws:iam::111111111111:role/PaymentsTeamRole"
  type              = "STANDARD"
  kubernetes_groups = ["payments-admins"]  # K8s group for RoleBinding
}

resource "aws_eks_access_policy_association" "payments_team" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = "arn:aws:iam::111111111111:role/PaymentsTeamRole"
  policy_arn    = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSAdminViewPolicy"

  access_scope {
    type       = "namespace"
    namespaces = ["payments"]  # Only this namespace
  }
}

# Node role — EC2 type access entry (auto-created for managed node groups)
resource "aws_eks_access_entry" "nodes" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = aws_iam_role.node.arn
  type          = "EC2_LINUX"  # Special type for nodes
}
```

Google Groups for GKE RBAC

```hcl
# Enable Google Groups for RBAC on the cluster
resource "google_container_cluster" "main" {
  name     = "prod-gke-01"
  location = "me-central1"

  authenticator_groups_config {
    security_group = "gke-security-groups@bank.com"
    # All groups nested under this group are available for RBAC
    # Group members of payments-admins@bank.com get K8s group "payments-admins@bank.com"
  }
}
```

```yaml
# RoleBinding to Google Group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-admin-binding
  namespace: payments
subjects:
  - kind: Group
    name: "payments-admins@bank.com"  # Google Group email
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: namespace-admin
  apiGroup: rbac.authorization.k8s.io
```

GKE Fleet RBAC (Multi-Cluster)

```hcl
# Fleet-level RBAC — manage access across multiple GKE clusters
resource "google_gke_hub_membership" "prod_cluster" {
  membership_id = "prod-gke-01"
  project       = var.fleet_project_id

  endpoint {
    gke_cluster {
      resource_link = google_container_cluster.main.id
    }
  }
}

# Fleet-scope RBAC binding
resource "google_gke_hub_scope_rbac_role_binding" "payments_team" {
  project                    = var.fleet_project_id
  scope_rbac_role_binding_id = "payments-admin"
  scope_id                   = "prod-scope"

  role {
    predefined_role = "ADMIN"
  }

  group = "payments-admins@bank.com"
}
```

7. Hierarchical Namespace Controller (HNC)
HNC enables self-service sub-namespace creation. A team admin can create sub-namespaces without cluster-admin access. Policies (RBAC, NetworkPolicy, Quotas) propagate from parent to child.
```yaml
# HNC sub-namespace creation (by team admin, not cluster admin)
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: payments-api
  namespace: payments  # Parent namespace
# This creates the "payments-api" namespace and propagates policies from "payments"
```

Interview Scenarios
Scenario 1: “Design multi-tenancy for 20 teams sharing 3 EKS clusters”
Answer:
“I would implement a namespace-per-team model with a standardized isolation stack, deployed via Terraform and ArgoCD.”
Architecture:
Per-namespace isolation stack (deployed for each of 20 teams):
| Layer | Implementation | Purpose |
|---|---|---|
| Identity | EKS Access Entries → K8s groups | Map IAM roles to K8s RBAC |
| Authorization | ClusterRole + RoleBinding | Namespace-scoped permissions |
| Resource limits | ResourceQuota + LimitRange | Prevent resource hogging |
| Network | NetworkPolicy (deny-all + allow-list) | East-west traffic isolation |
| Pod security | PSS restricted + Gatekeeper | No privileged containers |
| Image control | Gatekeeper require-private-registry | Only ECR images |
| Labels | Gatekeeper required-labels | app, team, version on all Deployments |
| Secrets | ESO ExternalSecret per team path | Namespace-scoped secrets |
| Cost | Namespace labels (cost-center) | FinOps attribution via Kubecost |
Self-service workflow:
1. Team requests namespace → Jira ticket
2. Platform team approves → merge Terraform PR
3. Terraform creates:
   - aws_eks_access_entry (IAM → K8s group)
   - kubernetes_namespace (with PSS labels)
   - kubernetes_resource_quota
   - kubernetes_limit_range
4. ArgoCD syncs:
   - NetworkPolicies (deny-all + baseline allow)
   - RoleBindings
5. Team gets kubectl access (scoped to their namespace)

Scenario 2: “A team is consuming 80% of cluster resources. How do you prevent this?”
Answer:
“This is a classic noisy neighbor problem. I would address it at three levels: quotas, priority, and monitoring.”
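The monitoring side of this answer is simple arithmetic: sum the CPU requests in the namespace and compare against the quota. A minimal Python sketch of that check (pod data and the 60% alert threshold are inlined assumptions; in practice the data would come from the Kubernetes API, e.g. `kubectl get pods -o json`):

```python
def parse_cpu(value: str) -> float:
    """Convert a Kubernetes CPU quantity ('500m' or '2') to cores."""
    return float(value[:-1]) / 1000 if value.endswith("m") else float(value)

def namespace_cpu_usage(pods: list[dict]) -> float:
    """Total requested CPU cores across all containers in all pods."""
    return sum(
        parse_cpu(c["requests"]["cpu"])
        for pod in pods
        for c in pod["containers"]
    )

# Hypothetical pod list for illustration
pods = [
    {"containers": [{"requests": {"cpu": "500m"}}, {"requests": {"cpu": "250m"}}]},
    {"containers": [{"requests": {"cpu": "2"}}]},
]

quota_cores = 16.0  # requests.cpu from the namespace's ResourceQuota
used = namespace_cpu_usage(pods)
utilization = used / quota_cores
print(f"used={used} cores, utilization={utilization:.1%}")
if utilization > 0.6:  # alert threshold from the monitoring step
    print("ALERT: namespace above 60% of its CPU quota")
```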
Immediate fix:
```yaml
# 1. Apply ResourceQuota to the offending namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-heavy
spec:
  hard:
    requests.cpu: "16"  # Cap at 16 CPUs (was unbounded)
    requests.memory: "64Gi"
    limits.cpu: "32"
    limits.memory: "128Gi"
    pods: "100"
```

Priority-based preemption:
```yaml
# 2. Ensure critical services have higher priority
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: platform-critical
value: 1000000
globalDefault: false
description: "Platform services (monitoring, ingress, DNS)"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production
value: 100000
globalDefault: true  # Default for all team workloads
description: "Production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 10000
preemptionPolicy: Never  # Cannot evict other pods
description: "Batch jobs, data processing"
```

Prevention:
```yaml
# 3. LimitRange — force every pod to declare requests
apiVersion: v1
kind: LimitRange
metadata:
  name: force-requests
  namespace: team-heavy
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
# Without explicit requests, pods get defaults
# Cluster Autoscaler uses requests for scheduling
```

Monitoring:
```shell
# 4. Track resource usage per namespace
kubectl top pods -n team-heavy --sort-by=cpu
kubectl describe resourcequota team-quota -n team-heavy

# Kubecost or OpenCost for cost attribution by namespace
# Alert when any namespace exceeds 60% of its quota
```

Scenario 3: “How do you enforce that all pods must come from your private registry?”
Answer:
“I would use OPA Gatekeeper or Kyverno as an admission webhook that rejects any pod with images not from our private registry. Defense in depth with binary authorization.”
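At their core, these registry policies reduce to a prefix match over an allow-list. A Python sketch of that admission logic, mirroring the Rego on this page (the repo list and image names are this page's placeholder values):

```python
# Allowed registry prefixes (placeholder accounts from this page)
ALLOWED_REPOS = [
    "111111111111.dkr.ecr.us-east-1.amazonaws.com/",
    "us-central1-docker.pkg.dev/bank-prod/",
]

def violations(pod_spec: dict) -> list[str]:
    """Return one message per container image not from an allowed registry."""
    msgs = []
    for field in ("initContainers", "containers"):
        for c in pod_spec.get(field, []):
            if not any(c["image"].startswith(repo) for repo in ALLOWED_REPOS):
                msgs.append(f"image {c['image']} not from an allowed repo")
    return msgs

pod = {"containers": [
    {"image": "nginx:latest"},  # public image — rejected
    {"image": "111111111111.dkr.ecr.us-east-1.amazonaws.com/bank-app:v1"},
]}
print(violations(pod))  # one violation, for nginx:latest
```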
Layer 1: Admission policy (Gatekeeper)
```shell
# See the K8sAllowedRepos ConstraintTemplate above
# This rejects any pod with images not from ECR or Artifact Registry
```

Layer 2: Binary Authorization (GKE) / Image verification
```yaml
# GKE Binary Authorization — only allow signed images
# (AWS equivalent: use Sigstore/Cosign with Kyverno verify-images)

# Kyverno image verification policy
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: verify-image-signatures
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "111111111111.dkr.ecr.us-east-1.amazonaws.com/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE...
                      -----END PUBLIC KEY-----
```

Layer 3: ECR/Artifact Registry pull-through cache
```shell
# Even for public images (nginx, redis), pull through the private registry
# ECR pull-through cache rule:
#   Source: public.ecr.aws, docker.io, quay.io, gcr.io
#   Target: 111111111111.dkr.ecr.us-east-1.amazonaws.com/ecr-cache/
#
# Developers use: 111111111111.dkr.ecr.us-east-1.amazonaws.com/ecr-cache/library/nginx:latest
# NOT:            nginx:latest
```

Scenario 4: “Design self-service namespace provisioning for developers”
Answer:
“I would build a GitOps-driven self-service workflow where developers submit a PR to request a namespace, and ArgoCD + Terraform handle provisioning.”
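A CI job can validate the request file before the platform team merges the PR. A hedged Python sketch of such a check (field names loosely follow the namespaces.yaml example on this page; the specific validation rules are illustrative assumptions):

```python
import re

NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")  # DNS-1123 label
QUANTITY_RE = re.compile(r"^\d+(Gi|Mi|m)?$")  # simplified K8s quantity

def validate_team(team: dict) -> list[str]:
    """Return a list of human-readable errors; empty list means valid."""
    errors = []
    if not NAME_RE.match(team.get("name", "")):
        errors.append(f"invalid namespace name: {team.get('name')!r}")
    for field in ("cpu_quota", "memory_quota"):
        if not QUANTITY_RE.match(team.get(field, "")):
            errors.append(f"{field} is not a valid quantity")
    if not team.get("iam_role", "").startswith("arn:aws:iam::"):
        errors.append("iam_role must be an IAM role ARN")
    return errors

team = {"name": "payments", "cpu_quota": "16", "memory_quota": "64Gi",
        "iam_role": "arn:aws:iam::111111111111:role/PaymentsTeamRole"}
print(validate_team(team))  # [] — request is well-formed
```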
```hcl
# namespaces.yaml (in Git — source of truth)
# Developer adds their team here
#
# teams:
#   - name: payments
#     cpu_quota: "16"
#     memory_quota: "64Gi"
#     iam_role: "arn:aws:iam::111111111111:role/PaymentsTeamRole"
#     owners: ["alice@bank.com", "bob@bank.com"]
#   - name: lending
#     cpu_quota: "8"
#     memory_quota: "32Gi"
#     iam_role: "arn:aws:iam::111111111111:role/LendingTeamRole"

# main.tf reads this and loops
locals {
  teams = yamldecode(file("namespaces.yaml"))
}

module "team_namespace" {
  source   = "./modules/team-namespace"
  for_each = { for team in local.teams.teams : team.name => team }

  team_name            = each.value.name
  cpu_request_quota    = each.value.cpu_quota
  memory_request_quota = each.value.memory_quota
  iam_role_arn         = each.value.iam_role
  cluster_name         = aws_eks_cluster.main.name
}
```

Alternative: HNC for sub-namespace self-service
If the team already has a parent namespace, they can create sub-namespaces without any platform team intervention:
```shell
kubectl hns create payments-feature-x -n payments
# Creates "payments-feature-x" sub-namespace
# Inherits all RBAC, quotas, and policies from "payments"
# Team admin can do this — no cluster-admin needed
```

Scenario 5: “A developer escalated their RBAC permissions. How do you detect and prevent this?”
Answer:
“RBAC escalation happens when someone creates a RoleBinding to a more powerful ClusterRole. I would detect it with audit logging and prevent it with admission policies.”
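The periodic audit boils down to scanning bindings for roleRefs that point at powerful roles. A Python sketch over API-shaped binding objects (data is inlined for illustration; normally it would come from `kubectl get ... -o json`, and the blocked-role list mirrors the Gatekeeper parameters below):

```python
# Roles that should only be granted via the platform team's pipeline
BLOCKED_ROLES = {"cluster-admin", "admin", "namespace-admin"}

def escalations(bindings: list[dict]) -> list[str]:
    """Names of RoleBindings/ClusterRoleBindings that grant a blocked role."""
    return [
        b["metadata"]["name"]
        for b in bindings
        if b["roleRef"]["name"] in BLOCKED_ROLES
    ]

bindings = [
    {"metadata": {"name": "payments-admin-binding"},
     "roleRef": {"name": "namespace-admin"}},  # expected, created by Terraform
    {"metadata": {"name": "suspicious-binding"},
     "roleRef": {"name": "cluster-admin"}},    # not in Terraform state — investigate
    {"metadata": {"name": "viewer-binding"},
     "roleRef": {"name": "namespace-viewer"}},
]
# Compare this list against what Terraform/ArgoCD is supposed to manage
print(escalations(bindings))
```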
Detection:
```shell
# Kubernetes audit logs — look for RoleBinding/ClusterRoleBinding changes
# Filter for non-platform-admin users creating bindings

# AWS: audit logs go to CloudWatch Logs (enabled in EKS logging)
# Query with CloudWatch Logs Insights:
#   fields @timestamp, user.username, objectRef.resource, objectRef.name, verb
#   | filter objectRef.resource = "rolebindings" or objectRef.resource = "clusterrolebindings"
#   | filter verb in ["create", "update", "patch"]
#   | filter user.username not like /platform-admin/
#   | sort @timestamp desc

# Use RBAC Lookup tool for periodic audits
kubectl get rolebindings,clusterrolebindings --all-namespaces -o json | \
  jq '.items[] | select(.roleRef.name == "cluster-admin") | {name: .metadata.name, namespace: .metadata.namespace}'
```

Prevention:
```yaml
# Gatekeeper policy — prevent non-admins from creating RoleBindings
# that reference system:masters or cluster-admin
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sblockescalation
spec:
  crd:
    spec:
      names:
        kind: K8sBlockEscalation
      validation:
        openAPIV3Schema:
          type: object
          properties:
            blockedRoles:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sblockescalation

        violation[{"msg": msg}] {
          input.review.kind.kind == "RoleBinding"
          input.review.object.roleRef.name == input.parameters.blockedRoles[_]
          msg := sprintf(
            "RoleBinding to <%v> is not allowed. Contact platform team.",
            [input.review.object.roleRef.name]
          )
        }

        violation[{"msg": msg}] {
          input.review.kind.kind == "ClusterRoleBinding"
          input.review.object.roleRef.name == input.parameters.blockedRoles[_]
          msg := sprintf(
            "ClusterRoleBinding to <%v> is not allowed. Contact platform team.",
            [input.review.object.roleRef.name]
          )
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sBlockEscalation
metadata:
  name: block-escalation
spec:
  match:
    kinds:
      - apiGroups: ["rbac.authorization.k8s.io"]
        kinds: ["RoleBinding", "ClusterRoleBinding"]
  parameters:
    blockedRoles:
      - "cluster-admin"
      - "admin"
      - "namespace-admin"  # Our custom powerful role
```

Additional safeguards:
- RBAC: team admins have no create/update verb on rolebindings or clusterrolebindings
- Audit alerts: PagerDuty alert on any RoleBinding change outside of Terraform/ArgoCD
- Periodic review: monthly RBAC audit comparing actual bindings to Terraform state
Scenario 6: “How do you migrate from aws-auth ConfigMap to EKS access entries?”
Answer:
“This is a phased migration with zero downtime. The key is using API_AND_CONFIG_MAP mode during transition.”
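Before flipping to API-only, every principal mapped in aws-auth needs a matching access entry. A Python sketch of that pre-flight diff (the ARNs are this page's placeholder values; real input would be the parsed ConfigMap and the output of `aws eks list-access-entries`):

```python
# Principals currently mapped in the aws-auth ConfigMap
aws_auth_roles = {
    "arn:aws:iam::111111111111:role/PlatformAdminRole",
    "arn:aws:iam::111111111111:role/PaymentsTeamRole",
    "arn:aws:iam::111111111111:role/LendingTeamRole",
}
# Principals that already have access entries
access_entries = {
    "arn:aws:iam::111111111111:role/PlatformAdminRole",
    "arn:aws:iam::111111111111:role/PaymentsTeamRole",
}

missing = aws_auth_roles - access_entries
if missing:
    print("Not yet migrated:", sorted(missing))
else:
    print("Safe to switch authentication_mode to API")
```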
Migration steps:
```hcl
# Terraform migration example

# Phase 1: Enable dual mode
resource "aws_eks_cluster" "main" {
  # ...
  access_config {
    authentication_mode                         = "API_AND_CONFIG_MAP"
    bootstrap_cluster_creator_admin_permissions = true
  }
}

# Phase 2: Recreate all aws-auth entries as access entries
# (run in parallel with existing aws-auth — both work)
resource "aws_eks_access_entry" "platform_admins" {
  cluster_name  = aws_eks_cluster.main.name
  principal_arn = "arn:aws:iam::111111111111:role/PlatformAdminRole"
  type          = "STANDARD"
}

# ... repeat for all roles/users ...

# Phase 3: Validate — confirm each principal still authenticates
# and has the expected permissions via the new entries

# Phase 4: After validation, switch to API-only
# Change authentication_mode to "API"
# terraform apply — this disables aws-auth
```

Quick Reference
Multi-Tenancy Checklist
For every team namespace, verify:
□ Namespace exists with correct labels
□ Pod Security Standards: enforce=restricted
□ ResourceQuota applied (CPU, memory, pods)
□ LimitRange applied (defaults, min, max)
□ NetworkPolicy: default-deny-all
□ NetworkPolicy: allow DNS egress
□ NetworkPolicy: allow intra-namespace
□ NetworkPolicy: allow monitoring scraping
□ RBAC: RoleBinding to team's IAM group
□ RBAC: team cannot create RoleBindings
□ Gatekeeper: require-private-registry active
□ Gatekeeper: required-labels active
□ ESO: ClusterSecretStore accessible
□ Cost labels: team, cost-center

RBAC Verbs Reference
| Verb | Description | Example Use |
|---|---|---|
| get | Read a single resource by name | kubectl get pod my-pod |
| list | List all resources of a type | kubectl get pods |
| watch | Stream changes (used by controllers) | kubectl get pods -w |
| create | Create new resources | kubectl apply -f deployment.yaml |
| update | Replace entire resource | kubectl replace -f deployment.yaml |
| patch | Partial update | kubectl patch deployment ... |
| delete | Remove a resource | kubectl delete pod my-pod |
| deletecollection | Remove all resources of a type | kubectl delete pods --all |
| impersonate | Act as another user | kubectl --as=other-user get pods |
| bind | Create RoleBindings (special) | Required to bind roles |
| escalate | Modify roles beyond own permissions | Required to edit ClusterRoles |
Debugging RBAC
```shell
# Check if a user can perform an action
kubectl auth can-i create deployments -n payments --as="payments-admin"
# yes

kubectl auth can-i create clusterrolebindings --as="payments-admin"
# no

# List all permissions for a user
kubectl auth can-i --list --as="payments-admin" -n payments

# Who has cluster-admin?
kubectl get clusterrolebindings -o json | \
  jq -r '.items[] | select(.roleRef.name=="cluster-admin") | .subjects[]?.name'

# What can a ServiceAccount do?
kubectl auth can-i --list --as=system:serviceaccount:payments:payment-sa -n payments
```
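For intuition on what `kubectl auth can-i` computes: a request is allowed if any rule in any role bound to the user matches its verb, apiGroup, and resource. A simplified Python sketch (real RBAC also evaluates resourceNames, subresources, and nonResourceURLs):

```python
def rule_matches(rule: dict, verb: str, group: str, resource: str) -> bool:
    """True if a single RBAC rule covers the (verb, apiGroup, resource) tuple."""
    def m(allowed, value):
        return "*" in allowed or value in allowed
    return (m(rule["verbs"], verb)
            and m(rule["apiGroups"], group)
            and m(rule["resources"], resource))

def can_i(rules: list[dict], verb: str, group: str, resource: str) -> bool:
    """Allowed if any rule in the bound roles matches."""
    return any(rule_matches(r, verb, group, resource) for r in rules)

# Subset of the namespace-admin ClusterRole from this page
namespace_admin = [
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"]},
    {"apiGroups": [""], "resources": ["resourcequotas"],
     "verbs": ["get", "list", "watch"]},
]

print(can_i(namespace_admin, "create", "apps", "deployments"))  # True
print(can_i(namespace_admin, "delete", "", "resourcequotas"))   # False — view-only
```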