
Control Plane & Internals

EKS and GKE clusters run in Workload Accounts/Projects, consuming VPCs from the Network Hub via Transit Gateway attachments (AWS) or Shared VPC service projects (GCP). The central infrastructure team provisions and manages the clusters. Tenant application teams deploy workloads into namespaces with RBAC boundaries, resource quotas, and network policies.

Enterprise Reference Architecture — Kubernetes Placement


Every Kubernetes cluster has a control plane (the brain) and worker nodes (the muscle). Understanding each component at the API level is essential for architect-level interviews.

API Server (kube-apiserver)

The API server is the front door to the entire cluster. Every interaction goes through it: kubectl, kubelet, controllers, the scheduler, external admission webhooks — everything.

What it does:

  • Exposes the Kubernetes API as a RESTful HTTP service
  • Authenticates every request (client certificates, bearer tokens, OIDC, webhook)
  • Authorizes every request (RBAC, ABAC, webhook, Node authorizer)
  • Runs admission controllers (mutating and validating webhooks)
  • Validates and persists objects to etcd
  • Serves as the only component that talks to etcd directly

Key details for interviews:

  • Stateless and horizontally scalable (EKS/GKE run multiple replicas behind a load balancer)
  • Supports watch semantics: clients open long-lived HTTP connections to receive change notifications
  • Request flow: Authentication -> Authorization -> Mutating Admission -> Schema Validation -> Validating Admission -> etcd write
  • Rate limited via --max-requests-inflight and --max-mutating-requests-inflight
  • API priority and fairness (APF) in newer versions allows flow control per-user/group
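The ordering of that request flow matters: a mutation that ran after validation could bypass policy. As a purely illustrative sketch (hypothetical stage functions, not real kube-apiserver code), the pipeline is an ordered chain:

```python
# Minimal sketch of the API server request pipeline order (illustrative only).
from typing import Callable


def handle_request(obj: dict, stages: list[tuple[str, Callable[[dict], dict]]]) -> list[str]:
    """Run each stage in order on the object; return the stage names that executed."""
    executed = []
    for name, stage in stages:
        obj = stage(obj)
        executed.append(name)
    return executed


# The documented order: authn -> authz -> mutating admission -> schema
# validation -> validating admission -> etcd write.
stages = [
    ("authentication", lambda o: o),
    ("authorization", lambda o: o),
    ("mutating_admission", lambda o: {**o, "labels": {"injected": "true"}}),
    ("schema_validation", lambda o: o),
    ("validating_admission", lambda o: o),
    ("etcd_write", lambda o: o),
]

order = handle_request({"kind": "Deployment"}, stages)
print(order)
```

Note how the mutating stage can change the object before it is validated, which is exactly why mutating webhooks run before validating ones.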

etcd

A distributed key-value store that holds ALL cluster state. If etcd is lost without a backup, the cluster is unrecoverable.

What it stores:

  • Every Kubernetes object: Pods, Deployments, Services, ConfigMaps, Secrets, RBAC rules
  • All custom resources (CRDs and their instances)
  • Lease objects for leader election (controller manager, scheduler)

How it works:

  • Raft consensus protocol: requires a quorum (majority) for writes
  • 3-node cluster tolerates 1 failure; 5-node cluster tolerates 2 failures
  • Leader handles all writes; followers replicate
  • Reads are linearizable by default; serializable (possibly stale) reads are an optional, faster mode
  • Performance-sensitive: latency spikes in etcd directly impact API server response times
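The quorum arithmetic is worth being able to do on a whiteboard. A small sketch (standard Raft majority math, not etcd source):

```python
# Raft majority math: writes need a quorum of floor(n/2) + 1 members,
# so an n-member cluster tolerates n - quorum failures.
def quorum(n: int) -> int:
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    return n - quorum(n)


for n in (3, 4, 5, 7):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
# Note: 4 members tolerate no more failures than 3 -- even cluster sizes
# add replication cost without adding fault tolerance.
```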

Enterprise considerations:

  • EKS: AWS manages etcd entirely — encrypted at rest with AWS KMS, replicated across 3 AZs, automatic backups
  • GKE: Google manages etcd — encrypted at rest with Google-managed or CMEK keys, replicated across zones
  • Self-managed: you must handle backups (etcdctl snapshot save), defragmentation, compaction, and disk performance (SSD required)

Scheduler (kube-scheduler)

Watches for newly created Pods that have no node assignment and selects the best node to run them.

Scheduling algorithm (two phases):

  1. Filtering — eliminates nodes that cannot run the Pod:

    • Insufficient CPU or memory (vs resource requests)
    • Node taints the Pod does not tolerate
    • Node affinity rules not satisfied
    • Pod topology spread constraints violated
    • PVC zone constraints (EBS volumes are AZ-bound)
  2. Scoring — ranks remaining nodes:

    • Least requested resources (spread load)
    • Node affinity preferences (soft rules)
    • Inter-pod affinity/anti-affinity
    • Image locality (node already has the container image)

Key details:

  • Only one scheduler instance is active at a time (leader election via Lease object)
  • Custom schedulers can run alongside the default scheduler
  • Scheduler profiles (v1.25+) allow multiple scheduling configurations
  • The scheduler writes a Binding object to the API server (not directly to etcd)

Controller Manager (kube-controller-manager)


Runs control loops (controllers) that watch cluster state and make changes to move from current state to desired state.

Key controllers:

| Controller | What It Does |
| --- | --- |
| Deployment | Creates/updates ReplicaSets based on the Deployment spec |
| ReplicaSet | Ensures the correct number of Pod replicas exist |
| Node | Monitors node health, marks nodes NotReady, evicts pods after a timeout |
| Job | Tracks Job completions, manages pod creation for batch work |
| EndpointSlice | Populates EndpointSlices for Services |
| ServiceAccount | Creates the default ServiceAccount in new namespaces |
| Namespace | Handles namespace lifecycle (finalizer cleanup) |
| PV/PVC | Binds PersistentVolumeClaims to PersistentVolumes |
| TTL | Cleans up finished Jobs after their TTL expires |

How a control loop works: Generic Controller Loop Pattern
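All of these controllers follow the same generic pattern, which can be sketched as a reconcile function (illustrative toy code, not controller-runtime):

```python
# Generic reconcile loop: observe actual state, compare with desired state,
# return the actions needed to converge. An empty list means "in sync".
def reconcile(desired_replicas: int, actual_pods: list[str]) -> list[str]:
    actions = []
    diff = desired_replicas - len(actual_pods)
    if diff > 0:
        # Names are placeholders; real controllers generate random suffixes.
        actions += [f"create pod-{i}" for i in range(diff)]
    elif diff < 0:
        actions += [f"delete {p}" for p in actual_pods[:-diff]]
    return actions


print(reconcile(3, ["pod-a"]))        # two creates needed
print(reconcile(2, ["a", "b", "c"]))  # one delete needed
print(reconcile(1, ["a"]))            # already converged: []
```

The real controllers run this loop continuously against watch events from the API server, so any drift (a deleted pod, an edited spec) is corrected automatically.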

Cloud Controller Manager

Runs cloud-specific control loops that integrate Kubernetes with the underlying cloud provider.

What it manages:

  • Node controller: registers nodes with cloud metadata (instance type, zone, external IP), removes nodes when instances are terminated
  • Route controller: configures cloud network routes so pods can communicate across nodes
  • Service controller: creates cloud load balancers (NLB, ALB, Cloud Load Balancer) when a Service has type: LoadBalancer

In EKS and GKE, this runs as part of the managed control plane. You interact with it indirectly through annotations on Services and Ingress resources.

kubelet

The agent that runs on every worker node. It is responsible for the actual container lifecycle.

What it does:

  • Watches the API server for Pods assigned to its node
  • Pulls container images via the container runtime (containerd)
  • Starts, stops, and monitors containers
  • Runs liveness, readiness, and startup probes
  • Reports node status (capacity, allocatable, conditions) to the API server
  • Manages volume mounts (calls CSI driver for persistent volumes)
  • Manages pod sandbox creation via the Container Runtime Interface (CRI)

Key details:

  • kubelet does NOT run as a container — it is a systemd service on the node
  • Communicates with the API server over TLS (client certificate authentication)
  • Cadvisor is embedded in kubelet for container resource metrics
  • Node allocatable = total capacity minus system reserved minus kube reserved minus eviction threshold
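Plugging illustrative numbers into the allocatable formula (the reservations below are made up; real values depend on instance size and provider defaults):

```python
# Node allocatable = capacity - system-reserved - kube-reserved - eviction threshold.
# Example figures for a hypothetical 16 GiB node.
capacity_mib = 16384
system_reserved_mib = 256      # OS daemons (sshd, systemd, ...)
kube_reserved_mib = 1024       # kubelet, container runtime
eviction_threshold_mib = 100   # hard eviction buffer

allocatable_mib = (capacity_mib - system_reserved_mib
                   - kube_reserved_mib - eviction_threshold_mib)
print(allocatable_mib)  # 15004 -- what the scheduler can actually place pods against
```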

Node Resource Allocation

kube-proxy

Runs on every node and implements the Service abstraction by programming network rules.

Modes:

| Mode | How It Works | Trade-offs |
| --- | --- | --- |
| iptables (default) | Creates iptables rules for each Service/endpoint pair | Simple, but O(n) rule evaluation; slow with 10K+ services |
| IPVS | Uses the Linux IPVS (IP Virtual Server) kernel module | O(1) lookup, supports more LB algorithms, better at scale |
| nftables (v1.29+) | Uses nftables instead of iptables | Modern replacement, atomic rule updates |

What it does:

  • ClusterIP: routes traffic from cluster-ip:port to a backend Pod IP
  • NodePort: opens a port on every node, forwards to backend Pods
  • LoadBalancer: works with cloud controller to expose via external LB
  • Session affinity via sessionAffinity: ClientIP
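Conceptually, a Service is just a mapping from a virtual IP to a set of backend endpoints. A sketch of backend selection, including ClientIP session affinity (kube-proxy actually programs kernel rules; this shows only the selection logic, with made-up endpoint data):

```python
# Sketch of Service backend selection. Default: pick any backend.
# With sessionAffinity: ClientIP, the same client always lands on the
# same backend (here approximated with a stable hash of the client IP).
import hashlib
import random


def pick_backend(client_ip: str, endpoints: list[str], affinity: bool) -> str:
    if affinity:
        h = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
        return endpoints[h % len(endpoints)]
    return random.choice(endpoints)


eps = ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.3.9:8080"]
a = pick_backend("203.0.113.10", eps, affinity=True)
b = pick_backend("203.0.113.10", eps, affinity=True)
print(a == b)  # True: sticky for the same client IP
```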

CoreDNS

Cluster-internal DNS server deployed as a Deployment (typically 2 replicas) in kube-system.

What it resolves:

  • service-name.namespace.svc.cluster.local -> ClusterIP
  • pod-ip-dashed.namespace.pod.cluster.local -> Pod IP
  • Headless services: returns individual Pod IPs (A records)
  • External DNS: forwards to upstream DNS (VPC DNS resolver)
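The naming scheme is mechanical enough to express in one line (cluster.local is the default cluster domain and can be changed at cluster creation; the service name below is a made-up example):

```python
# Construct the FQDN that CoreDNS resolves for a ClusterIP Service.
def service_fqdn(name: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    return f"{name}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("payments-api", "payments"))
# payments-api.payments.svc.cluster.local
```

Within the same namespace, pods can use just the short name (`payments-api`) because the pod's resolv.conf search path appends the namespace suffixes.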

Configuration: CoreDNS is configured via a ConfigMap (coredns in kube-system). Common customizations:

  • Forward specific domains to on-prem DNS servers (hybrid cloud)
  • Add custom DNS entries
  • Enable logging for troubleshooting

Enterprise pattern:

CoreDNS Corefile (enterprise hybrid example)
=============================================
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward corp.bank.internal 10.200.0.2 10.200.0.3   # on-prem DNS
    forward . /etc/resolv.conf                         # VPC DNS resolver
    cache 30
    reload
}

Container Runtime (containerd)

The actual process that manages container lifecycle on the node.

  • Kubernetes communicates with containerd via the CRI (Container Runtime Interface)
  • Dockershim (the built-in Docker integration) was removed in Kubernetes 1.24; containerd is the standard
  • containerd pulls images, creates container sandboxes (via runc), manages storage
  • Both EKS and GKE use containerd as the default runtime
  • Image pull policies: Always, IfNotPresent, Never
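The three pull policies reduce to a simple decision rule (the real kubelet also resolves tags to digests, but this is the core logic):

```python
# Image pull decision per imagePullPolicy.
def should_pull(policy: str, image_present: bool) -> bool:
    if policy == "Always":
        return True   # always check the registry (digest resolution)
    if policy == "Never":
        return False  # fail the pod if the image is not already on the node
    # "IfNotPresent": pull only when the node does not have the image.
    return not image_present


assert should_pull("Always", image_present=True)
assert not should_pull("Never", image_present=False)
assert should_pull("IfNotPresent", image_present=False)
```

A common gotcha: `:latest` tags default to `Always`, while fixed tags default to `IfNotPresent`, which is one reason mutable tags are discouraged in production.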

Kubernetes Control Plane — kubectl apply Request Flow

When you run kubectl apply -f deployment.yaml, here is exactly what happens:

kubectl apply flow through the Kubernetes API

Control Loop Flow (Deployment -> ReplicaSet -> Pod)


Controller Chain — Deployment to Running Pod

Service Routing — ClusterIP Traffic Flow


EKS — How AWS Implements the Control Plane


EKS Control Plane Architecture

Control plane access:

  • Public + Private (default): API server reachable from internet + from within VPC
  • Private only (enterprise standard): API server only reachable from within VPC via ENIs. Requires VPN/Direct Connect or a bastion host. This is what banks use.

Node types:

| Type | When to Use | Bank Recommendation |
| --- | --- | --- |
| Managed Node Groups | Standard workloads; auto-handles drain/upgrade | Primary choice |
| Self-Managed | Custom AMIs, GPU, specific kernel config | Special cases only |
| Fargate | Serverless pods, burst workloads, isolation per pod | Batch jobs, dev/test |
| Karpenter | Intelligent autoscaling, mixed instance types | Cost optimization |

VPC CNI (Amazon VPC CNI Plugin):

  • Pods get real VPC IP addresses from the node’s subnet
  • No overlay network; pods are directly routable in the VPC
  • Enables Security Groups for Pods — apply VPC security groups to individual pods
  • IP address management: each ENI provides a pool of secondary IPs
  • Prefix delegation mode: assign /28 prefixes instead of individual IPs (more pod density)
  • Custom networking: pods can use different subnets than nodes (separate CIDR for pods)
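Pod density follows the documented AWS formula max_pods = ENIs x (IPs per ENI - 1) + 2; with prefix delegation each secondary-IP slot holds a /28 prefix (16 addresses). A sketch (the instance figures used below are for m5.large):

```python
# EKS max-pods math for the VPC CNI.
def max_pods(enis: int, ips_per_eni: int, prefix_delegation: bool = False) -> int:
    slots = enis * (ips_per_eni - 1)  # one IP per ENI is the primary, not usable for pods
    if prefix_delegation:
        slots *= 16  # each slot holds a /28 prefix instead of a single IP
    return slots + 2  # +2 for host-network pods (aws-node, kube-proxy)


# m5.large: 3 ENIs x 10 IPv4 addresses each
print(max_pods(3, 10))        # 29
print(max_pods(3, 10, True))  # 434 -- in practice capped well below this (e.g. 110)
```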

EKS Add-ons (managed by AWS):

  • vpc-cni — networking
  • kube-proxy — service routing
  • coredns — cluster DNS
  • aws-ebs-csi-driver — EBS persistent volumes
  • aws-efs-csi-driver — EFS shared storage
  • adot — AWS Distro for OpenTelemetry
  • aws-guardduty-agent — runtime threat detection

EKS Pod Identity (replaces IRSA):

  • Simpler than IRSA — no OIDC provider per cluster
  • Associate a Kubernetes ServiceAccount with an IAM role via the EKS API
  • Pod automatically gets temporary IAM credentials
  • Supported in EKS add-ons and custom workloads

GKE Control Plane Architecture

Cluster type:

| Type | Control Plane | Nodes | SLA | Bank Recommendation |
| --- | --- | --- | --- | --- |
| Zonal Standard | Single zone | You manage | 99.5% | Dev/test only |
| Regional Standard | 3 zones | You manage | 99.95% | Production workloads |
| Autopilot | 3 zones | Google manages | 99.95% | Simpler operations |

GKE Autopilot vs Standard:

  • Standard: You create and manage node pools, choose instance types, handle node upgrades, configure node auto-provisioning
  • Autopilot: Google manages everything below the Pod spec. You define workloads; Google provisions the right nodes. Per-pod billing. Built-in security hardening. Increasingly the recommended default.

Networking (VPC-native / alias IPs):

  • Pods get IPs from secondary IP ranges on the subnet (alias IPs)
  • No overlay network; pods are natively routable in the VPC
  • Each node gets a /24 of pod IPs by default (max 110 pods per node)
  • Shared VPC: GKE cluster in service project uses subnets from host project
  • Dataplane V2 (default on new clusters): eBPF-based (Cilium), replaces kube-proxy, provides built-in network policy enforcement
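The /24-per-node sizing is deliberate headroom, which a two-line calculation makes obvious:

```python
# GKE alias-IP sizing: a /24 per node gives 256 addresses, but the node is
# capped at 110 pods -- more than 2x headroom so pod churn (IPs briefly held
# by terminating pods) never starves new pods of addresses.
node_pod_cidr_prefix = 24
addresses = 2 ** (32 - node_pod_cidr_prefix)
max_pods_per_node = 110
print(addresses, addresses >= 2 * max_pods_per_node)  # 256 True
```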

Release channels:

| Channel | Update Speed | Stability | Use Case |
| --- | --- | --- | --- |
| Rapid | Latest versions, frequent updates | Least stable | Testing new features |
| Regular | Balanced (2-3 months after Rapid) | Good stability | Non-critical production |
| Stable | Most tested (2-3 months after Regular) | Most stable | Critical production |
| Extended | Extra-long support (24 months) | Patch-only | Banks, regulated industries |

GKE Enterprise (formerly Anthos):

  • Fleet management: manage multiple GKE clusters as one logical unit
  • Config Sync: GitOps-based configuration management across clusters
  • Policy Controller: OPA-based policy enforcement
  • Service Mesh (managed Istio)
  • Multi-cluster Services: service discovery across clusters

Workload Identity Federation (for GKE):

  • Map a Kubernetes ServiceAccount to a Google Cloud IAM service account
  • Pods automatically receive GCP credentials without storing keys
  • Enabled per node pool; requires annotation on the K8s ServiceAccount

Terraform — EKS vs GKE Cluster Provisioning

# ============================================================
# EKS Cluster — Private Endpoint, Managed Node Groups
# Workload Account, consuming VPC from Network Hub via TGW
# ============================================================
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "prod-banking-eks"
  cluster_version = "1.31"

  # Private cluster — API server only accessible within the VPC
  cluster_endpoint_public_access  = false
  cluster_endpoint_private_access = true

  # VPC from Network Hub (TGW-attached subnets)
  vpc_id     = data.aws_vpc.workload.id
  subnet_ids = data.aws_subnets.private.ids

  # Control plane logging
  cluster_enabled_log_types = [
    "api", "audit", "authenticator",
    "controllerManager", "scheduler"
  ]

  # KMS encryption for secrets
  cluster_encryption_config = {
    provider_key_arn = aws_kms_key.eks_secrets.arn
    resources        = ["secrets"]
  }

  # EKS Add-ons (managed by AWS)
  cluster_addons = {
    vpc-cni = {
      most_recent = true
      configuration_values = jsonencode({
        env = {
          ENABLE_PREFIX_DELEGATION = "true" # more pod IPs per node
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
    kube-proxy = { most_recent = true }
    coredns    = { most_recent = true }
    aws-ebs-csi-driver = {
      most_recent              = true
      service_account_role_arn = module.ebs_csi_irsa.iam_role_arn
    }
  }

  # Managed Node Groups
  eks_managed_node_groups = {
    # General purpose — application workloads
    general = {
      ami_type       = "AL2023_x86_64_STANDARD"
      instance_types = ["m7i.xlarge", "m6i.xlarge"]
      capacity_type  = "ON_DEMAND"
      min_size       = 3
      max_size       = 20
      desired_size   = 6

      # Spread across AZs
      subnet_ids = data.aws_subnets.private.ids

      labels = {
        workload-type = "general"
      }

      # Node group update config
      update_config = {
        max_unavailable_percentage = 33 # rolling update
      }
    }

    # Memory-optimized — caching, in-memory processing
    memory_optimized = {
      ami_type       = "AL2023_x86_64_STANDARD"
      instance_types = ["r7i.2xlarge", "r6i.2xlarge"]
      capacity_type  = "ON_DEMAND"
      min_size       = 0
      max_size       = 10
      desired_size   = 2

      labels = {
        workload-type = "memory-optimized"
      }

      taints = [{
        key    = "workload-type"
        value  = "memory-optimized"
        effect = "NO_SCHEDULE"
      }]
    }
  }

  # Access configuration — EKS access entries (ConfigMap mode kept only for migration)
  authentication_mode = "API_AND_CONFIG_MAP"
  access_entries = {
    platform_admins = {
      principal_arn = "arn:aws:iam::ACCOUNT_ID:role/PlatformAdminRole" # substitute your account ID
      policy_associations = {
        admin = {
          policy_arn   = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
          access_scope = { type = "cluster" }
        }
      }
    }
  }

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
    Team        = "platform-engineering"
  }
}

# Pod Identity association (replaces IRSA for new workloads)
resource "aws_eks_pod_identity_association" "app" {
  cluster_name    = module.eks.cluster_name
  namespace       = "payments"
  service_account = "payments-api"
  role_arn        = aws_iam_role.payments_api.arn
}

| Feature | EKS | GKE |
| --- | --- | --- |
| Control plane HA | 3 AZs (always) | Regional: 3 zones; Zonal: 1 zone |
| SLA | 99.95% | Regional: 99.95%; Zonal: 99.5% |
| etcd | AWS-managed, KMS encrypted | Google-managed, CMEK optional |
| Pod networking | VPC CNI (real VPC IPs) | VPC-native (alias IPs) |
| Service routing | kube-proxy (iptables/IPVS) | Dataplane V2 (eBPF/Cilium) |
| Serverless | Fargate (per-pod, no visible nodes) | Autopilot (managed nodes, per-pod billing) |
| IAM integration | Pod Identity / IRSA | Workload Identity Federation |
| Version management | Manual or managed upgrades | Release channels (Rapid/Regular/Stable/Extended) |
| Add-on management | EKS Add-ons (vpc-cni, coredns, etc.) | GKE Add-ons (auto-managed) |
| Multi-cluster | EKS Connector (limited) | GKE Enterprise / Fleet management |
| Node upgrades | Rolling update per node group | Surge upgrades per node pool |
| Network policy | Calico (add-on) | Dataplane V2 (built-in) |
| Image security | ECR scanning, GuardDuty | Binary Authorization, Artifact Analysis |
| Hybrid/on-prem | EKS Anywhere | GKE Enterprise (Anthos) |

Scenario 1: “Walk me through how a pod gets scheduled in Kubernetes”


What the interviewer wants: Proof that you understand the full component chain, not just “the scheduler picks a node.”

Strong answer structure:

“When I run kubectl apply, the request hits the API server, which authenticates the client — in EKS that means validating the IAM identity via the OIDC authenticator, or in GKE it is validated via Google OAuth. Then RBAC authorization checks whether this identity can create a Deployment in the target namespace.

Next, mutating admission controllers run — this is where things like Istio sidecar injection, default resource limits from LimitRange, or OPA/Gatekeeper policy mutations happen. Then validating admission controllers enforce policies like ‘all containers must have resource limits’ or ‘images must come from our approved registry.’

Once the Deployment object is persisted in etcd, the Deployment controller creates a ReplicaSet. The ReplicaSet controller then creates the individual Pod objects — but without a node assignment yet.

The scheduler is watching for these unscheduled pods. It runs the filtering phase — eliminating nodes where the pod cannot fit due to resource requests, taints, affinity rules, or topology constraints. Then the scoring phase ranks remaining nodes by criteria like balanced resource utilization, topology spread, and image locality.

Once the scheduler picks a node, it writes a Binding object. The kubelet on that node sees the new assignment via its watch, pulls the container image through containerd, calls the CNI plugin (VPC CNI on EKS, Cilium on GKE) to set up networking, calls the CSI driver if volumes are needed, and starts the containers.

Finally, the kubelet runs startup probes (if configured), then readiness probes. Once readiness passes, the EndpointSlice controller adds the pod’s IP to the Service’s endpoint list, and kube-proxy (or Dataplane V2 on GKE) updates its routing rules so traffic can reach the pod.”


Scenario 2: “What happens when a node goes down?”


Answer:

Timeline of Events When a Node Fails (default timings)

  • t=0: the node fails; kubelet heartbeats (node Lease updates) stop
  • ~40s (node-monitor-grace-period): the node controller marks the node NotReady
  • ~5 min (default tolerationSeconds of 300 for not-ready/unreachable): pods on the node are evicted; their owning controllers create replacements, which the scheduler places on healthy nodes
  • If the cloud instance was terminated, the cloud controller manager removes the Node object


Scenario 3: “Design an EKS cluster for production at a bank”


Key decisions to walk through:

  1. Networking: Private endpoint only. VPC from Network Hub via TGW. VPC CNI with prefix delegation for pod density. Custom networking for separate pod CIDR.

  2. Node strategy: Managed node groups with m7i.xlarge for general, r7i.2xlarge for memory-intensive. On-Demand for production (no Spot for banking). Karpenter for intelligent bin-packing.

  3. Security: EKS Pod Identity for IAM roles. KMS encryption for secrets. GuardDuty EKS Runtime Monitoring. No public ECR — use private ECR in Shared Services account with cross-account access.

  4. Multi-tenancy: Namespace per team. RBAC via EKS access entries. ResourceQuotas and LimitRanges. Network policies (Calico).

  5. Observability: Control plane logging to CloudWatch (audit, API, authenticator). Prometheus + Grafana in Shared Services account. Fluent Bit DaemonSet for application logs.

  6. Upgrades: Blue-green node groups for zero-downtime. Test in staging first. PodDisruptionBudgets on all workloads.


Scenario 4: “GKE Standard vs Autopilot — when do you choose each?”


Standard when:

  • You need DaemonSets with privileged access (legacy security agents)
  • Specific machine types (GPU, high-memory) with fine-grained control
  • Custom node images or kernel tuning
  • You want control over node pool topology and placement

Autopilot when:

  • You want minimal operational overhead
  • Per-pod billing is preferred (no paying for unused node capacity)
  • Security hardening out of the box (no SSH to nodes, no privileged containers by default)
  • Teams should focus on workloads, not infrastructure
  • Recommended default for new GKE clusters unless you have a specific Standard requirement

“At a bank, I would start with GKE Autopilot for application workloads where the team does not need node-level control. For workloads requiring specific hardware (GPU for ML, high-IOPS for databases) or legacy security agents running as privileged DaemonSets, I would use Standard with dedicated node pools. A fleet can mix both.”


Scenario 5: “How do EKS and GKE differ in control plane architecture?”

| Dimension | EKS | GKE |
| --- | --- | --- |
| Isolation | Control plane in AWS-managed VPC, ENIs in your VPC | Control plane in Google-managed VPC, peered to yours |
| Access | NLB endpoint (public/private) | Internal LB (private endpoint) |
| Networking | VPC CNI (real VPC IPs, iptables-based routing) | VPC-native alias IPs, Dataplane V2 (eBPF) |
| Version control | Manual upgrade + managed add-on updates | Release channels with auto-upgrade |
| IAM binding | Pod Identity / IRSA (OIDC federation) | Workload Identity Federation (metadata server) |
| etcd access | None (fully managed) | None (fully managed) |
| Logging | CloudWatch Logs (opt-in per log type) | Cloud Logging (opt-in per component) |

“Both are managed — you never touch API server flags or etcd. The biggest architectural difference is networking: EKS uses VPC CNI where pods get first-class VPC IPs and you still use kube-proxy for service routing. GKE uses Dataplane V2 which is eBPF-based, replaces kube-proxy entirely, and provides built-in network policy enforcement without needing Calico.”


Scenario 6: “How do you handle Kubernetes version upgrades across 10 clusters?”


Strategy:

Upgrade Pipeline

EKS approach:

  • Upgrade control plane version (AWS handles this, ~15 min)
  • Upgrade managed add-ons (VPC CNI, CoreDNS, kube-proxy, CSI drivers)
  • Create new node group with new AMI → drain old node group → delete old
  • Use PodDisruptionBudgets to prevent service disruption during drain

GKE approach:

  • Use release channels — clusters auto-upgrade within channel cadence
  • Stable channel for production (most tested, longest lead time)
  • Maintenance windows to control when auto-upgrades happen
  • Surge upgrade settings: max_surge=1, max_unavailable=0 for zero-downtime node upgrades
  • Use GKE Enterprise fleet management to orchestrate upgrades across clusters

Scenario 7: “Explain the difference between EKS Fargate and GKE Autopilot”

| Dimension | EKS Fargate | GKE Autopilot |
| --- | --- | --- |
| What it manages | Individual pods in micro-VMs | Nodes (but you never manage them) |
| Node visibility | One virtual node per pod in kubectl get nodes | Nodes are visible but fully managed |
| Billing | Per pod (vCPU + memory per second) | Per pod (CPU, memory, ephemeral storage) |
| DaemonSets | Not supported | Supported |
| Privileged containers | Not supported | Supported (with restrictions) |
| GPU | Not supported | Supported |
| Persistent volumes | EFS only (no EBS) | PD-SSD, PD-Balanced supported |
| Startup time | 30-60 seconds (cold start) | Standard pod startup |
| Max pods per namespace | Fargate profile limits | No special limits |
| Security | Strong isolation (micro-VM per pod) | Node-level isolation (hardened) |

“Fargate is true serverless — each pod runs in its own Firecracker micro-VM with no shared kernel. But the restrictions are significant: no DaemonSets means no Datadog agent, no Fluentd. Autopilot is more like ‘managed nodes’ — Google handles the underlying nodes, but you can still run DaemonSets, GPU workloads, and use block storage. For a bank, I would use Fargate selectively for batch jobs and isolated workloads, and use managed node groups for everything else on EKS. On GKE, Autopilot is viable for most workloads.”


Scenario 8: “Your API server is slow — what could be causing it?”


Diagnostic checklist:

API Server Slowness — Root Cause Tree

How to diagnose on EKS:

# Check API server metrics (EKS exposes via /metrics endpoint from within VPC)
kubectl get --raw /metrics | grep apiserver_request_duration_seconds
# Check for slow requests
kubectl get --raw /metrics | grep apiserver_request_total | grep SLOW
# Check etcd request latency
kubectl get --raw /metrics | grep etcd_request_duration_seconds
# List all webhooks and check for failures
kubectl get mutatingwebhookconfigurations
kubectl get validatingwebhookconfigurations
# Check API server audit logs (CloudWatch)
# Look for high-latency requests, 429 (throttled), 504 (timeout)
# Check if control plane events exist
kubectl get events -n kube-system --sort-by='.lastTimestamp'

How to diagnose on GKE:

# GKE Cloud Logging — API server logs
# Filter: resource.type="k8s_cluster" AND
# protoPayload.methodName="io.k8s.*" AND
# protoPayload.status.code >= 500
# GKE Metrics Explorer
# kubernetes.io/apiserver/request_duration_seconds
# kubernetes.io/apiserver/request_count (group by response code)
# Check webhook configurations
kubectl get mutatingwebhookconfigurations -o json | \
jq '.items[].webhooks[].timeoutSeconds'

Key Takeaways

  1. Know the flow. The kubectl apply -> API server -> etcd -> controller -> scheduler -> kubelet chain is the most frequently asked Kubernetes question. Practice explaining it without notes.

  2. EKS vs GKE is not “which is better.” It is about trade-offs. VPC CNI vs alias IPs, kube-proxy vs Dataplane V2, IRSA/Pod Identity vs Workload Identity, manual upgrades vs release channels.

  3. Enterprise decisions matter more than component theory. The interviewer wants to hear “private endpoint, KMS encryption, managed node groups, Pod Identity” — not just “the scheduler assigns pods to nodes.”

  4. Always think about failure modes. What happens when a node dies? What happens when etcd is slow? What happens when a webhook is down? These are the questions that separate senior engineers from architects.

  5. Managed does not mean you ignore it. You still need to understand what EKS/GKE manages for you, because when something goes wrong, you need to know where to look. You cannot debug API server latency if you do not know which components are involved.


Further Reading

  • EKS Best Practices Guide — operational excellence, security, reliability, and cost optimization for EKS
  • EKS Workshop — hands-on labs covering EKS fundamentals, autoscaling, observability, and security