Storage — PV, PVC, CSI Drivers

In the enterprise bank architecture, the platform team defines approved StorageClasses with encryption, performance tiers, and backup policies. Tenant teams create PVCs referencing these StorageClasses. The CSI drivers (deployed as DaemonSets + controller Deployments) handle the actual volume provisioning in AWS or GCP.

Storage Architecture in the Banking Platform

Persistent Volumes (PV), Persistent Volume Claims (PVC), StorageClasses

StorageClass, PersistentVolume, and PersistentVolumeClaim

Dynamic Provisioning (the standard approach)

This is how 99% of enterprise storage works in Kubernetes:

Dynamic Provisioning Flow

Static Provisioning (the exception)

An admin manually creates a PV pointing to an existing volume, then creates a PVC that binds to it. Used for:

  • Pre-existing volumes with data (migration scenarios)
  • Volumes that must not be auto-deleted
  • Cross-account volume sharing (rare)
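A statically provisioned pairing is a hand-written PV plus a PVC that names it explicitly. A minimal sketch, assuming an existing EBS volume; the volume ID, names, and sizes are placeholders:

```yaml
# Static PV pointing at a pre-existing EBS volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: migrated-data-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain  # never auto-delete migrated data
  storageClassName: ""                   # empty class disables dynamic provisioning
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0abc12345example   # existing EBS volume ID (placeholder)
    fsType: ext4
---
# PVC that binds explicitly to that PV
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: migrated-data
  namespace: payments
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ""
  volumeName: migrated-data-pv           # bind to the specific PV above
  resources:
    requests:
      storage: 100Gi
```

Setting `volumeName` skips the normal matching logic, so the PVC binds to exactly this PV and nothing else.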
Access modes:

| Mode | Short | Description | EBS/PD | EFS/Filestore |
|---|---|---|---|---|
| ReadWriteOnce | RWO | Single node read/write | Yes | Yes |
| ReadOnlyMany | ROX | Multiple nodes read-only | Via snapshot | Yes |
| ReadWriteMany | RWX | Multiple nodes read/write | No | Yes |
| ReadWriteOncePod | RWOP | Single pod read/write (K8s 1.27+) | Yes | Yes |
Volume binding modes:

| Mode | Behavior | When to Use |
|---|---|---|
| Immediate | PV created as soon as PVC is created | When AZ does not matter (rare) |
| WaitForFirstConsumer | PV created when a pod using the PVC is scheduled | Always use this for block storage |
# WHY WaitForFirstConsumer matters:
#
# Scenario: 3-AZ cluster, PVC with Immediate binding
# PVC created → PV created in AZ-a (random)
# Pod scheduled to AZ-b (best fit for resources)
# PROBLEM: EBS volume in AZ-a, pod in AZ-b → cannot attach!
#
# Solution: WaitForFirstConsumer
# PVC created → stays Pending
# Pod scheduled to AZ-b
# PV created in AZ-b (same AZ as pod) → attaches successfully
Reclaim policies:

| Policy | What Happens When PVC is Deleted | Use Case |
|---|---|---|
| Delete (default for dynamic) | PV and underlying cloud volume are deleted | Dev/test, ephemeral data |
| Retain | PV becomes “Released”, cloud volume kept | Production databases, audit data |

For banking, use Retain for all production data. Even after a PVC is deleted, the underlying EBS/PD volume remains for recovery.


AWS EBS CSI Driver

The primary block storage driver for EKS. Deployed as an EKS managed add-on.

EBS volume types for Kubernetes:

| Type | IOPS | Throughput | Use Case | Cost |
|---|---|---|---|---|
| gp3 | 3,000 (free) up to 16,000 | 125 MiB/s (free) up to 1,000 MiB/s | General purpose, most workloads | Lowest |
| io2 | Up to 64,000 (provisioned) | Up to 1,000 MiB/s | Databases (PostgreSQL, MongoDB) | Highest |
| io2 Block Express | Up to 256,000 | Up to 4,000 MiB/s | Extreme IOPS (SAP HANA) | Very high |
| st1 | Throughput-optimized (no provisioned IOPS) | Up to 500 MiB/s | Log processing, data warehousing | Low |

AWS EFS CSI Driver

For shared storage (ReadWriteMany). Essential when multiple pods across nodes need to read/write the same files.

When to use EBS vs EFS:

| Dimension | EBS (block) | EFS (file) |
|---|---|---|
| Access mode | RWO / RWOP | RWX / ROX / RWO |
| Performance | High IOPS, low latency | Lower IOPS, higher latency |
| AZ scope | Single AZ | Multi-AZ |
| Use case | Databases, Kafka, single-pod workloads | Shared config, CMS uploads, ML training data |
| Cost | Per GB provisioned | Per GB used (+ throughput) |
| Backup | EBS Snapshots | EFS Backup (AWS Backup) |
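A sketch of an EFS-backed StorageClass using the EFS CSI driver's access-point provisioning mode; the file system ID, names, and namespace are placeholders:

```yaml
# EFS StorageClass using access points for per-tenant isolation
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap           # dynamically create an EFS Access Point per PVC
  fileSystemId: fs-0123456789abcdef0 # existing EFS file system (placeholder)
  directoryPerms: "700"              # each access point gets its own directory
reclaimPolicy: Retain
---
# RWX PVC usable by many pods across nodes and AZs
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
  namespace: payments
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: efs-shared
  resources:
    requests:
      storage: 5Gi   # EFS is elastic and ignores the size, but the field is required
```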

GCP Persistent Disk CSI Driver

The primary block storage driver for GKE. Built into GKE (no manual installation needed).

GCP Persistent Disk types:

| Type | IOPS (read) | IOPS (write) | Throughput | Use Case | Cost |
|---|---|---|---|---|---|
| pd-standard | 0.75/GiB | 1.5/GiB | 120 MiB/s | Dev/test, logs | Lowest |
| pd-balanced | 6/GiB | 6/GiB | 240 MiB/s | General purpose | Medium |
| pd-ssd | 30/GiB | 30/GiB | 480 MiB/s | Databases, Kafka | Higher |
| pd-extreme | Up to 120K | Up to 120K | Up to 2,400 MiB/s | SAP HANA, Oracle | Highest |
| hyperdisk-extreme | Up to 350K | Up to 350K | Up to 5,000 MiB/s | Extreme performance | Very high |

Regional Persistent Disks:

GCP offers regional PDs that replicate data across two zones. This is a major differentiator from AWS EBS (which is single-AZ only).
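A regional PD StorageClass is one parameter away from the zonal default. A sketch (the class name is a placeholder):

```yaml
# Regional pd-ssd: synchronously replicated across two zones
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-regional
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd   # default is "none" (zonal)
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```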

GCP Filestore CSI Driver

GCP’s managed NFS service. Equivalent to AWS EFS.

Filestore tiers:

| Tier | Min Size | Performance | Use Case |
|---|---|---|---|
| Basic HDD | 1 TiB | Low IOPS | Archival, low-access shared data |
| Basic SSD | 2.5 TiB | High IOPS | General shared storage |
| Zonal | 1 TiB | Configurable IOPS/throughput | Flexible, new tier |
| Enterprise | 1 TiB | Highest IOPS, regional replication | Mission-critical shared data |
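A Filestore-backed StorageClass might look like the sketch below; the `tier` and `network` values are assumptions and depend on the Filestore CSI driver version and your VPC setup:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: filestore-shared
provisioner: filestore.csi.storage.gke.io
parameters:
  tier: standard   # Basic HDD tier (assumed value; check the driver docs)
  network: default # VPC network the Filestore instance attaches to
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```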

GCS FUSE CSI Driver

Mount Google Cloud Storage buckets as file systems in pods. Useful for large datasets (ML training, data analytics).
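GCS FUSE is typically consumed as a CSI ephemeral (inline) volume rather than through a PVC. A sketch; the bucket name, image, and service account are placeholders, and the Kubernetes service account is assumed to have Workload Identity access to the bucket:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ml-trainer
  annotations:
    gke-gcsfuse/volumes: "true"   # enables the GCS FUSE sidecar injection
spec:
  serviceAccountName: ml-training # bound to a GCP SA via Workload Identity (placeholder)
  containers:
  - name: trainer
    image: trainer:latest          # placeholder image
    volumeMounts:
    - name: training-data
      mountPath: /data
      readOnly: true
  volumes:
  - name: training-data
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: training-data-bucket # placeholder bucket
```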

# StorageClass — gp3 with encryption (standard for banking)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:me-south-1:123456789012:key/mrk-abc123 # CMEK
  fsType: ext4
  iops: "3000"      # gp3 baseline (free up to 3000)
  throughput: "125" # gp3 baseline (free up to 125 MiB/s)
reclaimPolicy: Retain                   # keep volume after PVC deletion
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true              # allow PVC resize
---
# StorageClass — io2 for high-IOPS databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: io2-database
provisioner: ebs.csi.aws.com
parameters:
  type: io2
  encrypted: "true"
  kmsKeyId: arn:aws:kms:me-south-1:123456789012:key/mrk-abc123
  fsType: ext4
  iops: "10000"      # provisioned IOPS
  # iopsPerGB: "50"  # alternative: scale IOPS with volume size (set one or the other, not both)
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
# EBS CSI Driver as EKS managed add-on
resource "aws_eks_addon" "ebs_csi" {
  cluster_name                = module.eks.cluster_name
  addon_name                  = "aws-ebs-csi-driver"
  addon_version               = "v1.37.0-eksbuild.1"
  service_account_role_arn    = module.ebs_csi_irsa.iam_role_arn
  resolve_conflicts_on_update = "OVERWRITE"
}

# IAM role for the EBS CSI driver (IRSA; Pod Identity is an alternative)
module "ebs_csi_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.0"

  role_name             = "ebs-csi-driver-${module.eks.cluster_name}"
  attach_ebs_csi_policy = true

  # Allow encryption with the custom KMS key
  ebs_csi_kms_cmk_ids = [aws_kms_key.ebs_encryption.arn]

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}

# KMS key for EBS encryption
resource "aws_kms_key" "ebs_encryption" {
  description             = "KMS key for EBS volume encryption in EKS"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  policy                  = data.aws_iam_policy_document.ebs_kms.json
}

Encryption at Rest — Enterprise Requirement

In banking, ALL persistent volumes must be encrypted. No exceptions.

EBS Encryption Architecture with KMS

Best practice: account-level default encryption. This ensures that even if someone creates a StorageClass without encrypted: "true", the volume is still encrypted.

Persistent Disk Encryption with CMEK (GCP)

GCP PD Encryption Architecture

# Force ALL EBS volumes in the account to be encrypted
resource "aws_ebs_encryption_by_default" "enabled" {
  enabled = true
}

resource "aws_ebs_default_kms_key" "default" {
  key_arn = aws_kms_key.ebs_encryption.arn
}
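On GKE, the equivalent of the encrypted EBS StorageClass uses the `disk-encryption-kms-key` parameter. A sketch; the project, key ring, and key names are placeholders:

```yaml
# StorageClass — pd-ssd with CMEK (GKE)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pd-ssd-encrypted
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  # full resource path of the Cloud KMS key (placeholder values)
  disk-encryption-kms-key: projects/my-project/locations/me-central1/keyRings/pd-ring/cryptoKeys/pd-key
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```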

Volume Snapshots

Volume snapshots allow point-in-time backups of persistent volumes, stored as cloud snapshots (EBS Snapshots / PD Snapshots).

# AWS — VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapshot-class
driver: ebs.csi.aws.com
deletionPolicy: Retain # keep snapshot even if VolumeSnapshot object deleted
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:me-south-1:123456789012:key/mrk-abc123
---
# GCP — VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: pd-snapshot-class
driver: pd.csi.storage.gke.io
deletionPolicy: Retain
parameters:
  storage-locations: me-central1
# Take a snapshot of a PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: kafka-data-snapshot-2026-03-15
  namespace: fraud-detection
spec:
  volumeSnapshotClassName: ebs-snapshot-class # or pd-snapshot-class
  source:
    persistentVolumeClaimName: data-kafka-0 # PVC to snapshot
# Create a new PVC from a snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-kafka-0-restored
  namespace: fraud-detection
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: kafka-data-snapshot-2026-03-15
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
# CronJob to snapshot Kafka data nightly
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kafka-snapshot-backup
  namespace: fraud-detection
spec:
  schedule: "0 3 * * *" # 03:00 daily, interpreted in the timeZone below
  timeZone: "Asia/Dubai"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-manager
          restartPolicy: Never
          containers:
          - name: snapshot
            image: bitnami/kubectl:1.31
            command:
            - /bin/sh
            - -c
            - |
              DATE=$(date +%Y-%m-%d)
              for i in 0 1 2; do
                cat <<SNAP | kubectl apply -f -
              apiVersion: snapshot.storage.k8s.io/v1
              kind: VolumeSnapshot
              metadata:
                name: kafka-data-${i}-${DATE}
                namespace: fraud-detection
              spec:
                volumeSnapshotClassName: ebs-snapshot-class
                source:
                  persistentVolumeClaimName: data-kafka-${i}
              SNAP
              done
              # Keep the newest 21 snapshots (3 brokers x 7 days); delete the rest
              kubectl get volumesnapshot -n fraud-detection \
                --sort-by=.metadata.creationTimestamp \
                -o name | head -n -21 | xargs -r kubectl delete -n fraud-detection

Volume Cloning

Create a new PVC from an existing PVC (no snapshot needed). Useful for creating test environments from production data.

# Clone a PVC (same StorageClass, same AZ; CSI cloning also requires
# the clone to be in the same namespace as the source PVC)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: payments-db-clone
  namespace: payments # must match the source PVC's namespace
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: gp3-encrypted
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: payments-db-data # source PVC
    kind: PersistentVolumeClaim

Volume Expansion

Grow a PVC without downtime (for file systems that support online resize).

# StorageClass must have allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true # THIS enables resize
# ...
# 1. Edit the PVC to increase size
kubectl patch pvc data-kafka-0 -n fraud-detection \
-p '{"spec": {"resources": {"requests": {"storage": "200Gi"}}}}'
# 2. Check PVC conditions
kubectl get pvc data-kafka-0 -n fraud-detection -o yaml
# Look for condition: FileSystemResizePending
# 3. Volume is resized in cloud (EBS ModifyVolume / PD resize)
# File system resize happens automatically when pod restarts
# (or immediately with online resize support)
# 4. Verify
kubectl get pvc data-kafka-0 -n fraud-detection
# CAPACITY should show 200Gi

Resize flow:

PVC Resize Flow


Ephemeral Volumes

Volumes that live and die with the pod. No PVC needed.

| Type | Description | Use Case |
|---|---|---|
| emptyDir | Empty directory created when pod starts, deleted when pod dies | Scratch space, inter-container data sharing |
| emptyDir (memory) | tmpfs-backed emptyDir | Sensitive data that must not touch disk |
| configMap | Mount ConfigMap as files | Application config files |
| secret | Mount Secret as files | TLS certificates, credentials |
| projected | Combine multiple volume sources into one mount | ServiceAccount token + ConfigMap + Secret |
| downwardAPI | Expose pod metadata as files | Pod name, namespace, labels |
spec:
  containers:
  - name: app
    volumeMounts:
    - name: scratch
      mountPath: /tmp/processing
  - name: sidecar
    volumeMounts:
    - name: scratch
      mountPath: /data # same volume, different mount path
  volumes:
  - name: scratch
    emptyDir:
      sizeLimit: 5Gi # evict pod if exceeded
---
# Memory-backed emptyDir (for secrets/sensitive processing)
volumes:
- name: sensitive-scratch
  emptyDir:
    medium: Memory # tmpfs — never written to disk
    sizeLimit: 256Mi # counts against pod memory limit
spec:
  containers:
  - name: app
    volumeMounts:
    - name: all-config
      mountPath: /etc/app
      readOnly: true
  volumes:
  - name: all-config
    projected:
      sources:
      - configMap:
          name: app-config
          items:
          - key: config.yaml
            path: config.yaml
      - secret:
          name: app-tls
          items:
          - key: tls.crt
            path: tls/cert.pem
          - key: tls.key
            path: tls/key.pem
      - serviceAccountToken:
          path: token
          expirationSeconds: 3600
          audience: vault

Storage Quotas per Namespace

The platform team limits how much storage each tenant namespace can consume:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: payments
spec:
  hard:
    requests.storage: 500Gi # total PVC size
    persistentvolumeclaims: "20" # max number of PVCs
    gp3-encrypted.storageclass.storage.k8s.io/requests.storage: 300Gi # per StorageClass
    io2-database.storageclass.storage.k8s.io/requests.storage: 200Gi

Default StorageClass

Set a default StorageClass so PVCs without an explicit class get the right storage:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
  annotations:
    storageclass.kubernetes.io/is-default-class: "true" # default
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  kmsKeyId: arn:aws:kms:me-south-1:123456789012:key/mrk-abc123
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Storage Topology Awareness

For multi-AZ clusters with block storage, ensure pods and volumes are co-located:

# StatefulSet with topology spread + storage affinity
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: kafka
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: topology.kubernetes.io/zone
            labelSelector:
              matchLabels:
                app: kafka
# WaitForFirstConsumer ensures the PV is created in the same AZ as the pod
# Pod anti-affinity ensures one pod per AZ
# Result: each Kafka broker in a different AZ, with its PV in the same AZ

Scenario 1: “Your application needs shared storage across multiple pods. Options on EKS vs GKE?”

Answer:

EKS shared storage options:

| Option | Access Mode | Performance | Cost | Use Case |
|---|---|---|---|---|
| EFS | RWX | Moderate (ms latency) | Per-GB used + throughput | Shared config, uploads, CMS |
| FSx for Lustre | RWX | Very high (sub-ms) | Per-GB provisioned | HPC, ML training |
| S3 via Mountpoint | ROX / RWX (append) | High throughput, high latency | Per-GB + requests | Data lake, archives |

“For a banking application needing shared read-write storage across pods, I would use EFS with the EFS CSI driver. It provides ReadWriteMany access across all AZs, automatic scaling, and encryption with KMS. I would use EFS Access Points to isolate different tenants. For ML training data that is read-heavy, I might use S3 with Mountpoint for S3 CSI driver instead — cheaper and higher throughput for large sequential reads.”

GKE shared storage options:

| Option | Access Mode | Performance | Cost | Use Case |
|---|---|---|---|---|
| Filestore | RWX | High (NFS) | Per-GB provisioned | Shared config, uploads |
| Filestore Enterprise | RWX | Highest, regional replication | Premium | Mission-critical shared data |
| GCS FUSE | ROX / RWX (object) | High throughput, eventual consistency | Per-GB used + ops | ML data, archives, static assets |

“On GKE, I would use Filestore for traditional shared file storage. For ML training data, GCS FUSE is more cost-effective and integrates well with BigQuery and Vertex AI. Filestore Enterprise provides regional replication for HA — important for banking workloads that cannot tolerate zone failures.”


Scenario 2: “Design storage for a Kafka cluster on Kubernetes”

Answer:

Kafka Storage Architecture on Kubernetes

Design decisions:

| Decision | EKS Choice | GKE Choice | Why |
|---|---|---|---|
| Volume type | io2 (10,000 IOPS) | pd-ssd (30 IOPS/GiB = 15,000 at 500Gi) | Kafka needs high IOPS |
| Size | 500Gi per broker | 500Gi per broker | 7-day retention, ~50 topics |
| Encryption | KMS CMK | CMEK | Banking requirement |
| Replication | Kafka replication factor=3 | Kafka replication factor=3 | Data redundancy at app level |
| PD replication | N/A (EBS is single-AZ) | Not needed (Kafka handles it) | Do not pay for both |
| Binding mode | WaitForFirstConsumer | WaitForFirstConsumer | Ensure PV in same AZ as pod |
| Reclaim policy | Retain | Retain | Never auto-delete Kafka data |

“I would NOT use regional PDs for Kafka on GKE. Kafka already replicates data across brokers (replication factor=3). Paying for regional PD replication on top of Kafka replication is wasteful. The PD just needs to be fast and reliable within a single zone. If a zone fails, Kafka’s built-in replication handles it.”


Scenario 3: “A PVC is stuck in Pending. How do you debug?”

Answer:

PVC Pending — Debugging Decision Tree

Quick debugging commands:

# Check PVC status and events
kubectl describe pvc <name> -n <namespace>
# Check if StorageClass exists
kubectl get sc
# Check CSI driver pods are running
kubectl get pods -n kube-system -l app=ebs-csi-controller # EKS
kubectl get pods -n kube-system -l app=gke-pd-csi-driver # GKE
# Check CSI driver logs
kubectl logs -n kube-system -l app=ebs-csi-controller -c csi-provisioner
# Check ResourceQuota
kubectl get resourcequota -n <namespace> -o yaml
# Check PV to see if any are available
kubectl get pv --sort-by=.status.phase
# Check node volume attachment count
kubectl get csinodes -o yaml

Scenario 4: “How do you migrate data from one StorageClass to another without downtime?”

Answer:

StorageClass Migration Strategy

“For a single database volume, I would use the snapshot approach — take a VolumeSnapshot, create a new PVC from the snapshot with the new StorageClass, stop the database briefly, update the StatefulSet to reference the new PVC, and start it. Downtime is minimal (just the restart time).

For a distributed system like Kafka, I would use application-level migration — add new brokers with the new StorageClass, let Kafka rebalance partitions to the new brokers, then decommission the old ones. Zero downtime.”


Scenario 5: “Explain the tradeoffs between EBS/PD (block) vs EFS/Filestore (file) for Kubernetes workloads”

Answer:

| Dimension | Block (EBS/PD) | File (EFS/Filestore) |
|---|---|---|
| Access | Single pod (RWO/RWOP) | Multiple pods (RWX) |
| Performance | High IOPS, low latency (<1 ms) | Moderate IOPS, higher latency (2-5 ms) |
| AZ scope | Single AZ (EBS) / optional regional (PD) | Multi-AZ by default |
| Scaling | Fixed size (must resize explicitly) | Auto-scales (EFS) or fixed (Filestore) |
| Cost | Per-GB provisioned (predictable) | Per-GB used (EFS) or provisioned (Filestore) |
| Backup | Volume snapshots (incremental) | AWS Backup / GCP Backup |
| POSIX compliance | Full (a real block device with ext4/xfs) | Full (NFS) |
| Consistency | Strong (single writer) | NFS semantics (close-to-open) |
| Best for | Databases, Kafka, single-pod stateful workloads | Shared config, CMS uploads, ML training data, WordPress |

“Use block storage when you need raw performance and a single pod owns the data — databases, message brokers, caches. Use file storage when multiple pods need to read and write the same data — shared configuration, user-uploaded files, ML training datasets.

For banking, I would use gp3/io2 (EBS) or pd-ssd (PD) for all databases and stateful services, and EFS or Filestore only for shared file storage like document processing pipelines. I would avoid EFS for high-IOPS workloads because the latency is noticeably higher than EBS.”


  1. Always say WaitForFirstConsumer. If you are designing any StorageClass with block storage, mention this binding mode. It prevents the AZ mismatch problem, which is the single most common storage issue on Kubernetes.

  2. Encryption is not optional. For banking interviews, every StorageClass must have encryption with CMEK (customer-managed keys). Know the difference: AWS has encrypted: "true" + kmsKeyId; GCP has disk-encryption-kms-key.

  3. Know the volume limits per node. EBS has a per-instance attachment limit (typically 25-28 volumes). GCP PD supports up to 128 per node. This matters when running many StatefulSets on the same node.

  4. Understand when NOT to use storage replication. Kafka and other distributed systems already replicate at the application level. Adding regional PDs or EBS multi-AZ on top is wasteful. Match the redundancy mechanism to the application architecture.

  5. PVC Pending is the most common storage issue. Know the debugging tree: missing StorageClass, wrong AZ (Immediate binding), quota exceeded, CSI driver permissions, volume attachment limits. Walk through this methodically in interviews.