CI/CD, Deployments & Environment Promotion
Where This Fits in the Enterprise Architecture
The golden rule: CI pipelines (GitHub Actions) build and push artifacts. CD pipelines (ArgoCD) deploy to clusters. CI never runs kubectl. The gitops repo is the single source of truth for what is deployed where.
The CI/CD Pipeline — End to End
Connecting GitHub Actions to Cloud Providers Securely
The Wrong Way (Static Credentials)
```yaml
# DO NOT DO THIS — static keys stored as GitHub secrets
- name: Configure AWS
  env:
    AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
    AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```

Why this is bad:
- Long-lived credentials that can leak
- No automatic rotation
- Hard to audit — which workflow used which key?
- Cannot scope to specific repo/branch
- If compromised, attacker has persistent access
The Right Way — OIDC Federation
AWS: OIDC Setup for GitHub Actions to EKS
Step 1: Create OIDC Identity Provider in AWS
Step 2: Create IAM Role with Trust Policy
Step 3: GitHub Actions Workflow for EKS
The workflow uses OIDC authentication (no static keys), builds and pushes to ECR, scans with Trivy, and updates the GitOps repo.
GCP: Workload Identity Federation for GitHub Actions to GKE
Step 1: Create Workload Identity Pool and Provider
Step 2: Create Service Account with IAM Bindings
Step 3: GitHub Actions Workflow for GKE
The workflow uses Workload Identity Federation (no JSON keys), builds and pushes to Artifact Registry, scans with Trivy, and updates the GitOps repo.
```hcl
# Terraform: GitHub OIDC provider in AWS
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["ffffffffffffffffffffffffffffffffffffffff"]

  tags = {
    Name      = "github-actions-oidc"
    ManagedBy = "terraform"
  }
}

# IAM Role for GitHub Actions — scoped to specific repo and branch
resource "aws_iam_role" "github_actions_cicd" {
  name = "GitHubActions-CICD-PaymentsAPI"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.github.arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
          }
          StringLike = {
            # Scope to specific repo and branch
            "token.actions.githubusercontent.com:sub" = "repo:bank-org/payments-api:ref:refs/heads/main"
          }
        }
      }
    ]
  })

  tags = {
    Purpose = "github-actions-cicd"
    Repo    = "bank-org/payments-api"
  }
}

# Permissions: push to ECR + read EKS cluster
resource "aws_iam_role_policy" "github_actions_permissions" {
  name = "cicd-permissions"
  role = aws_iam_role.github_actions_cicd.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "ECRPush"
        Effect = "Allow"
        Action = [
          "ecr:BatchCheckLayerAvailability",
          "ecr:CompleteLayerUpload",
          "ecr:GetDownloadUrlForLayer",
          "ecr:InitiateLayerUpload",
          "ecr:PutImage",
          "ecr:UploadLayerPart",
          "ecr:BatchGetImage",
        ]
        Resource = "arn:aws:ecr:me-south-1:111111111111:repository/payments-api"
      },
      {
        Sid      = "ECRAuth"
        Effect   = "Allow"
        Action   = "ecr:GetAuthorizationToken"
        Resource = "*"
      },
      {
        Sid    = "EKSDescribe"
        Effect = "Allow"
        Action = [
          "eks:DescribeCluster",
        ]
        Resource = "arn:aws:eks:me-south-1:111111111111:cluster/prod-eks-cluster"
      }
    ]
  })
}
```

```yaml
name: CI — Build, Scan, Push to ECR

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write   # REQUIRED for OIDC
  contents: read    # read repo code

env:
  AWS_REGION: me-south-1
  ECR_REGISTRY: 111111111111.dkr.ecr.me-south-1.amazonaws.com
  ECR_REPOSITORY: payments-api
  IMAGE_TAG: ${{ github.sha }}

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run linters
        run: |
          # Dockerfile linting
          docker run --rm -i hadolint/hadolint < Dockerfile

      - name: Run unit tests
        run: |
          go test ./... -v -race -coverprofile=coverage.out

  build-and-push:
    needs: lint-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'   # only on main branch merge
    outputs:
      image-digest: ${{ steps.build.outputs.digest }}
    steps:
      - uses: actions/checkout@v4

      # OIDC authentication — no static keys
      - name: Configure AWS Credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111111111111:role/GitHubActions-CICD-PaymentsAPI
          aws-region: ${{ env.AWS_REGION }}
          role-session-name: GitHubActions-${{ github.run_id }}

      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: |
            ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}
            ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:latest
          cache-from: type=registry,ref=${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:cache
          cache-to: type=registry,ref=${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:cache,mode=max

      # Scan image for CVEs
      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}
          format: table
          exit-code: 1   # fail pipeline on HIGH/CRITICAL CVEs
          severity: HIGH,CRITICAL
          ignore-unfixed: true

  update-gitops:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Checkout gitops repo
        uses: actions/checkout@v4
        with:
          repository: bank-org/gitops-repo
          token: ${{ secrets.GITOPS_PAT }}   # PAT with write access to gitops repo
          path: gitops

      - name: Update image tag in dev overlay
        run: |
          cd gitops/apps/payments/overlays/dev
          kustomize edit set image \
            payments-api=${{ env.ECR_REGISTRY }}/${{ env.ECR_REPOSITORY }}:${{ env.IMAGE_TAG }}

      - name: Commit and push
        run: |
          cd gitops
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          git commit -m "chore(dev): update payments-api to ${{ env.IMAGE_TAG }}"
          git push
```

```hcl
# Terraform: Workload Identity Federation for GitHub Actions
resource "google_iam_workload_identity_pool" "github" {
  project                   = var.project_id
  workload_identity_pool_id = "github-pool"
  display_name              = "GitHub Actions Pool"
  description               = "WIF pool for GitHub Actions CI/CD"
}
```
```hcl
resource "google_iam_workload_identity_pool_provider" "github" {
  project                            = var.project_id
  workload_identity_pool_id          = google_iam_workload_identity_pool.github.workload_identity_pool_id
  workload_identity_pool_provider_id = "github-provider"
  display_name                       = "GitHub Provider"

  attribute_mapping = {
    "google.subject"       = "assertion.sub"
    "attribute.actor"      = "assertion.actor"
    "attribute.repository" = "assertion.repository"
    "attribute.ref"        = "assertion.ref"
  }

  # Restrict to your GitHub org
  attribute_condition = "assertion.repository_owner == 'bank-org'"

  oidc {
    issuer_uri = "https://token.actions.githubusercontent.com"
  }
}

# Service account for GitHub Actions
resource "google_service_account" "github_actions" {
  project      = var.project_id
  account_id   = "github-actions-cicd"
  display_name = "GitHub Actions CI/CD"
}

# Allow GitHub Actions to impersonate this SA (scoped to repo + branch)
resource "google_service_account_iam_binding" "github_wif" {
  service_account_id = google_service_account.github_actions.name
  role               = "roles/iam.workloadIdentityUser"

  members = [
    # Scoped to specific repo and branch
    "principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/bank-org/payments-api"
  ]
}

# Permissions: push to Artifact Registry + access GKE
resource "google_project_iam_member" "gar_writer" {
  project = var.project_id
  role    = "roles/artifactregistry.writer"
  member  = "serviceAccount:${google_service_account.github_actions.email}"
}

resource "google_project_iam_member" "gke_developer" {
  project = var.project_id
  role    = "roles/container.developer"
  member  = "serviceAccount:${google_service_account.github_actions.email}"
}
```

```yaml
name: CI — Build, Scan, Push to Artifact Registry

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write   # REQUIRED for OIDC
  contents: read

env:
  GCP_PROJECT: bank-prod-cicd
  GAR_REGION: me-central1
  GAR_REPOSITORY: payments
  IMAGE_NAME: payments-api
  IMAGE_TAG: ${{ github.sha }}

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run linters
        run: docker run --rm -i hadolint/hadolint < Dockerfile

      - name: Run unit tests
        run: go test ./... -v -race

  build-and-push:
    needs: lint-and-test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4

      # Workload Identity Federation — no JSON keys
      - name: Authenticate to Google Cloud via WIF
        id: auth
        uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: "projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider"
          service_account: "github-actions-cicd@bank-prod-cicd.iam.gserviceaccount.com"
          token_format: access_token

      - name: Login to Artifact Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.GAR_REGION }}-docker.pkg.dev
          username: oauth2accesstoken
          password: ${{ steps.auth.outputs.access_token }}

      - name: Build and push image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: |
            ${{ env.GAR_REGION }}-docker.pkg.dev/${{ env.GCP_PROJECT }}/${{ env.GAR_REPOSITORY }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}
            ${{ env.GAR_REGION }}-docker.pkg.dev/${{ env.GCP_PROJECT }}/${{ env.GAR_REPOSITORY }}/${{ env.IMAGE_NAME }}:latest

      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: "${{ env.GAR_REGION }}-docker.pkg.dev/${{ env.GCP_PROJECT }}/${{ env.GAR_REPOSITORY }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}"
          format: table
          exit-code: 1
          severity: HIGH,CRITICAL

  update-gitops:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - name: Checkout gitops repo
        uses: actions/checkout@v4
        with:
          repository: bank-org/gitops-repo
          token: ${{ secrets.GITOPS_PAT }}
          path: gitops

      - name: Update image tag in dev overlay
        run: |
          cd gitops/apps/payments/overlays/dev
          kustomize edit set image \
            payments-api=${{ env.GAR_REGION }}-docker.pkg.dev/${{ env.GCP_PROJECT }}/${{ env.GAR_REPOSITORY }}/${{ env.IMAGE_NAME }}:${{ env.IMAGE_TAG }}

      - name: Commit and push
        run: |
          cd gitops
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add .
          git commit -m "chore(dev): update payments-api to ${{ env.IMAGE_TAG }}"
          git push
```

ArgoCD — GitOps Deployment
ArgoCD Architecture in the Enterprise
ArgoCD Application CRD
Section titled “ArgoCD Application CRD”# ArgoCD Application for dev environmentapiVersion: argoproj.io/v1alpha1kind: Applicationmetadata: name: payments-dev namespace: argocd finalizers: - resources-finalizer.argocd.argoproj.io # cleanup on deletespec: project: development
source: repoURL: https://github.com/bank-org/gitops-repo.git targetRevision: main path: apps/payments/overlays/dev
destination: server: https://dev-eks.me-south-1.eks.amazonaws.com namespace: payments
syncPolicy: automated: prune: true # delete resources removed from git selfHeal: true # revert manual changes in cluster allowEmpty: false # prevent accidental deletion of all resources syncOptions: - CreateNamespace=true - PrunePropagationPolicy=foreground - PruneLast=true # prune after all other syncs retry: limit: 3 backoff: duration: 5s factor: 2 maxDuration: 3m# ArgoCD Application for prod — manual sync + sync windowsapiVersion: argoproj.io/v1alpha1kind: Applicationmetadata: name: payments-prod namespace: argocdspec: project: production
source: repoURL: https://github.com/bank-org/gitops-repo.git targetRevision: main path: apps/payments/overlays/prod
destination: server: https://prod-eks.me-south-1.eks.amazonaws.com namespace: payments
syncPolicy: # NO automated sync — manual only for prod syncOptions: - CreateNamespace=false # namespace must pre-exist in prod - PrunePropagationPolicy=foreground - RespectIgnoreDifferences=true retry: limit: 5 backoff: duration: 10s factor: 2 maxDuration: 5m
# Ignore fields that are set by controllers (avoid false OutOfSync) ignoreDifferences: - group: apps kind: Deployment jsonPointers: - /spec/replicas # HPA manages replicas - group: autoscaling kind: HorizontalPodAutoscaler jqPathExpressions: - .statusArgoCD Sync Windows (Production)
```yaml
# AppProject with sync window — only deploy during business hours
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production applications

  sourceRepos:
    - https://github.com/bank-org/gitops-repo.git

  destinations:
    - server: https://prod-eks.me-south-1.eks.amazonaws.com
      namespace: payments
    - server: https://prod-eks.me-south-1.eks.amazonaws.com
      namespace: orders

  # RBAC: who can sync
  roles:
    - name: platform-admin
      policies:
        - p, proj:production:platform-admin, applications, sync, production/*, allow
        - p, proj:production:platform-admin, applications, get, production/*, allow
    - name: team-payments
      policies:
        - p, proj:production:team-payments, applications, get, production/payments-*, allow
      # Note: team cannot sync prod — only platform-admin can

  # Sync windows: only allow deploys Sun-Thu 10am-4pm Dubai time
  syncWindows:
    - kind: allow
      schedule: "0 10 * * 0-4"   # Sun-Thu 10am (Dubai work week)
      duration: 6h               # until 4pm
      applications: ["*"]
      manualSync: true           # manual sync allowed within window
    - kind: deny
      schedule: "0 0 * * *"      # deny all other times
      duration: 24h
      applications: ["*"]
```

App-of-Apps Pattern
For managing 50+ microservices, use the app-of-apps pattern — one parent Application that generates child Applications:
```yaml
# Root Application (the "app of apps")
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/bank-org/gitops-repo.git
    targetRevision: main
    path: argocd/apps   # directory containing Application YAMLs
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```

ArgoCD ApplicationSets — Multi-Environment
ApplicationSets generate Applications dynamically based on generators (Git directory, list, cluster, matrix).
Git Directory Generator — Auto-Discover Apps per Environment
Section titled “Git Directory Generator — Auto-Discover Apps per Environment”# One ApplicationSet → generates apps for ALL services in ALL envsapiVersion: argoproj.io/v1alpha1kind: ApplicationSetmetadata: name: all-apps-all-envs namespace: argocdspec: goTemplate: true goTemplateOptions: ["missingkey=error"]
generators: # Matrix: combine environment list × git directory discovery - matrix: generators: # Generator 1: environments - list: elements: - env: dev cluster: https://dev-eks.me-south-1.eks.amazonaws.com autoSync: "true" - env: staging cluster: https://staging-eks.me-south-1.eks.amazonaws.com autoSync: "true" - env: prod cluster: https://prod-eks.me-south-1.eks.amazonaws.com autoSync: "false" # manual sync for prod
# Generator 2: discover apps from git directory structure - git: repoURL: https://github.com/bank-org/gitops-repo.git revision: main directories: - path: "apps/*/overlays/{{ .env }}"
template: metadata: name: "{{ index .path.segments 1 }}-{{ .env }}" # Produces: payments-dev, payments-staging, payments-prod, etc. spec: project: "{{ .env }}" source: repoURL: https://github.com/bank-org/gitops-repo.git targetRevision: main path: "{{ .path.path }}" destination: server: "{{ .cluster }}" namespace: "{{ index .path.segments 1 }}" syncPolicy: automated: prune: true selfHeal: trueGitOps Repository Structure
Kustomize-Based Structure (Recommended)
Kustomize Base
Section titled “Kustomize Base”apiVersion: kustomize.config.k8s.io/v1beta1kind: Kustomization
resources: - deployment.yaml - service.yaml - hpa.yaml - pdb.yaml - network-policy.yaml
commonLabels: app.kubernetes.io/name: payments-api app.kubernetes.io/part-of: payments
images: - name: payments-api newName: 111111111111.dkr.ecr.me-south-1.amazonaws.com/payments-api newTag: latest # overridden per environmentapiVersion: apps/v1kind: Deploymentmetadata: name: payments-apispec: replicas: 1 # overridden per environment selector: matchLabels: app: payments-api template: metadata: labels: app: payments-api spec: serviceAccountName: payments-api containers: - name: payments-api image: payments-api # placeholder, Kustomize replaces ports: - containerPort: 8080 resources: requests: cpu: 250m memory: 256Mi limits: cpu: 500m memory: 512Mi livenessProbe: httpGet: path: /healthz port: 8080 initialDelaySeconds: 15 periodSeconds: 10 readinessProbe: httpGet: path: /ready port: 8080 initialDelaySeconds: 5 periodSeconds: 5 env: - name: LOG_LEVEL value: "info"Environment Overlays
```yaml
# overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

namespace: payments

patches:
  - path: patches/replicas.yaml
  - path: patches/resources.yaml

images:
  - name: payments-api
    newName: 111111111111.dkr.ecr.me-south-1.amazonaws.com/payments-api
    newTag: abc123def   # CI updates this tag

configMapGenerator:
  - name: payments-config
    literals:
      - DATABASE_HOST=payments-db.dev.internal
      - LOG_LEVEL=debug
      - ENABLE_TRACING=true
```

```yaml
# overlays/dev/patches/replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 1   # dev: single replica
```

```yaml
# overlays/dev/patches/resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  template:
    spec:
      containers:
        - name: payments-api
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
```

```yaml
# overlays/prod/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

namespace: payments

patches:
  - path: patches/replicas.yaml
  - path: patches/resources.yaml
  - path: patches/tolerations.yaml

images:
  - name: payments-api
    newName: 111111111111.dkr.ecr.me-south-1.amazonaws.com/payments-api
    newTag: def456abc   # promoted from staging

configMapGenerator:
  - name: payments-config
    literals:
      - DATABASE_HOST=payments-db.prod.internal
      - LOG_LEVEL=warn
      - ENABLE_TRACING=true
```

```yaml
# overlays/prod/patches/replicas.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 6   # prod: 6 replicas across 3 AZs
```

```yaml
# overlays/prod/patches/resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  template:
    spec:
      containers:
        - name: payments-api
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: "2"
              memory: 2Gi
```

Environment Promotion Patterns
Pattern 1: PR-Based Promotion (Recommended)
```text
PR-BASED PROMOTION FLOW
=======================

CI merges to main
        |
        v
GitHub Actions updates dev overlay (image tag)
        |
        v
ArgoCD auto-syncs dev
        |
        v
Automated tests pass in dev
        |
        v
GitHub Actions opens PR: "Promote payments-api:abc123 to staging"
  - Updates staging/kustomization.yaml with new image tag
  - PR auto-assigned to team lead (CODEOWNERS)
        |
        v
Team lead reviews + approves PR → merge
        |
        v
ArgoCD auto-syncs staging
        |
        v
Staging integration tests pass (automated)
        |
        v
Platform engineer opens PR: "Promote payments-api:abc123 to prod"
  - Updates prod/kustomization.yaml with new image tag
  - PR requires 2 approvals (CODEOWNERS)
  - Must pass branch protection rules
        |
        v
Platform team reviews + approves → merge (within sync window)
        |
        v
ArgoCD syncs prod (manual trigger or within sync window)
        |
        v
Argo Rollouts: canary 10% → 30% → 60% → 100% (with analysis)
```

GitHub Actions for automated promotion:
```yaml
name: Promote to Staging

on:
  workflow_dispatch:
    inputs:
      image_tag:
        description: "Image tag to promote"
        required: true
      app_name:
        description: "Application name"
        required: true
        default: "payments"

jobs:
  promote:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout gitops repo
        uses: actions/checkout@v4

      - name: Update staging image tag
        run: |
          cd apps/${{ inputs.app_name }}/overlays/staging
          kustomize edit set image \
            ${{ inputs.app_name }}-api=111111111111.dkr.ecr.me-south-1.amazonaws.com/${{ inputs.app_name }}-api:${{ inputs.image_tag }}

      - name: Create promotion PR
        uses: peter-evans/create-pull-request@v6
        with:
          title: "promote(${{ inputs.app_name }}): staging ← ${{ inputs.image_tag }}"
          body: |
            ## Promotion Request

            - **App:** ${{ inputs.app_name }}
            - **Image tag:** ${{ inputs.image_tag }}
            - **Target:** staging
            - **Source:** dev (verified)

            ### Checklist
            - [ ] Dev deployment verified
            - [ ] Integration tests passed
            - [ ] No open incidents
          branch: promote/${{ inputs.app_name }}-staging-${{ inputs.image_tag }}
          labels: promotion,staging
          reviewers: team-leads
```

CODEOWNERS for approval gates:
```text
# .github/CODEOWNERS

# Dev overlay — team can self-approve
apps/*/overlays/dev/      @bank-org/team-payments

# Staging overlay — team lead approval
apps/*/overlays/staging/  @bank-org/team-leads

# Prod overlay — platform team approval (2 reviewers required)
apps/*/overlays/prod/     @bank-org/platform-team
```

Pattern 2: Image Tag Promotion Pipeline
```text
IMAGE TAG PROMOTION
===================

CI builds image with tag: sha-abc123
        |
        v
Writes to dev overlay: newTag: sha-abc123
        |
        v
ArgoCD syncs dev → tests pass
        |
        v
Promotion job copies SAME tag to staging overlay
(no new build — same image, different config)
        |
        v
ArgoCD syncs staging → tests pass
        |
        v
Promotion job copies SAME tag to prod overlay
(same image as dev/staging — guaranteed identical)
        |
        v
ArgoCD syncs prod (canary)
```

Argo Rollouts — Progressive Delivery
Argo Rollouts extends Kubernetes Deployments with advanced deployment strategies: canary, blue-green, and progressive delivery with automated analysis.
```text
CANARY DEPLOYMENT WITH ARGO ROLLOUTS
====================================

                100% traffic
                     |
                     v
              +--------------+
Step 0:       |  Stable v1   |  (current production)
              |  (10 pods)   |
              +--------------+

Step 1:       +--------------+     +------------+
setWeight: 10 |  Stable v1   |     | Canary v2  |
              |  (9 pods)    |---->| (1 pod)    |
              | 90% traffic  |     | 10% traffic|
              +--------------+     +------------+
                     |
Step 2:       Run AnalysisTemplate
pause: 5m     (check error rate, latency)
                     |
              Pass? Continue    Fail? Auto-rollback

Step 3:       +--------------+     +------------+
setWeight: 30 |  Stable v1   |     | Canary v2  |
              |  (7 pods)    |---->| (3 pods)   |
              | 70% traffic  |     | 30% traffic|
              +--------------+     +------------+

Step 4:       +--------------+     +------------+
setWeight: 60 |  Stable v1   |     | Canary v2  |
              |  (4 pods)    |---->| (6 pods)   |
              | 40% traffic  |     | 60% traffic|
              +--------------+     +------------+

Step 5:        +--------------+
setWeight: 100 |  Canary v2   |  Canary promoted to stable
               |  (10 pods)   |  Old ReplicaSet scaled to 0
               +--------------+
```

Rollout with Canary Strategy
Section titled “Rollout with Canary Strategy”apiVersion: argoproj.io/v1alpha1kind: Rolloutmetadata: name: payments-api namespace: paymentsspec: replicas: 10 revisionHistoryLimit: 5 selector: matchLabels: app: payments-api template: metadata: labels: app: payments-api spec: containers: - name: payments-api image: 111111111111.dkr.ecr.me-south-1.amazonaws.com/payments-api:sha-abc123 ports: - containerPort: 8080 resources: requests: cpu: 500m memory: 512Mi strategy: canary: # Traffic management via ALB (EKS) or Istio trafficRouting: alb: ingress: payments-ingress servicePort: 80 rootService: payments-api-root annotationPrefix: alb.ingress.kubernetes.io
canaryService: payments-api-canary stableService: payments-api-stable
steps: # Step 1: 10% canary with analysis - setWeight: 10 - pause: { duration: 2m } - analysis: templates: - templateName: payments-success-rate args: - name: service-name value: payments-api-canary
# Step 2: 30% canary - setWeight: 30 - pause: { duration: 5m } - analysis: templates: - templateName: payments-success-rate
# Step 3: 60% canary - setWeight: 60 - pause: { duration: 5m } - analysis: templates: - templateName: payments-success-rate
# Step 4: full rollout (manual gate for payments) - pause: {} # manual approval before 100% - setWeight: 100
# Auto-rollback on failure abortScaleDownDelaySeconds: 30 scaleDownDelayRevisionLimit: 1AnalysisTemplate — Automated Canary Validation
Section titled “AnalysisTemplate — Automated Canary Validation”apiVersion: argoproj.io/v1alpha1kind: AnalysisTemplatemetadata: name: payments-success-rate namespace: paymentsspec: args: - name: service-name value: payments-api-canary metrics: # Metric 1: HTTP success rate must be > 99.5% - name: success-rate interval: 60s count: 5 # run 5 measurements successCondition: result[0] >= 0.995 failureLimit: 2 # fail if 2+ measurements fail provider: prometheus: address: http://prometheus.monitoring.svc:9090 query: | sum(rate(http_requests_total{ service="{{args.service-name}}", status=~"2.." }[2m])) / sum(rate(http_requests_total{ service="{{args.service-name}}" }[2m]))
# Metric 2: P99 latency must be < 500ms - name: latency-p99 interval: 60s count: 5 successCondition: result[0] < 500 failureLimit: 2 provider: prometheus: address: http://prometheus.monitoring.svc:9090 query: | histogram_quantile(0.99, sum(rate(http_request_duration_milliseconds_bucket{ service="{{args.service-name}}" }[2m])) by (le) )
# Metric 3: No increase in error logs - name: error-log-count interval: 60s count: 3 successCondition: result[0] < 10 failureLimit: 1 provider: prometheus: address: http://prometheus.monitoring.svc:9090 query: | sum(increase(log_messages_total{ service="{{args.service-name}}", level="error" }[5m]))Helm Charts in the Enterprise
Helm is used for packaging applications with configurable values. In a GitOps setup, ArgoCD renders Helm templates and applies the output.
```yaml
# ArgoCD Application using Helm
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-prod-helm
  namespace: argocd
spec:
  source:
    repoURL: https://github.com/bank-org/helm-charts.git
    targetRevision: main
    path: charts/payments-api
    helm:
      valueFiles:
        - values.yaml
        - values-prod.yaml   # environment-specific overrides
      parameters:
        - name: image.tag
          value: "sha-abc123"
  destination:
    server: https://prod-eks.me-south-1.eks.amazonaws.com
    namespace: payments
```

Helm values per environment:
```yaml
# values.yaml (defaults)
replicaCount: 1
image:
  repository: 111111111111.dkr.ecr.me-south-1.amazonaws.com/payments-api
  tag: latest
resources:
  requests:
    cpu: 250m
    memory: 256Mi
```

```yaml
# values-prod.yaml (overrides for prod)
replicaCount: 6
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "2"
    memory: 2Gi
ingress:
  enabled: true
  className: alb
  annotations:
    alb.ingress.kubernetes.io/scheme: internal
```

Interview Scenarios
Scenario 1: Design CI/CD for 50 Microservices
“Design CI/CD for 50 microservices deployed to EKS across dev, staging, and prod environments.”
Architecture:
```text
50 APP REPOS            1 GITOPS REPO           3 CLUSTERS
+----------+            +-----------+           +----------+
| app-1    |--CI-->     | apps/     |           | dev      |
| app-2    |--CI-->     |   app-1/  |--ArgoCD-->| staging  |
| ...      |            |   app-2/  |           | prod     |
| app-50   |--CI-->     |   ...     |           +----------+
+----------+            |   app-50/ |
                        +-----------+

Each app repo has:           GitOps repo has:          ArgoCD has:
- src/                       - base/ per app           - 1 ApplicationSet
- Dockerfile                 - overlays/dev,staging,   - generates 150 apps
- .github/workflows/ci.yaml    prod per app              (50 x 3 envs)
```

Key decisions:
- One CI workflow per app repo — builds, tests, pushes image, updates gitops repo
- Single gitops repo — all 50 apps, Kustomize overlays for 3 envs
- One ApplicationSet — matrix generator (envs x apps) creates 150 Applications
- Shared CI templates — GitHub Actions reusable workflows for consistency
- OIDC auth — single IAM role for all CI pipelines (scoped to org)
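The "shared CI templates" decision is usually implemented with GitHub Actions reusable workflows. A hedged sketch of that shape — the `bank-org/ci-templates` repo name and `app_name` input are hypothetical, not from the source:

```yaml
# Hypothetical shared template: bank-org/ci-templates/.github/workflows/build-push.yaml
name: Reusable Build & Push
on:
  workflow_call:
    inputs:
      app_name:
        required: true
        type: string

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # OIDC, as in the workflows above
      contents: read
    steps:
      - uses: actions/checkout@v4
      # ...shared build, scan, push, and gitops-update steps...

# Each of the 50 app repos then reduces its ci.yaml to a single calling job:
#
# jobs:
#   ci:
#     uses: bank-org/ci-templates/.github/workflows/build-push.yaml@main
#     with:
#       app_name: payments-api
```

Changes to lint, scan, or push logic then land in one place instead of 50 repos.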
Scenario 2: GitHub Actions to EKS Without Static Credentials
“How do you connect GitHub Actions to EKS without static credentials?”
Answer: Use GitHub Actions OIDC federation with AWS STS.
- Register GitHub as an OIDC identity provider in AWS
- Create an IAM role with a trust policy that validates the GitHub OIDC token
- Scope the trust policy to specific repo, branch, and optionally GitHub Environment
- In the workflow, use `aws-actions/configure-aws-credentials@v4` with `role-to-assume`
- The action requests a short-lived token (15 min to 1 hour) from AWS STS
- No static access keys anywhere — not in GitHub Secrets, not in environment variables
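The branch scoping in the trust policy is, in effect, a wildcard match on the token's `sub` claim. A minimal sketch of that check (`sub_allowed` is a hypothetical helper; IAM's `StringLike` supports the same `*`/`?` wildcards as shell globs):

```python
import fnmatch

def sub_allowed(token_sub: str, allowed_patterns: list[str]) -> bool:
    # Mimics an IAM StringLike condition evaluated against the
    # GitHub OIDC token's `sub` claim (case-sensitive glob match).
    return any(fnmatch.fnmatchcase(token_sub, p) for p in allowed_patterns)

# Trust policy scoped to the main branch of one repo:
patterns = ["repo:bank-org/payments-api:ref:refs/heads/main"]

print(sub_allowed("repo:bank-org/payments-api:ref:refs/heads/main", patterns))       # True
print(sub_allowed("repo:bank-org/payments-api:ref:refs/heads/feature-x", patterns))  # False
print(sub_allowed("repo:attacker/payments-api:ref:refs/heads/main", patterns))       # False
```

This is why a pattern like `repo:org/repo:*` is too broad — any branch (and any PR ref) of the repo would be able to assume the role.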
Trust policy scoping options:
```text
repo:org/repo:*                       # any branch (too broad)
repo:org/repo:ref:refs/heads/main     # main branch only (good)
repo:org/repo:environment:production  # GitHub Environment (best)
repo:org/repo:pull_request            # PR context (for CI only)
```

Scenario 3: Environment Promotion with Approval Gates
“Design a promotion workflow: dev to staging to prod with approval gates.”
See the PR-based promotion pattern above. Key elements:
- Dev: auto-deploy on merge to main (no approval needed)
- Staging: automated PR opened by CI, requires team lead approval (CODEOWNERS)
- Prod: manual PR by platform engineer, requires 2 platform team approvals
- ArgoCD sync windows: prod deploys only during Sun-Thu 10am-4pm Dubai time
- Canary: Argo Rollouts with analysis at 10/30/60% before 100%
- Rollback: automatic if analysis fails; manual `kubectl argo rollouts abort` if needed
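The sync-window rule from the bullets above (Sun–Thu, 10am–4pm Dubai time) boils down to a simple predicate; a sketch with a hypothetical helper, assuming the timestamp is already in Dubai local time:

```python
from datetime import datetime

def in_sync_window(dt: datetime) -> bool:
    # Python's weekday(): Mon=0 ... Sun=6; Dubai work week is Sun-Thu.
    dubai_workdays = {6, 0, 1, 2, 3}   # Sun, Mon, Tue, Wed, Thu
    return dt.weekday() in dubai_workdays and 10 <= dt.hour < 16

print(in_sync_window(datetime(2024, 1, 7, 11, 0)))   # Sunday 11:00 -> True
print(in_sync_window(datetime(2024, 1, 5, 11, 0)))   # Friday 11:00 -> False
print(in_sync_window(datetime(2024, 1, 8, 17, 0)))   # Monday 17:00 -> False
```

In practice ArgoCD evaluates this itself from the `syncWindows` cron spec; the sketch only illustrates the gate a merged prod PR still has to pass.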
Scenario 4: Deployment Stuck — CrashLoopBackOff
“A deployment is stuck — 3 new pods are CrashLoopBackOff but old pods are still serving. What happens and how do you fix it?”
What is happening:
```text
Rolling Update in Progress
==========================

maxSurge: 25%        (can create 25% extra pods)
maxUnavailable: 25%  (can have 25% fewer ready pods)

Deployment: payments-api (replicas=10, image=v1)
  → Update triggered: image=v2

Step 1: Create 3 new pods with v2 (25% surge = ceil(10*0.25) = 3)
Step 2: New pods start → CrashLoopBackOff (bad config, missing env var, etc.)
Step 3: Rolling update STALLS — it will NOT kill old pods because:
  - maxUnavailable=25% → can have 8 ready (currently 10 ready with v1)
  - New pods are NOT ready → old pods stay
  - Users are NOT affected (v1 pods still serve traffic)

The deployment controller waits for progressDeadlineSeconds (default 600s = 10 min)
After deadline: deployment status = "ProgressDeadlineExceeded"
But old pods STILL serve traffic — no outage
```

Debugging:
```shell
# Check rollout status
kubectl rollout status deployment/payments-api -n payments

# Check new pod logs
kubectl logs -n payments -l app=payments-api --tail=50 | grep -i error

# Check events
kubectl describe deployment payments-api -n payments

# Common CrashLoopBackOff causes:
# - Missing ConfigMap/Secret referenced in env
# - Database connection string wrong for new env
# - Missing env variable in new version
# - OOMKilled (new version needs more memory)
# - Liveness probe path changed in new version
```

Fix:
```shell
# Option 1: Rollback to previous revision
kubectl rollout undo deployment/payments-api -n payments

# Option 2: Rollback to specific revision
kubectl rollout undo deployment/payments-api -n payments --to-revision=3

# In GitOps: revert the commit in gitops repo → ArgoCD syncs old version
git revert HEAD
git push
```

Scenario 5: Canary Deployments on EKS
“How do you implement canary deployments on EKS?”
Option 1: Argo Rollouts with ALB (recommended)
- Replace Deployment with Rollout CRD
- Use ALB Ingress Controller for traffic splitting
- AnalysisTemplate validates canary health
- Automated promotion or rollback
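A minimal Rollout sketch for Option 1. The field names follow the Argo Rollouts canary API, but the service and ingress names, weights, and pause durations are illustrative assumptions, not values from this document:

```yaml
# Sketch — assumes Argo Rollouts and the AWS Load Balancer Controller are
# installed; names and timings are examples, not prescriptions.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payments-api
spec:
  replicas: 10
  strategy:
    canary:
      canaryService: payments-api-canary   # Service selecting canary pods
      stableService: payments-api-stable   # Service selecting stable pods
      trafficRouting:
        alb:
          ingress: payments-api            # ALB Ingress the controller rewrites
          servicePort: 80
      steps:
        - setWeight: 10                    # send 10% of traffic to the canary
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: success-rate # AnalysisTemplate defined separately
        - setWeight: 50
        - pause: { duration: 10m }
        # full promotion happens automatically after the last step;
        # a failed analysis aborts and shifts traffic back to stable
```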
Option 2: Argo Rollouts with Istio
- Istio VirtualService for fine-grained traffic splitting
- More precise than ALB (can split by header, cookie, etc.)
- Higher complexity (requires Istio service mesh)
Option 3: Native Kubernetes (manual canary)
- Two Deployments (stable + canary) behind same Service
- Adjust replica counts for traffic ratio
- No automated analysis — purely manual
- Not recommended for enterprise
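The manual pattern in Option 3 amounts to two Deployments whose pods all match one Service's selector, so traffic splits roughly by ready-pod count. The names and the 9:1 replica ratio (~10% canary) below are illustrative assumptions:

```yaml
# Sketch — both Deployments carry the label the Service selects on, so
# kube-proxy load-balances across all pods (~90/10 here). No automated
# analysis or rollback; shift the ratio by editing replica counts by hand.
apiVersion: v1
kind: Service
metadata:
  name: payments-api
spec:
  selector:
    app: payments-api            # matches BOTH stable and canary pods
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api-stable
spec:
  replicas: 9                    # ~90% of traffic
  selector:
    matchLabels: { app: payments-api, track: stable }
  template:
    metadata:
      labels: { app: payments-api, track: stable }
    spec:
      containers:
        - name: app
          image: payments-api:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api-canary
spec:
  replicas: 1                    # ~10% of traffic
  selector:
    matchLabels: { app: payments-api, track: canary }
  template:
    metadata:
      labels: { app: payments-api, track: canary }
    spec:
      containers:
        - name: app
          image: payments-api:v2
```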
Scenario 6: ArgoCD OutOfSync But Application Running Fine
“ArgoCD shows ‘OutOfSync’ but the application is running fine. Why?”
Common causes:
| Cause | Fix |
|---|---|
| HPA changed replica count | Add ignoreDifferences for /spec/replicas |
| Mutating webhook added fields | Ignore the webhook-added fields |
| Server-side apply added managedFields | Exclude managedFields in ArgoCD diff settings |
| Resource drift (manual kubectl edit) | Enable selfHeal: true to auto-revert |
| CRD status subresource | Ignore .status in diff |
| Defaulted fields by API server | Normalize in ArgoCD resource customization |
Fix example:
```yaml
# In the ArgoCD Application spec
ignoreDifferences:
  - group: apps
    kind: Deployment
    jsonPointers:
      - /spec/replicas             # HPA manages this
  - group: ""
    kind: Service
    jqPathExpressions:
      - .spec.clusterIP            # auto-assigned by K8s
  - kind: MutatingWebhookConfiguration
    jqPathExpressions:
      - .webhooks[]?.clientConfig.caBundle
```

Scenario 7: Prevent Direct Deployment to Prod
“How do you prevent a developer from deploying directly to prod, bypassing the pipeline?”
Layered controls:
```
DEFENSE IN DEPTH — PREVENTING DIRECT PROD DEPLOYS
=================================================

Layer 1: Git (source of truth)
  - CODEOWNERS on prod/ overlay → requires platform team approval
  - Branch protection: no direct push to main
  - Require PR reviews for prod changes

Layer 2: ArgoCD (deployment engine)
  - AppProject RBAC: only the platform-admin role can sync prod apps
  - Sync windows: deny syncs outside business hours
  - No automated sync for prod (manual only)

Layer 3: Kubernetes (cluster-level)
  - RBAC: team ServiceAccounts cannot create/update Deployments in prod namespaces
  - OPA/Kyverno: deny deployments not matching gitops labels
  - Namespace labels: "managed-by: argocd" — reject non-ArgoCD applies

Layer 4: Network
  - EKS API server: private endpoint only
  - No direct kubectl access from developer laptops
  - Break-glass procedure for emergencies (audited)
```

Kyverno policy — reject non-ArgoCD deployments:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-argocd-managed
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-argocd-label
      match:
        any:
          - resources:
              kinds: ["Deployment", "StatefulSet", "DaemonSet"]
              namespaces: ["payments", "orders", "trading"]
      exclude:
        any:
          - subjects:
              - kind: ServiceAccount
                name: argocd-application-controller
                namespace: argocd
      validate:
        message: "Resources in production namespaces must be deployed via ArgoCD."
        pattern:
          metadata:
            labels:
              app.kubernetes.io/managed-by: argocd
```

Scenario 8: Design GitOps Repo for 10 Microservices Across 3 Environments
“Design the gitops repo structure for a team with 10 microservices across 3 environments.”
Structure: Use the Kustomize-based structure shown above. Key principles:
- One gitops repo for all 10 services (not 10 repos — simplifies management)
- Kustomize base per service — shared manifests (deployment, service, HPA, PDB)
- Three overlays per service — dev, staging, prod with environment-specific patches
- One ApplicationSet — matrix generator creates 30 Applications automatically
- CODEOWNERS — team owns dev/staging overlays, platform team owns prod
- Promotion via PR — image tag updated in overlay, reviewed, merged, synced
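Under those principles, one service’s slice of the repo might look like the tree below. The directory and file names are illustrative, not prescribed by this document:

```
gitops/
├── apps/
│   └── payments-api/
│       ├── base/
│       │   ├── kustomization.yaml
│       │   ├── deployment.yaml
│       │   ├── service.yaml
│       │   ├── hpa.yaml
│       │   └── pdb.yaml
│       └── overlays/
│           ├── dev/
│           │   ├── kustomization.yaml    # image tag + low replicas
│           │   └── patch-resources.yaml
│           ├── staging/
│           └── prod/                     # CODEOWNERS: platform team
├── applicationset.yaml                   # matrix: 10 services × 3 envs
└── CODEOWNERS
```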
File count:
- 10 services x (base: ~5 files + 3 overlays x ~3 files each) = ~140 files
- 1 ApplicationSet YAML = 1 file
- Total: ~141 files in one repo — manageable
Scaling considerations:
- At 50+ services, consider splitting into domain-specific gitops repos (payments-gitops, trading-gitops)
- ArgoCD can watch multiple repos
- Use ArgoCD ApplicationSet with multiple Git generators pointing to different repos
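A matrix-generator sketch for the single ApplicationSet mentioned above, pairing a Git directory generator (one entry per service) with a list generator (one entry per environment). The repo URL, cluster names, and path layout are assumptions for illustration:

```yaml
# Sketch — generates 10 services × 3 envs = 30 Applications.
# repoURL, cluster names, and paths are assumed, not from this document.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: all-services
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          - git:                         # one parameter set per service dir
              repoURL: https://github.com/example/gitops.git
              revision: main
              directories:
                - path: apps/*
          - list:                        # one parameter set per environment
              elements:
                - env: dev
                - env: staging
                - env: prod
  template:
    metadata:
      name: "{{path.basename}}-{{env}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/example/gitops.git
        targetRevision: main
        path: "{{path}}/overlays/{{env}}"
      destination:
        name: "{{env}}-cluster"          # assumes clusters registered by name
        namespace: "{{path.basename}}"
      syncPolicy:
        automated: { prune: true, selfHeal: true }  # prod would override to manual sync
```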
Quick Reference — CI/CD Decision Matrix
```
CI/CD DECISION MATRIX
=====================

Need                     Tool / Pattern
----                     --------------
Container image build    GitHub Actions + docker/build-push-action
Image scanning           Trivy (OSS) or Snyk (enterprise)
Auth to AWS from CI      OIDC + aws-actions/configure-aws-credentials@v4
Auth to GCP from CI      WIF + google-github-actions/auth@v2
Container registry       ECR (AWS) / Artifact Registry (GCP)
GitOps deployment        ArgoCD (preferred) or Flux
Manifest templating      Kustomize (simple) or Helm (complex)
Multi-env management     Kustomize overlays + ArgoCD ApplicationSets
Canary deployments       Argo Rollouts + AnalysisTemplate
Traffic splitting        ALB (EKS) / Istio / Gateway API
Approval gates           GitHub CODEOWNERS + ArgoCD sync windows
Rollback                 git revert → ArgoCD sync (GitOps) or
                         kubectl argo rollouts abort (Argo Rollouts)
```