IAM Fundamentals — Roles, Policies & Role Assumption

AWS Organization structure with cross-account role assumption

The diagram shows a typical enterprise AWS Organization: the Management Account at the top, with Organizational Units (Workloads, Security, Shared Services) containing individual accounts. The dashed arrows represent cross-account role assumption — this is how identities in one account access resources in another, and it is the core pattern you will design and defend as a platform engineer.

As the central infrastructure team, you own IAM strategy across the entire organization. You define:

  • Who can access what across accounts (cross-account roles, trust policies)
  • How CI/CD pipelines authenticate to cloud APIs (IRSA, Workload Identity, OIDC federation)
  • Guardrails that prevent tenant teams from escalating privileges (SCPs, permission boundaries, org policies)

Tenant teams consume pre-built IAM roles and service accounts — they do not create their own cross-account trust relationships or manage federation.


| Concept | Definition | AWS | GCP |
| --- | --- | --- | --- |
| Authentication | Proving WHO you are | IAM Users, SSO, OIDC tokens | Google Identity, Workload Identity |
| Authorization | Determining WHAT you can do | IAM Policies, SCPs | IAM Roles, Org Policies |
| Identity | A principal that can make API calls | Users, Roles, Federated users | Users, Service Accounts, Groups |
| Access | Permission to perform an action on a resource | Allow/Deny in policies | Role bindings on resources |

The fundamental architectural difference between AWS and GCP IAM:

AWS Model: “Attach policies TO identities”

AWS IAM policy model — policies attached to identities

  • You ask: “What can this user/role do?” — look at their attached policies
  • Policies are JSON documents with Effect/Action/Resource/Condition
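  • By default, a single user or role can have up to 10 managed policies attached (an adjustable quota), plus inline policies that are limited by an aggregate size cap rather than by count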

GCP Model: “Bind roles AT resources, naming members”

GCP IAM binding model — roles bound at resource level

  • You ask: “Who can access this resource?” — look at its bindings
  • Roles are named bundles of permissions (predefined by Google, or defined by you as custom roles), not free-form JSON policy documents
  • Bindings INHERIT downward: org → folder → project → resource

Key Insight: Both achieve the same goal (controlling who can do what), but the mental model is inverted. AWS is identity-centric (“what can Jane do?”), GCP is resource-centric (“who can access this bucket?”).
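The inversion can be made concrete with a toy data model. This is a minimal sketch, not how either cloud stores policies: the dictionaries, the `group:platform@finserv.example` member, and the resource names are illustrative assumptions. The AWS side answers "what can this identity do?" by reading the identity; the GCP side answers "who can access this resource?" by walking up the hierarchy and collecting inherited bindings.

```python
from typing import Dict, List, Optional, Set

# AWS-style model: permissions hang off the identity (hypothetical data).
aws_identity_policies: Dict[str, List[str]] = {
    "role/PlatformAdmin": ["s3:GetObject", "eks:DescribeCluster"],
}


def aws_what_can(identity: str) -> List[str]:
    """Identity-centric question: what can this principal do?"""
    return aws_identity_policies.get(identity, [])


# GCP-style model: bindings hang off the resource and inherit downward.
gcp_parent: Dict[str, str] = {
    "bucket/data": "project/team-a-prod",
    "project/team-a-prod": "folder/workloads",
    "folder/workloads": "org/finserv",
}
gcp_bindings: Dict[str, Dict[str, Set[str]]] = {
    "folder/workloads": {"roles/viewer": {"group:platform@finserv.example"}},
    "bucket/data": {
        "roles/storage.objectViewer": {
            "serviceAccount:etl-pipeline@team-a-prod.iam.gserviceaccount.com"
        }
    },
}


def gcp_who_can(resource: str) -> Dict[str, Set[str]]:
    """Resource-centric question: who holds which role on this resource,
    counting bindings inherited from the project, folder, and org?"""
    effective: Dict[str, Set[str]] = {}
    node: Optional[str] = resource
    while node is not None:
        for role, members in gcp_bindings.get(node, {}).items():
            effective.setdefault(role, set()).update(members)
        node = gcp_parent.get(node)  # None once we pass the org node
    return effective
```

Calling `gcp_who_can("bucket/data")` returns both the bucket-level binding and the folder-level `roles/viewer` binding, which is the downward inheritance described above.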

What is a Role? (AWS) / Service Account? (GCP)

This is the concept that confuses most people coming from traditional Linux/database user models. A role is NOT a person — it is a temporary identity that any authorized principal can assume.

An IAM Role is a temporary identity anyone authorized can “put on”

Think of it like a jacket hanging in a secure closet. The jacket has a name tag (ARN), a lock (trust policy — who is allowed to wear it), and pockets full of specific tools (permission policies — what the wearer can do). Anyone with the right key can put on the jacket, use the tools, and then hang it back. The jacket is not a person — it is an identity that grants temporary powers.

Concrete example — Priya (Platform Engineer):

Priya does NOT have an IAM User. She authenticates through Okta SSO, which creates a temporary session by assuming the PlatformAdmin role in the Payments account:

  • Trust policy on the role says: “IAM Identity Center (from our Okta federation) can assume this role”
  • Permission policy on the role says: “Allow EKS, EC2, S3, CloudWatch actions”
  • Session expires in 1 hour — Priya must re-authenticate to continue
  • When she switches to the Lending account, she gets a different role with different permissions

| Concept | IAM User | IAM Role |
| --- | --- | --- |
| Credentials | Permanent access keys (long-lived) | Temporary STS credentials (15 minutes to 12 hours) |
| Belongs to | One person/machine | Anyone authorized by the trust policy |
| Created by | Admin (avoid in enterprise) | Platform team (standard pattern) |
| Revocation | Must delete keys manually | Session expires automatically |
| Use case | Legacy (avoid) | SSO, Lambda, EKS pods, cross-account |

A GCP Service Account is an identity (with an email address) that can be impersonated

Unlike AWS roles (which are abstract), GCP service accounts are concrete identities — each one has an email address (e.g., etl-pipeline@prod-project.iam.gserviceaccount.com). They can be impersonated by other principals to obtain short-lived credentials, similar to AWS role assumption.

Concrete example — ETL Pipeline:

The nightly ETL pipeline does NOT run as a human user. It impersonates a dedicated service account:

  • Service account: etl-pipeline@team-a-prod.iam.gserviceaccount.com
  • IAM binding on the data bucket: roles/storage.objectViewer granted to this SA
  • Impersonation: The CI service account has roles/iam.serviceAccountTokenCreator on the ETL SA — only CI can impersonate it
  • Result: CI gets a short-lived OAuth2 token (1 hour) to read data as the ETL SA

| Concept | SA with JSON Key | SA with Impersonation |
| --- | --- | --- |
| Credentials | Permanent JSON key file (long-lived) | Short-lived OAuth2 token (1 hour) |
| Risk | Key file can leak | Token expires automatically |
| Rotation | Manual: must regenerate and redeploy | Automatic: new token each time |
| Best practice | AVOID: disable key creation via org policy | USE: standard enterprise pattern |

Entity Relationships — Real-World Example

AWS: Entity Relationships — FinServ Corp

Company: FinServ Corp (Banking, based in Dubai; workloads run in me-south-1, the Bahrain region)

Accounts: Management, Security, Shared-Services (222222222222), Team-Payments-Prod (111111111111), Team-Lending-Prod, Sandbox

Example 1: Identity Chain — Priya (Platform Engineer)

Priya's identity chain — Okta to IAM Identity Center to accounts

Example 2: Cross-Account Role Assumption — Ravi’s Lambda

Ravi (Payments Developer) built a Lambda that reconciles bank transactions. It needs to read reconciliation rules from S3 in the Shared-Services account.

Cross-account role assumption — Lambda to Shared Services S3

Example 3: OIDC Federation — GitHub Actions CI/CD

OIDC Federation — GitHub Actions to AWS IAM

Flow: GitHub Actions generates a short-lived OIDC JWT token containing the repo name, branch, and workflow identity. This token is sent to AWS STS via AssumeRoleWithWebIdentity. STS validates the JWT signature against the GitHub OIDC provider’s public keys, checks the trust policy conditions (repo, branch, org), and returns temporary AWS credentials. No long-lived secrets are stored in GitHub — the JWT itself is the proof of identity.
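The condition check in that flow can be sketched in a few lines. This is an illustration only: it decodes the JWT payload without verifying the signature (STS does the real verification against the provider's public keys), and the `finserv/payments-api` repository name and the helper names are assumptions. GitHub's `sub` claim for a branch build has the form `repo:ORG/REPO:ref:refs/heads/BRANCH`.

```python
import base64
import fnmatch
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the JWT payload WITHOUT verifying the signature (illustration only)."""
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def trust_conditions_match(
    claims: dict,
    allowed_sub_pattern: str,
    expected_aud: str = "sts.amazonaws.com",
) -> bool:
    """Mimic a trust policy that pins `aud` exactly and matches `sub` with wildcards."""
    return claims.get("aud") == expected_aud and fnmatch.fnmatchcase(
        claims.get("sub", ""), allowed_sub_pattern
    )


def make_unsigned_token(claims: dict) -> str:
    """Build a fake, unsigned token so the sketch can be exercised locally."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none'})}.{enc(claims)}.not-a-real-signature"
```

A token whose `sub` is `repo:finserv/payments-api:ref:refs/heads/main` matches the pattern `repo:finserv/payments-api:ref:refs/heads/*`, while a token from any other repository does not. Pinning the repository (and ideally the branch) in the trust policy is what stops arbitrary GitHub workflows from assuming the role.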

Example 4: Policy Attachment Points — Where Each Type Lives

| Policy Type | Attaches To | FinServ Example | Cannot Attach To |
| --- | --- | --- | --- |
| Identity-Based | Users, Groups, Roles | EKSFullAccess on PlatformAdmin permission set | Resources |
| Resource-Based | S3, SQS, KMS, Lambda, SNS, API Gateway | S3 bucket policy on finserv-shared-artifacts allowing Account 111’s role | Users, Groups |
| Permission Boundary | Users, Roles | PlatformTeamBoundary caps platform team: no iam:CreateUser | Groups (common interview trick question) |
| SCP | OUs, Accounts | Workloads OU: Deny iam:CreateUser, Deny regions ≠ me-south-1 | Management Account (another trick: SCPs never apply to mgmt account) |
| Session Policy | AssumeRole / GetFederationToken sessions | CI passes session policy limiting to s3:GetObject on /releases/* only | Cannot EXPAND permissions, only restrict |

Example 5: One API Call Through All Policy Layers

Ravi’s Lambda calls s3:GetObject on finserv-shared-artifacts/reconciliation/rules-v2.json. Here is how AWS evaluates every policy layer:

One API call through all IAM policy layers — step by step

Reading the diagram: For every API call, AWS evaluates each applicable policy layer: (1) Explicit deny in any policy? → DENY immediately. (2) SCP allows? (3) Resource-based policy allows? (4) Identity-based policy allows? (5) Permission boundary allows? (6) Session policy allows? For a cross-account call like this one, ALL applicable layers must allow; if any layer denies or is silent, access is denied. (Within a single account, an allow in either the identity-based or the resource-based policy is enough for steps 3 and 4.) This is why debugging “Access Denied” requires checking every layer systematically.
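The layered check can be sketched as a small evaluator. This is a simplified model under stated assumptions, not the actual AWS evaluation engine: it ignores conditions and principals, matches actions case-sensitively with shell-style wildcards, and the `Statement` and `evaluate` names are hypothetical.

```python
import fnmatch
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Statement:
    effect: str            # "Allow" or "Deny"
    actions: List[str]     # e.g. ["s3:GetObject"] or ["s3:*"]
    resources: List[str]   # ARN patterns

    def matches(self, action: str, resource: str) -> bool:
        return any(fnmatch.fnmatchcase(action, a) for a in self.actions) and any(
            fnmatch.fnmatchcase(resource, r) for r in self.resources
        )


def has(policy: Sequence[Statement], effect: str, action: str, resource: str) -> bool:
    return any(s.effect == effect and s.matches(action, resource) for s in policy)


def evaluate(
    action: str,
    resource: str,
    *,
    scps: Sequence[Sequence[Statement]],
    identity: Sequence[Statement],
    resource_policy: Sequence[Statement] = (),
    boundary: Optional[Sequence[Statement]] = None,
    session: Optional[Sequence[Statement]] = None,
    cross_account: bool = False,
) -> str:
    optional_layers = [p for p in (boundary, session) if p is not None]
    every_layer = [*scps, identity, resource_policy, *optional_layers]
    # 1. An explicit deny in any layer wins immediately.
    if any(has(p, "Deny", action, resource) for p in every_layer):
        return "DENY: explicit deny"
    # 2. Every SCP in the path must allow the action.
    if not all(has(p, "Allow", action, resource) for p in scps):
        return "DENY: SCP does not allow"
    # 3-4. Cross-account needs both sides; same-account needs either one.
    id_ok = has(identity, "Allow", action, resource)
    res_ok = has(resource_policy, "Allow", action, resource)
    granted = (id_ok and res_ok) if cross_account else (id_ok or res_ok)
    if not granted:
        return "DENY: no applicable allow"
    # 5-6. Boundary and session policy, when present, can only narrow.
    if not all(has(p, "Allow", action, resource) for p in optional_layers):
        return "DENY: boundary or session policy does not allow"
    return "ALLOW"
```

Running Ravi's call through it with `cross_account=True` returns `"ALLOW"` only when both the Lambda role's identity policy and the bucket policy allow `s3:GetObject`; dropping either side yields `"DENY: no applicable allow"`.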

GCP: Entity Relationships — FinServ Corp

Company: FinServ Corp on GCP (same company, GCP setup)

Example 1: Member → Binding → Resource (Platform Team Access)

GCP IAM binding at folder level with inheritance

Example 2: Service Account Impersonation (CI/CD → Production)

GCP Service Account Impersonation — CI/CD to Production

Example 3: GKE Workload Identity (Pod → GCP API)

GKE Workload Identity — Pod to GCP API

Flow: A GKE pod’s Kubernetes service account is mapped to a GCP service account via an IAM binding (roles/iam.workloadIdentityUser). When the pod requests credentials for a GCP API call, the GKE metadata server intercepts the request, exchanges the K8s service account token for a GCP access token, and the API call proceeds as the mapped GCP service account. This eliminates the need for JSON key files inside pods; it is the GCP equivalent of AWS IRSA.
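The mapping hinges on one specially formatted member string. The sketch below builds that member and the resulting binding as plain data; the helper names and the `payments` / `payment-processor` values are assumptions for illustration, while the `PROJECT_ID.svc.id.goog[NAMESPACE/KSA_NAME]` member format and the `iam.gke.io/gcp-service-account` annotation are the standard Workload Identity conventions.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """Member string that identifies a Kubernetes service account to GCP IAM."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def workload_identity_binding(project_id: str, namespace: str, ksa_name: str) -> dict:
    """Binding to place on the GCP service account's own IAM policy."""
    return {
        "role": "roles/iam.workloadIdentityUser",
        "members": [workload_identity_member(project_id, namespace, ksa_name)],
    }


def ksa_annotation(gsa_email: str) -> dict:
    """Annotation to set on the Kubernetes service account."""
    return {"iam.gke.io/gcp-service-account": gsa_email}
```

For example, `workload_identity_member("team-a-prod", "payments", "payment-processor")` yields `serviceAccount:team-a-prod.svc.id.goog[payments/payment-processor]`. Both halves are needed: the binding on the GCP side and the annotation on the Kubernetes side.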

Example 4: Org-Level Deny Policy (Guardrail)

GCP Org-Level Deny Policy — no SA key creation

How it works: A deny policy at the org level blocks iam.serviceAccountKeys.create for all principals except the platform admin. Even if a developer has roles/iam.serviceAccountAdmin on their project, the org-level deny overrides it. Deny policies are GCP’s equivalent of AWS SCPs — hierarchical guardrails that prevent dangerous actions regardless of what allow policies exist lower in the hierarchy.

| Concept | AWS | GCP | Key Difference |
| --- | --- | --- | --- |
| Identity grouping | IAM Groups (attach policies to group) | Google Groups (bind roles to group at resource level) | GCP groups live in Google Workspace/Cloud Identity, not in IAM itself |
| Temporary credentials | sts:AssumeRole → STS tokens (AccessKeyId + SecretAccessKey + SessionToken) | SA Impersonation → OAuth2 access tokens | GCP uses standard OAuth2; AWS uses proprietary STS format |
| Policy model | Policy documents (JSON) attached TO identities | Member + Role bound AT resource level | GCP has no “policy document” concept; roles are predefined permission bundles |
| Permission cap | Permission Boundaries (per user/role) | Deny Policies (org/folder/project-wide) + Org Policy Constraints | AWS boundaries are per-entity; GCP deny policies are hierarchical |
| Inheritance | NO inheritance (each account is isolated; SCPs are the exception) | FULL downward inheritance through org → folder → project → resource | Biggest difference: one GCP binding at folder level can cover 100 projects |
| Service identity | IAM Roles (instance profiles, task roles, IRSA) | Service Accounts (are principals AND can be impersonated) | GCP SAs are email-addressable identities; AWS roles are abstract |
| Cross-account/project | Cross-account role assumption with trust policies + external IDs | Just add SA from another project as member in a binding | GCP is simpler: no trust policy needed for cross-project access |
| External federation | OIDC providers + AssumeRoleWithWebIdentity | Workload Identity Federation (Pool + Provider + SA binding) | Similar concept; GCP adds attribute mappings + conditions layer |

AWS IAM Entity Hierarchy

The AWS IAM entity hierarchy: Users (humans or machine identities with permanent credentials), Groups (collections of users for bulk policy attachment), and Roles (temporary identities assumed via STS). All three can have identity-based policies attached. In enterprise, you will almost never create IAM Users — use SSO (Identity Center) for humans and IAM Roles for machines.

Every IAM policy consists of these elements:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3ReadForDataTeam",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::data-lake-prod",
        "arn:aws:s3:::data-lake-prod/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestedRegion": "ap-southeast-1"
        },
        "IpAddress": {
          "aws:SourceIp": "10.0.0.0/8"
        }
      }
    }
  ]
}

| Element | Purpose | Notes |
| --- | --- | --- |
| Effect | Allow or Deny | Explicit Deny always wins |
| Action | API operations | s3:GetObject, ec2:RunInstances |
| Resource | ARN of target resource | Use wildcards carefully |
| Condition | When the policy applies | IP range, time, tags, region |
| Principal | Who this applies to (resource-based only) | Account, role, service, federated user |

AWS IAM Policy Evaluation Flow

AWS Cross-Account Role Assumption Flow

The diagram shows the cross-account role assumption flow: a principal in Account A calls STS to assume a role in Account B. The trust policy on the target role controls who can assume it; the permission policy controls what the assumed role can do. This two-policy model (trust + permissions) is the foundation of all AWS cross-account access.

Instead of distributing long-lived access keys, an IAM principal calls sts:AssumeRole to get temporary credentials (access key + secret key + session token). These expire after 15 minutes to 12 hours, bounded by the role’s maximum session duration (configurable from 1 to 12 hours).

AWS STS AssumeRole flow

What STS returns: When AssumeRole succeeds, STS returns three values: a temporary AccessKeyId, SecretAccessKey, and SessionToken. These work like regular AWS credentials but expire (after 15 minutes to 12 hours, depending on the requested duration and the role’s maximum). The calling principal must include all three in subsequent API calls to act as the assumed role.

Every IAM role has two policy types:

  1. Trust Policy (who can assume this role):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "bank-platform-team-2024"
        }
      }
    }
  ]
}
  2. Permission Policy (what the assumed role can do):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::shared-artifacts-bucket/*"
    }
  ]
}

Cross-Account Role Assumption — Step by Step

Cross-Account Role Assumption — step by step

Step 1: Central team creates a role in Account A (Shared Services) with a trust policy allowing Account B.

Step 2: Central team creates/configures the Lambda execution role in Account B with permission to call sts:AssumeRole on the Account A role.

Step 3: At runtime, Lambda calls sts:AssumeRole with the cross-account role ARN.

Step 4: STS validates the trust policy, returns temporary credentials.

Step 5: Lambda uses temp creds to access S3/KMS in Account A.
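Steps 3 to 5 can be sketched as a small helper. To keep the example runnable without AWS access, the STS client is passed in rather than created; in a real Lambda it would be `boto3.client("sts")`. The function name and the session name are assumptions; `assume_role` and the shape of its `Credentials` response are the standard boto3 STS API.

```python
from typing import Any, Dict, Optional


def assume_cross_account_role(
    sts_client: Any,
    role_arn: str,
    session_name: str,
    external_id: Optional[str] = None,
    duration_seconds: int = 3600,
) -> Dict[str, str]:
    """Call sts:AssumeRole and return keyword arguments for a new boto3 client."""
    params: Dict[str, Any] = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,  # shows up in CloudTrail for auditing
        "DurationSeconds": duration_seconds,
    }
    if external_id is not None:
        params["ExternalId"] = external_id

    creds = sts_client.assume_role(**params)["Credentials"]
    # All three values are required to act as the assumed role.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

Inside the Lambda, the returned dictionary can be splatted into a client for the target account, for example `boto3.client("s3", **assume_cross_account_role(boto3.client("sts"), role_arn, "reconciliation-lambda"))`. Every call made through that client is then evaluated against the Account A role's permission policy, not the Lambda's own execution role.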

External IDs — Preventing the Confused Deputy Problem

Confused Deputy Problem — with and without External ID

Key rules:

  • The External ID is generated by the SERVICE (SaaS provider), NOT the customer
  • Must be 2-1224 characters (alphanumeric, underscore, plus + = , . @ : / -)
  • Each customer gets a unique External ID
  • In enterprise: central infra team assigns External IDs per cross-org integration
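A minimal sketch of how a service provider (or the central team) might generate and validate External IDs under these rules. The function names are hypothetical, and the character class follows the validation pattern documented for the `ExternalId` parameter in the STS API reference; a random UUID is just one reasonable way to get a unique, unguessable value per customer.

```python
import re
import uuid

# 2-1224 characters: word characters plus + = , . @ : / -
_EXTERNAL_ID_RE = re.compile(r"[\w+=,.@:/-]{2,1224}", re.ASCII)


def is_valid_external_id(value: str) -> bool:
    """Check a candidate External ID against the length and character rules."""
    return _EXTERNAL_ID_RE.fullmatch(value) is not None


def new_external_id() -> str:
    """Generate a unique External ID for one customer or integration."""
    return str(uuid.uuid4())
```

The provider stores the generated value against the customer record and always passes it on `AssumeRole`; the customer only copies it into the trust policy condition. Because the customer never chooses the value, one customer cannot trick the provider into assuming another customer's role.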

AssumeRoleWithWebIdentity — OIDC Federation

Used for: IRSA (EKS pods), GitHub Actions, GitLab CI, any OIDC provider.

IRSA OIDC Federation Flow — EKS Pod to AWS STS

How IRSA works — step by step (follow the diagram):

  1. Pod starts with a Kubernetes ServiceAccount annotated with an IAM role ARN. The EKS mutating webhook injects a projected JWT token (signed by the cluster’s OIDC issuer) into the pod at /var/run/secrets/eks.amazonaws.com/serviceaccount/token
  2. AWS SDK detects the AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables (also injected by the webhook) and calls sts:AssumeRoleWithWebIdentity, sending the JWT + the target role ARN
  3. STS validates the JWT against the OIDC provider’s public keys (registered in IAM as an OIDC identity provider with the EKS cluster’s issuer URL). It checks the trust policy conditions: does the sub claim match the expected namespace:serviceaccount? Does the aud claim equal sts.amazonaws.com?
  4. If valid, STS returns temporary credentials (AccessKeyId + SecretAccessKey + SessionToken) scoped to the IAM role’s permission policy. These credentials expire and auto-refresh before expiry

Why this matters: Each pod gets its own IAM identity based on its Kubernetes ServiceAccount — no shared node-level IAM role. The central platform team controls which namespace:serviceaccount combinations can assume which IAM roles, and all access is logged in CloudTrail with the pod’s identity.

Trust policy for OIDC federation:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111111111111:oidc-provider/oidc.eks.ap-southeast-1.amazonaws.com/id/ABCDEF1234567890"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.ap-southeast-1.amazonaws.com/id/ABCDEF1234567890:sub": "system:serviceaccount:payments:payment-processor",
          "oidc.eks.ap-southeast-1.amazonaws.com/id/ABCDEF1234567890:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
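Because the issuer string appears three times in that document, hand-written IRSA trust policies are easy to get subtly wrong. A platform team might generate them instead; the sketch below is one hypothetical way to do that (the function name is an assumption, the policy shape mirrors the JSON above).

```python
import json


def irsa_trust_policy(
    account_id: str, oidc_issuer: str, namespace: str, service_account: str
) -> dict:
    """Build an IRSA trust policy pinned to one namespace:serviceaccount pair."""
    issuer = oidc_issuer
    if issuer.startswith("https://"):
        issuer = issuer[len("https://"):]  # condition keys use the issuer without scheme
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = irsa_trust_policy(
        "111111111111",
        "https://oidc.eks.ap-southeast-1.amazonaws.com/id/ABCDEF1234567890",
        "payments",
        "payment-processor",
    )
    print(json.dumps(policy, indent=2))
```

Using `StringEquals` on both `sub` and `aud` keeps the role bound to exactly one Kubernetes service account; omitting the `sub` condition would let any service account in the cluster assume the role.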

Role chaining: Role A assumes Role B, then Role B assumes Role C. Use case: a CI pipeline assumes a build role, which assumes a deploy role in a different account.

IAM Role Chaining — CI to Build to Deploy

Limitation: When chaining, the maximum session duration drops to 1 hour regardless of the role’s configured maximum (which can be up to 12 hours for direct assumption).

  • Direct assumption: 1-12 hours (configurable per role)
  • Chained assumption: maximum 1 hour
  • Session policies: pass an additional policy when calling AssumeRole to further restrict permissions (intersection of role permissions and session policy)
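These limits can be expressed as two small checks. This is a sketch with hypothetical helper names: it models the duration rule by rejecting an out-of-range request (STS fails the call rather than silently shortening it) and models a session policy as a set intersection over action names, ignoring resources and conditions.

```python
from typing import Optional, Set

MIN_SESSION_SECONDS = 900        # 15 minutes, the smallest duration AssumeRole accepts
MAX_CHAINED_SECONDS = 3600       # role chaining caps the session at 1 hour


def validate_duration(requested: int, role_max: int, chained: bool) -> int:
    """Return the requested duration if STS would accept it, else raise."""
    ceiling = min(role_max, MAX_CHAINED_SECONDS) if chained else role_max
    if not MIN_SESSION_SECONDS <= requested <= ceiling:
        raise ValueError(
            f"DurationSeconds={requested} outside {MIN_SESSION_SECONDS}..{ceiling}"
        )
    return requested


def effective_actions(
    role_actions: Set[str], session_policy_actions: Optional[Set[str]]
) -> Set[str]:
    """A session policy can only narrow what the role already allows."""
    if session_policy_actions is None:
        return set(role_actions)
    return role_actions & session_policy_actions
```

For a role with a 12-hour maximum, `validate_duration(7200, 43200, chained=False)` succeeds, while the same request with `chained=True` raises. Long-running pipelines that chain roles therefore need to re-assume the role rather than request a longer session.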

A permission boundary sets the maximum permissions an IAM entity can have. It does not GRANT permissions — it CAPS them.

Permission Boundary — intersection of boundary and identity policy

Reading the diagram: The outer box is the identity policy (what policies are attached to the role). The inner box is the permission boundary (the ceiling set by the central team). The effective permissions (what the role can actually do) are the intersection (overlap) of both. Anything in the identity policy but outside the boundary is denied.

Practical example — FinServ Corp Payments team:

Step 1 — Central platform team creates the boundary (an IAM policy):

The boundary is just a regular IAM policy document, but it will be used as a ceiling rather than a grant. The central team creates it via Terraform in the Shared Services account and replicates it to every workload account:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowedServices",
      "Effect": "Allow",
      "Action": ["s3:*", "dynamodb:*", "sqs:*", "sns:*", "logs:*", "cloudwatch:*", "lambda:*"],
      "Resource": "*"
    },
    {
      "Sid": "DenyPrivilegeEscalation",
      "Effect": "Deny",
      "Action": [
        "iam:CreateUser", "iam:CreateAccessKey",
        "iam:DeleteRolePermissionsBoundary",
        "organizations:*"
      ],
      "Resource": "*"
    }
  ]
}

This becomes an IAM managed policy named TenantBoundary, with the ARN arn:aws:iam::111111111111:policy/TenantBoundary.

Step 2 — Central team attaches the boundary to every tenant-created role:

There are two core parts: (a) provide a Terraform module that automatically includes the boundary, and (b) enforce it via SCP so developers cannot bypass it. An optional third part (c) auto-remediates roles that are created without the boundary.

Part A — Terraform module (the easy path):

The platform team publishes an internal Terraform module that all teams must use to create IAM roles. The module hardcodes the boundary:

# modules/tenant-iam-role/main.tf (maintained by platform team)
variable "role_name" { type = string }
variable "trust_policy" { type = string } # JSON-encoded trust policy document
variable "policy_arns" { type = list(string) }

# Resolves the current account ID so the boundary ARN is built per account
data "aws_caller_identity" "current" {}

resource "aws_iam_role" "this" {
  name                 = var.role_name
  permissions_boundary = "arn:aws:iam::${data.aws_caller_identity.current.account_id}:policy/TenantBoundary"
  assume_role_policy   = var.trust_policy
}

# Attach each requested managed policy to the role
resource "aws_iam_role_policy_attachment" "this" {
  for_each   = toset(var.policy_arns)
  role       = aws_iam_role.this.name
  policy_arn = each.value
}

When Ravi (Payments developer) creates a Lambda execution role, he uses the module:

# Ravi's Terraform code
module "lambda_role" {
  source       = "git::https://github.com/finserv/terraform-modules//tenant-iam-role"
  role_name    = "payments-lambda-role"
  trust_policy = data.aws_iam_policy_document.lambda_trust.json
  policy_arns  = [aws_iam_policy.payments_s3_access.arn]
}

The boundary is attached automatically — Ravi does not need to know about it and cannot remove it.

Part B — SCP enforcement (the enforcement backstop):

What if Ravi bypasses the module and creates a role directly via the AWS Console or raw Terraform aws_iam_role without the boundary? The SCP blocks it:

{
  "Sid": "DenyCreateRoleWithoutBoundary",
  "Effect": "Deny",
  "Action": "iam:CreateRole",
  "Resource": "*",
  "Condition": {
    "StringNotLike": {
      "iam:PermissionsBoundary": "arn:aws:iam::*:policy/TenantBoundary"
    }
  }
}

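This SCP says: “any iam:CreateRole call that does NOT include TenantBoundary as the permissions boundary is denied.” The condition uses a wildcard for the account ID, so it needs a Like-style operator; an Equals-style operator would treat the * literally. Even if Ravi has AdministratorAccess, he cannot create a role without the boundary. SCPs apply to whole accounts rather than to individual roles, so the platform team’s own roles are typically exempted with an additional condition on aws:PrincipalARN (as in the region SCP later in this section), or they operate from the management account, which SCPs do not restrict.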

The result: Module makes it easy. SCP makes it mandatory. Together, every tenant role in the account is guaranteed to have the boundary.

Part C — Auto-remediation (detect and fix automatically):

What if you want to ALLOW role creation but automatically attach the boundary after the fact? Use an event-driven Lambda:

  1. CloudTrail logs iam:CreateRole
  2. EventBridge rule triggers on "CreateRole" events
  3. Lambda function checks: does this role have TenantBoundary?
  4. If NO → Lambda calls iam:PutRolePermissionsBoundary to attach it
  5. Role now has the boundary, with no human intervention needed

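# EventBridge rule: triggers on any IAM role creation.
# IAM is a global service whose CloudTrail events are delivered in us-east-1,
# so this rule is assumed to be deployed in that region.
resource "aws_cloudwatch_event_rule" "iam_role_created" {
  name        = "detect-role-without-boundary"
  description = "Triggers when an IAM role is created"
  event_pattern = jsonencode({
    source        = ["aws.iam"]
    "detail-type" = ["AWS API Call via CloudTrail"]
    detail = {
      eventSource = ["iam.amazonaws.com"]
      eventName   = ["CreateRole"]
    }
  })
}

# Lambda target: auto-attaches the boundary
resource "aws_cloudwatch_event_target" "enforce_boundary" {
  rule = aws_cloudwatch_event_rule.iam_role_created.name
  arn  = aws_lambda_function.enforce_boundary.arn
}

# Allow EventBridge to invoke the Lambda; without this the target never fires
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowEventBridgeInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.enforce_boundary.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.iam_role_created.arn
}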

The Lambda function itself is simple — it reads the role name from the CloudTrail event, checks if PermissionsBoundary is set, and if not, calls PutRolePermissionsBoundary.
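A minimal sketch of such a function, assuming the event arrives in the standard EventBridge envelope (top-level `account`, CloudTrail record under `detail`). The core logic takes the IAM client as a parameter so it can be exercised without AWS access; `enforce_boundary` is a hypothetical name, while `get_role` and `put_role_permissions_boundary` are standard boto3 IAM calls.

```python
from typing import Any

BOUNDARY_NAME = "TenantBoundary"


def enforce_boundary(event: dict, iam_client: Any) -> bool:
    """Attach TenantBoundary to a newly created role if it is missing.

    Returns True when the boundary had to be attached, False when the
    role already carried the expected boundary.
    """
    role_name = event["detail"]["requestParameters"]["roleName"]
    boundary_arn = f"arn:aws:iam::{event['account']}:policy/{BOUNDARY_NAME}"

    role = iam_client.get_role(RoleName=role_name)["Role"]
    current = role.get("PermissionsBoundary", {}).get("PermissionsBoundaryArn")
    if current == boundary_arn:
        return False  # already compliant, nothing to do

    iam_client.put_role_permissions_boundary(
        RoleName=role_name, PermissionsBoundary=boundary_arn
    )
    return True


def lambda_handler(event: dict, context: Any) -> dict:
    import boto3  # provided by the AWS Lambda Python runtime

    return {"boundary_attached": enforce_boundary(event, boto3.client("iam"))}
```

The Lambda's own execution role needs `iam:GetRole` and `iam:PutRolePermissionsBoundary`. Note that a role exists without the boundary for the short window between creation and remediation, which is why the SCP remains the stricter control.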

Which approach to use — decision matrix:

| Approach | Behavior | Best for |
| --- | --- | --- |
| SCP (Part B) | Blocks role creation without boundary | Strict environments: “no boundary = no role, period” |
| Auto-remediation (Part C) | Allows creation, then auto-attaches boundary | Flexible environments: don’t break developer workflow, but enforce compliance within seconds |
| Both together | SCP blocks, Lambda catches edge cases (roles created by AWS services) | Enterprise production: defense in depth |

Step 3 — Ravi attaches AdministratorAccess (the identity policy):

Ravi attaches AdministratorAccess (which grants * on *). Here is what actually happens at runtime:

| Action Ravi’s role tries | AdministratorAccess says | TenantBoundary says | Result |
| --- | --- | --- | --- |
| s3:PutObject on payments bucket | Allow | Allow | Allowed (in the intersection) |
| dynamodb:Query on payments table | Allow | Allow | Allowed |
| ec2:RunInstances (launch a server) | Allow | Not listed (implicit deny) | Denied: outside the boundary |
| iam:CreateUser (create a backdoor user) | Allow | Explicit deny | Denied: boundary blocks it |
| organizations:LeaveOrganization | Allow | Explicit deny | Denied: boundary blocks it |

Even though Ravi attached the most powerful policy in AWS, the boundary caps his role to S3, DynamoDB, SQS, SNS, CloudWatch (including Logs), and Lambda. The role cannot escape the boundary either: the boundary allows no IAM actions and explicitly denies iam:DeleteRolePermissionsBoundary, so the central team has made it self-protecting.
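The table can be reproduced with a few lines of set logic. This is a simplified sketch that looks only at action names (resources and conditions are ignored) and uses shell-style wildcards as a stand-in for IAM's matching; the action lists are copied from the TenantBoundary policy shown earlier, and `decide` is a hypothetical helper.

```python
import fnmatch
from typing import List

IDENTITY_ALLOW: List[str] = ["*"]  # AdministratorAccess grants every action
BOUNDARY_ALLOW: List[str] = [
    "s3:*", "dynamodb:*", "sqs:*", "sns:*", "logs:*", "cloudwatch:*", "lambda:*",
]
BOUNDARY_DENY: List[str] = [
    "iam:CreateUser", "iam:CreateAccessKey",
    "iam:DeleteRolePermissionsBoundary", "organizations:*",
]


def _matches(action: str, patterns: List[str]) -> bool:
    return any(fnmatch.fnmatchcase(action, p) for p in patterns)


def decide(action: str) -> str:
    """Effective permission = identity policy AND boundary, minus explicit denies."""
    if _matches(action, BOUNDARY_DENY):
        return "Denied (explicit deny in boundary)"
    if _matches(action, IDENTITY_ALLOW) and _matches(action, BOUNDARY_ALLOW):
        return "Allowed"
    return "Denied (outside the boundary)"


if __name__ == "__main__":
    for action in [
        "s3:PutObject", "dynamodb:Query", "ec2:RunInstances",
        "iam:CreateUser", "organizations:LeaveOrganization",
    ]:
        print(f"{action:35} {decide(action)}")
```

Swapping `IDENTITY_ALLOW` for a narrower list shows the other half of the intersection: an action the boundary allows is still denied when the identity policy does not grant it, because the boundary never grants anything by itself.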

These are two different kinds of IAM roles that AWS services use — one AWS creates for you automatically, the other you create yourself. Interviewers test whether you know the difference.

| Aspect | Service-Linked Role | Service Role |
| --- | --- | --- |
| Created by | AWS automatically (when you enable a service) | You (or your Terraform) |
| Managed by | AWS: you cannot edit its policies | You: full control over policies |
| Trust policy | Locked to one specific AWS service | You define the trust policy |
| Can delete? | Only after removing all resources that depend on it | Yes, anytime |
| Naming pattern | AWSServiceRoleFor<ServiceName> | Any name you choose |

Practical example — Service-Linked Role:

When you create an Application Load Balancer, AWS automatically creates AWSServiceRoleForElasticLoadBalancing in your account. This role allows the ELB service to register/deregister targets, describe EC2 instances, and manage ENIs. You did not create it, you cannot change its permissions, and you cannot delete it while any ALB exists. AWS needs this role to manage your ALBs — without it, the service cannot function.

Other common service-linked roles: AWSServiceRoleForAutoScaling, AWSServiceRoleForAmazonEKS, AWSServiceRoleForRDS.

Practical example — Service Role:

When you create a Lambda function, YOU create an execution role (e.g., payments-lambda-role) and attach policies that define what the Lambda can access (S3, DynamoDB, SQS). You control every aspect of this role — the trust policy says “Lambda service can assume this,” and the permission policy says exactly which resources the function can touch. You can edit, replace, or delete it anytime.

Other common service roles: ECS task roles, EC2 instance profiles, Step Functions execution roles.

Service Control Policies (SCPs) — Organization-Wide Guardrails

SCPs are the top-level policy layer in AWS Organizations. They set the maximum permissions for every principal (users, roles, root) in an account or OU. Like permission boundaries, SCPs do NOT grant access — they only restrict what is allowed.

How SCPs work — the mental model:

Think of SCPs as a fence around an entire account. Inside the fence, IAM policies grant permissions as usual. But no one — not even the account root user — can do anything the fence blocks. The fence is set by the Organization management account, and individual accounts cannot modify or remove it.

Where SCPs attach:

AWS Organization SCP Hierarchy — SCPs stack downward through OUs

The diagram shows the AWS Organization hierarchy with SCPs at each level. SCPs stack downward — the Production OU’s effective permissions are the intersection of Root OU SCP + Workloads OU SCP + Production OU SCP. Each level can only further restrict, never expand.

Critical rule: SCPs never apply to the Management Account itself. This is a common interview trick question. Principals in the management account are never restricted by SCPs, only by their own IAM policies, which is why you should never run workloads in it.

Practical example — FinServ Corp Production OU SCP:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRegionsOutsideMEandUS",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["me-south-1", "us-east-1"]
        },
        "ArnNotLike": {
          "aws:PrincipalARN": "arn:aws:iam::*:role/OrganizationAccountAccessRole"
        }
      }
    },
    {
      "Sid": "DenyDisablingSecurityServices",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "guardduty:DeleteDetector",
        "guardduty:DisassociateFromMasterAccount",
        "config:StopConfigurationRecorder",
        "config:DeleteConfigurationRecorder"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyCreatingIAMUsers",
      "Effect": "Deny",
      "Action": [
        "iam:CreateUser",
        "iam:CreateAccessKey",
        "iam:CreateLoginProfile"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyLeavingOrganization",
      "Effect": "Deny",
      "Action": "organizations:LeaveOrganization",
      "Resource": "*"
    }
  ]
}

What this SCP blocks — even for account admins with AdministratorAccess:

| Action | Who tries it | SCP says | Result |
| --- | --- | --- | --- |
| Launch EC2 in eu-west-1 | Any role in Payments-Prod | Deny (not in me-south-1 or us-east-1) | Blocked |
| Stop CloudTrail logging | Account admin | Explicit deny | Blocked |
| Create an IAM User with access keys | Developer | Explicit deny | Blocked: forces use of SSO/roles |
| Leave the organization | Account root user | Explicit deny | Blocked: prevents account escaping |
| Deploy Lambda in me-south-1 | Developer with PowerUser | Not denied by SCP | Allowed (if IAM policy also allows) |
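The region statement is the subtle one, because its two condition operators are combined with AND: the deny fires only when the region is outside the allowed list and the caller is not the exempt role. A minimal sketch of that logic, with a hypothetical function name and shell-style wildcards standing in for `ArnNotLike` matching:

```python
import fnmatch

ALLOWED_REGIONS = ["me-south-1", "us-east-1"]
EXEMPT_PRINCIPAL = "arn:aws:iam::*:role/OrganizationAccountAccessRole"


def region_deny_applies(requested_region: str, principal_arn: str) -> bool:
    """True when the DenyRegionsOutsideMEandUS statement blocks the call."""
    # StringNotEquals with a list: true only if the value matches none of them.
    region_not_allowed = requested_region not in ALLOWED_REGIONS
    # ArnNotLike: true unless the caller matches the exempt role pattern.
    principal_not_exempt = not fnmatch.fnmatchcase(principal_arn, EXEMPT_PRINCIPAL)
    # Multiple condition operators in one statement are ANDed together.
    return region_not_allowed and principal_not_exempt
```

`region_deny_applies("eu-west-1", "arn:aws:iam::111111111111:role/payments-lambda-role")` is true, matching the first table row, while the same call made as `OrganizationAccountAccessRole` is not denied. Keeping us-east-1 in the allowed list matters because global services such as IAM are served from that region.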

SCPs vs Permission Boundaries — when to use which:

| Aspect | SCP | Permission Boundary |
| --- | --- | --- |
| Scope | Entire account or OU (every role, every user) | Individual IAM role or user |
| Set by | Organization management account | Central team attaches to roles |
| Use case | Region lockdown, prevent disabling security tools, deny IAM user creation | Cap tenant-created roles to specific services |
| Granularity | Coarse: same rules for everyone in the account | Fine: different boundaries per role |
| Stacks with | IAM policies (intersection) | IAM policies (intersection) |

Enterprise Identity Federation — SSO with Azure AD

In enterprise, humans never authenticate directly with AWS or GCP credentials. They authenticate through a corporate Identity Provider (IdP) — most commonly Azure AD (Microsoft Entra ID) — and the cloud platform trusts that IdP to vouch for the user’s identity.

There are two approaches: SAML-only federation (simpler, works everywhere) and SAML + SCIM (adds automatic user/group lifecycle sync). Both use Azure AD as the IdP.

AWS IAM Identity Center (formerly AWS SSO) is the single pane of glass for managing human access across all AWS accounts in an Organization. It sits between your corporate IdP and your AWS accounts:

Key features:

  • One sign-in portal (yourcompany.awsapps.com/start) — users see only the accounts/roles they are assigned to
  • Permission sets — reusable IAM policy bundles (e.g., DeveloperReadOnly, PlatformAdmin) that get deployed as IAM roles in each assigned account
  • Assignments — map a group or user to a permission set in specific accounts (e.g., “PaymentsDevelopers get DeveloperReadOnly in payments-prod”)
  • Temporary credentials — every session is short-lived (1-12 hours), no permanent access keys
  • Built-in integrations — AWS CLI v2 (aws sso login), Console, SDKs all support Identity Center natively

Approach 1: SAML-Only Federation (No SCIM)

This is the simpler approach — Azure AD handles authentication, but you manage users and groups manually inside Identity Center.

How it works:

  1. Configure Azure AD as a SAML IdP in Identity Center (exchange metadata XML between Azure AD and AWS)
  2. When a user signs in: They go to the AWS SSO portal → redirected to Azure AD → authenticate with MFA → Azure AD sends a SAML assertion back to AWS → Identity Center creates a session
  3. You manually create groups and users inside Identity Center (or use Identity Center’s built-in directory)
  4. You manually assign groups to permission sets and accounts

Practical example:

  1. Azure AD authenticates Priya (SAML assertion: "priya@finserv.com, MFA verified")
  2. Identity Center receives the assertion, matches to local user "priya@finserv.com"
  3. Priya sees her assigned accounts: Payments-Dev (PowerUser), Payments-Prod (ReadOnly)
  4. She clicks Payments-Dev → Identity Center creates a temporary IAM role session
  5. Session expires in 1 hour, after which she must re-authenticate to continue

When to use this approach:

  • Small teams (under 50 people) where manual group management is feasible
  • Organizations that do not have Azure AD Premium P1/P2 (SCIM requires it)
  • Proof-of-concept or initial setup before migrating to SCIM

Limitation: When Priya leaves the company and IT disables her Azure AD account, she cannot sign in anymore. But her Identity Center user and group memberships still exist — you must manually clean them up. At scale (200+ people, frequent joins/leaves), this becomes an operational burden.

Approach 2: SAML + SCIM (Automatic Lifecycle Sync)

This adds SCIM provisioning on top of SAML — Azure AD automatically creates, updates, and deletes users and groups in Identity Center.

How it works:

  1. Configure SAML (same as above — handles authentication)
  2. Configure SCIM (Azure AD → Identity Center API endpoint + bearer token). Azure AD pushes user/group changes every 40 minutes (or on-demand)
  3. When HR onboards Priya: IT adds her to Azure AD group SG-AWS-Payments-Developers → SCIM syncs the group to Identity Center within minutes → Priya automatically gets access to the assigned accounts/permission sets
  4. When Priya leaves: IT disables her Azure AD account → SCIM removes her from Identity Center → ALL AWS access across ALL accounts is revoked automatically
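The lifecycle sync described above amounts to a diff between the IdP's view of groups and the target directory's view. The following is a minimal sketch of that idea, not how Azure AD or Identity Center implement SCIM; `SyncPlan`, `plan_sync`, and the users other than Priya are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    """Changes that would make the target directory match the IdP."""
    create_users: set[str] = field(default_factory=set)
    delete_users: set[str] = field(default_factory=set)
    add_memberships: set[tuple[str, str]] = field(default_factory=set)
    remove_memberships: set[tuple[str, str]] = field(default_factory=set)


def membership_pairs(groups: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Flatten {group: {users}} into (group, user) pairs."""
    return {(group, user) for group, users in groups.items() for user in users}


def plan_sync(idp_groups: dict[str, set[str]],
              target_groups: dict[str, set[str]]) -> SyncPlan:
    """Diff the IdP's view against the target directory's view."""
    idp_pairs = membership_pairs(idp_groups)
    target_pairs = membership_pairs(target_groups)
    idp_users = {user for _, user in idp_pairs}
    target_users = {user for _, user in target_pairs}
    return SyncPlan(
        create_users=idp_users - target_users,
        delete_users=target_users - idp_users,
        add_memberships=idp_pairs - target_pairs,
        remove_memberships=target_pairs - idp_pairs,
    )


# Hypothetical state: Priya was just added in Azure AD, and "lee" has left
# the company but still exists in the downstream directory.
idp = {"SG-AWS-Payments-Developers": {"priya@finserv.com", "omar@finserv.com"}}
identity_center = {"SG-AWS-Payments-Developers": {"omar@finserv.com", "lee@finserv.com"}}
plan = plan_sync(idp, identity_center)
print(plan)
```

With SAML-only federation, the `delete_users` and `remove_memberships` sets are exactly the cleanup someone has to do by hand.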

When to use this approach:

  • Any enterprise with 50+ people
  • Organizations where people frequently join, leave, or change teams
  • When you need automated audit compliance (who had access when?)

# In Account A (Shared Services, 222222222222)
# Role that Account B (Workload, 111111111111) can assume
resource "aws_iam_role" "cross_account_artifacts" {
  name                 = "cross-account-artifacts-reader"
  max_session_duration = 3600 # 1 hour

  # Trust policy: WHO can assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          # Allow the specific Lambda execution role in Account B
          AWS = "arn:aws:iam::111111111111:role/team-alpha-lambda-role"
        }
        Action = "sts:AssumeRole"
        Condition = {
          StringEquals = {
            "sts:ExternalId" = "team-alpha-shared-2024"
          }
        }
      }
    ]
  })

  tags = {
    ManagedBy   = "platform-team"
    Environment = "shared"
    Purpose     = "cross-account-artifact-access"
  }
}

# Permission policy: WHAT the assumed role can do
resource "aws_iam_role_policy" "artifacts_read" {
  name = "artifacts-read-policy"
  role = aws_iam_role.cross_account_artifacts.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:ListBucket"
        ]
        Resource = [
          "arn:aws:s3:::shared-artifacts-prod",
          "arn:aws:s3:::shared-artifacts-prod/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "kms:Decrypt",
          "kms:DescribeKey"
        ]
        Resource = [
          aws_kms_key.artifacts_key.arn
        ]
      }
    ]
  })
}

# OIDC provider for EKS cluster (created once per cluster)
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# IAM role for the payments service in the payments namespace
resource "aws_iam_role" "payment_processor" {
  name = "eks-payment-processor-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Federated = aws_iam_openid_connect_provider.eks.arn
        }
        Action = "sts:AssumeRoleWithWebIdentity"
        Condition = {
          StringEquals = {
            "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:payments:payment-processor"
            "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:aud" = "sts.amazonaws.com"
          }
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "payment_sqs" {
  role       = aws_iam_role.payment_processor.name
  policy_arn = aws_iam_policy.sqs_payment_queue.arn
}

# Kubernetes service account annotation (via Helm or kubectl)
# metadata:
#   annotations:
#     eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/eks-payment-processor-role

# Central team creates this boundary; it is applied to ALL tenant-created roles
resource "aws_iam_policy" "tenant_boundary" {
  name        = "tenant-permission-boundary"
  description = "Maximum permissions for any role created by tenant teams"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowCommonServices"
        Effect = "Allow"
        Action = [
          "s3:*",
          "dynamodb:*",
          "sqs:*",
          "sns:*",
          "logs:*",
          "cloudwatch:*",
          "xray:*",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:GetAuthorizationToken",
        ]
        Resource = "*"
      },
      {
        Sid    = "DenyIAMEscalation"
        Effect = "Deny"
        Action = [
          "iam:CreateUser",
          "iam:CreateAccessKey",
          "iam:AttachUserPolicy",
          "iam:PutUserPolicy",
          "iam:DeleteRolePermissionsBoundary",
          "organizations:*",
          "account:*",
        ]
        Resource = "*"
      },
      {
        Sid    = "DenyNetworkChanges"
        Effect = "Deny"
        Action = [
          "ec2:CreateVpc",
          "ec2:DeleteVpc",
          "ec2:CreateSubnet",
          "ec2:DeleteSubnet",
          "ec2:ModifyVpcAttribute",
          "ec2:CreateInternetGateway",
          "ec2:AttachInternetGateway",
        ]
        Resource = "*"
      }
    ]
  })
}

# More readable than raw JSON; the preferred Terraform pattern
data "aws_iam_policy_document" "deploy_role_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type = "AWS"
      identifiers = [
        "arn:aws:iam::${var.cicd_account_id}:role/github-actions-runner"
      ]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.external_id]
    }
  }
}

resource "aws_iam_role" "deploy" {
  name               = "deploy-to-production"
  assume_role_policy = data.aws_iam_policy_document.deploy_role_trust.json
}

Scenario 1: Cross-Account S3 Access from Lambda

Q: “Explain how a Lambda function in Account A reads from S3 in Account B.”

Model Answer:

There are two approaches — I would recommend the cross-account role assumption pattern for enterprise environments:

Approach: Cross-Account Role Assumption (Recommended)

  1. In Account B (S3 owner), create an IAM role s3-reader-for-account-a with:

    • Trust policy allowing Account A’s Lambda execution role as principal
    • Permission policy granting s3:GetObject and s3:ListBucket on the specific bucket
    • External ID condition to prevent confused deputy if this is a multi-tenant setup
  2. In Account A, the Lambda execution role needs:

    • Permission to call sts:AssumeRole on the Account B role ARN
  3. At runtime:

    • Lambda calls sts:AssumeRole with Account B’s role ARN
    • STS validates the trust policy and returns temporary credentials
    • Lambda uses those temp creds to call s3:GetObject in Account B
    • Credentials expire after the configured session duration
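To make the trust-policy check in the runtime steps concrete, here is a simplified sketch of how the role's trust policy decides whether a caller may assume it. It is an illustration only: real STS evaluation also handles Deny statements, wildcards, account-root principals, and many more condition operators. The Lambda role ARN, the external ID value, and the function `can_assume` are assumptions made up for this example.

```python
from __future__ import annotations

# Hypothetical identifiers for illustration only.
LAMBDA_ROLE_ARN = "arn:aws:iam::111111111111:role/account-a-lambda-role"
EXTERNAL_ID = "example-external-id"

# Trust policy attached to s3-reader-for-account-a in Account B.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": LAMBDA_ROLE_ARN},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}


def as_list(value) -> list:
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def can_assume(trust_policy: dict, caller_arn: str,
               external_id: str | None = None) -> bool:
    """Return True if some Allow statement matches caller and external ID."""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in as_list(stmt.get("Action", [])):
            continue
        if caller_arn not in as_list(stmt.get("Principal", {}).get("AWS", [])):
            continue
        required = (stmt.get("Condition", {})
                        .get("StringEquals", {})
                        .get("sts:ExternalId"))
        if required is not None and external_id != required:
            continue
        return True
    return False


print(can_assume(TRUST_POLICY, LAMBDA_ROLE_ARN, EXTERNAL_ID))
```

A caller with the right ARN but a missing or wrong external ID is rejected, which is the property the external ID relies on to block a confused deputy.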

Why not just a bucket policy? A bucket policy (resource-based) in Account B that names Account A’s Lambda role as principal would also work, provided the Lambda role’s identity policy in Account A allows the same S3 actions. However, the role assumption approach is preferred because:

  • It provides explicit audit trail (CloudTrail shows the AssumeRole call)
  • External IDs prevent confused deputy
  • Session duration can be limited
  • Central team controls the trust relationship

Cross-Account Role Assumption — Lambda to S3


Scenario 2: Pod-to-Cloud-API — IRSA (AWS) / Workload Identity (GCP)

Q: “How do you give a Kubernetes pod secure access to cloud APIs? Explain the full chain.”

Model Answer:

Both clouds solve this the same way: map a Kubernetes ServiceAccount to a cloud IAM identity so each pod gets its own least-privilege credentials. No static keys, no shared node-level permissions.

IRSA (IAM Roles for Service Accounts) lets EKS pods assume IAM roles without instance-level credentials.

Setup (done once by the platform team):

  1. Create an OIDC provider in IAM that trusts the EKS cluster’s OIDC issuer URL
  2. Create an IAM role with a trust policy allowing sts:AssumeRoleWithWebIdentity from the OIDC provider, scoped to a specific namespace and Kubernetes service account
  3. Create a Kubernetes ServiceAccount annotated with the IAM role ARN: eks.amazonaws.com/role-arn: arn:aws:iam::111:role/my-role

Runtime flow (on first use, then again at each credential refresh):

  1. Pod starts with the annotated ServiceAccount
  2. EKS mutating webhook injects:
    • AWS_ROLE_ARN environment variable
    • AWS_WEB_IDENTITY_TOKEN_FILE pointing to /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    • A projected service account token volume (JWT signed by the EKS OIDC issuer)
  3. AWS SDK detects these env vars and calls sts:AssumeRoleWithWebIdentity
  4. STS validates the JWT against the OIDC provider’s public keys
  5. STS checks the trust policy conditions (sub = correct namespace:serviceaccount, aud = sts.amazonaws.com)
  6. STS returns temporary credentials (AccessKeyId, SecretAccessKey, SessionToken)
  7. SDK uses these credentials for the actual AWS API call (e.g., S3, DynamoDB)
  8. Credentials are refreshed automatically before expiry
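Step 5 of the runtime flow can be sketched as a comparison of the token's claims against the trust policy conditions. The code below is a rough illustration that only decodes and compares claims; it deliberately skips the signature validation of step 4, which STS performs against the OIDC provider's public keys. `make_unsigned_token`, `decode_claims`, and `claims_match` are hypothetical helpers, and the token is a fake built for the example.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def make_unsigned_token(claims: dict) -> str:
    """Build a fake, unsigned JWT-shaped string for demonstration only."""
    header = b64url(json.dumps({"alg": "none"}).encode())
    payload = b64url(json.dumps(claims).encode())
    return f"{header}.{payload}.unsigned"


def decode_claims(token: str) -> dict:
    """Decode the payload segment. This does NOT verify the signature."""
    payload = token.split(".")[1]
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_match(claims: dict, namespace: str, service_account: str,
                 audience: str = "sts.amazonaws.com") -> bool:
    """Mirror the StringEquals conditions on the :sub and :aud keys."""
    expected_sub = f"system:serviceaccount:{namespace}:{service_account}"
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return claims.get("sub") == expected_sub and audience in audiences


token = make_unsigned_token({
    "sub": "system:serviceaccount:payments:payment-processor",
    "aud": ["sts.amazonaws.com"],
})
claims = decode_claims(token)
print(claims_match(claims, "payments", "payment-processor"))
```

A pod in any other namespace, or using a different service account, produces a different `sub` claim and fails the comparison.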

Why this matters for enterprise:

  • No IAM user access keys embedded in pods
  • Each pod gets only the permissions it needs (no node-level IAM role sharing)
  • Central team controls which namespace:serviceaccount combos can assume which roles
  • Audit trail shows the pod identity in CloudTrail

IRSA vs GKE Workload Identity — key differences:

| Aspect | AWS IRSA | GKE Workload Identity |
| --- | --- | --- |
| Mechanism | Pod gets a JWT → calls STS → gets temp creds | GKE metadata server intercepts → exchanges token transparently |
| App code changes | None (AWS SDK auto-detects env vars) | None (GCP client library auto-detects metadata server) |
| Setup complexity | More complex: OIDC provider + trust policy + conditions | Simpler: one IAM binding + one annotation |
| Trust model | Trust policy on the IAM role (explicit JSON document) | workloadIdentityUser binding on the GCP SA |
| Credential type | STS temporary credentials (3 values) | OAuth2 access token (1 value) |
| Audit | CloudTrail shows pod identity via assumed role session | Cloud Audit Logs show the GCP SA identity |
| Without it | Pods share the EC2 instance profile (node IAM role) | Pods share the node’s default SA (often has Editor role) |

Scenario 3: IAM Design for a 200-Person Org with 50 AWS Accounts

Q: “Design IAM for a 200-person org with 50 AWS accounts.”

Model Answer:

I would use AWS IAM Identity Center (successor to AWS SSO) as the single source of truth, federated from the corporate IdP (Azure AD / Microsoft Entra ID in most enterprises).

Architecture:

IAM Identity Center Architecture for 200-Person Org

How the federation works — Azure AD to AWS:

The diagram shows the flow from Azure AD (the corporate identity provider where all 200 employees already have accounts) through IAM Identity Center into individual AWS accounts. Here is how each layer connects:

  1. Azure AD (Microsoft Entra ID) is the source of truth for all identities. HR onboards Priya → IT creates her Azure AD account → she is added to the SG-AWS-Payments-Developers security group. No one touches AWS directly.

  2. SCIM provisioning (System for Cross-domain Identity Management) automatically syncs Azure AD users and groups into IAM Identity Center on Azure AD’s provisioning cycle (about every 40 minutes, or on demand). When Priya is added to the Azure AD group, SCIM creates her identity in Identity Center and adds her to the matching group, with no manual AWS work.

  3. IAM Identity Center maps groups to permission sets (which define what you can do) and account assignments (which define where you can go). This is the core mapping:

| Azure AD Group | Identity Center Group | Permission Set | Assigned Accounts | What They Can Do |
| --- | --- | --- | --- | --- |
| SG-AWS-Platform-Admins | PlatformAdmins | AdministratorAccess | All 50 accounts | Full admin (platform team only) |
| SG-AWS-Payments-Developers | PaymentsDevelopers | DeveloperPowerUser | Payments-Dev, Payments-Staging | Deploy, read logs, manage Lambda/ECS |
| SG-AWS-Payments-Developers | PaymentsDevelopers | DeveloperReadOnly | Payments-Prod | Read-only in production |
| SG-AWS-Lending-Developers | LendingDevelopers | DeveloperPowerUser | Lending-Dev, Lending-Staging | Deploy, read logs, manage Lambda/ECS |
| SG-AWS-Lending-Developers | LendingDevelopers | DeveloperReadOnly | Lending-Prod | Read-only in production |
| SG-AWS-Data-Engineers | DataEngineers | DataEngineerAccess | DataLake-Prod, Analytics-Prod | Glue, Athena, S3, Redshift |
| SG-AWS-Security-Auditors | SecurityAuditors | SecurityAudit | All 50 accounts | Read-only security review |
| SG-AWS-BreakGlass | BreakGlass | AdministratorAccess + MFA | All 50 accounts | Emergency only (see below) |

  4. When Priya signs in: She goes to the AWS SSO portal (finserv.awsapps.com/start), authenticates via Azure AD (including MFA), and sees only the accounts and roles she is assigned to. She clicks Payments-Dev → DeveloperPowerUser and gets a 1-hour session with temporary credentials. No permanent access keys exist anywhere.

  5. When Priya leaves the company: IT disables her Azure AD account. The next SCIM sync removes her from Identity Center, which revokes her access to all 50 accounts with no manual cleanup. Sessions already issued remain valid until they expire, which is one more reason to keep session durations short.
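The group-to-permission-set-to-account mapping can be modeled as a list of assignment triples, and what a user sees in the portal is just a filter over that list. The sketch below uses a subset of the assignments from the table; `portal_view` is a hypothetical helper for illustration, not an Identity Center API.

```python
# (Identity Center group, permission set, account): a subset of the table.
ASSIGNMENTS = [
    ("PaymentsDevelopers", "DeveloperPowerUser", "Payments-Dev"),
    ("PaymentsDevelopers", "DeveloperPowerUser", "Payments-Staging"),
    ("PaymentsDevelopers", "DeveloperReadOnly", "Payments-Prod"),
    ("LendingDevelopers", "DeveloperPowerUser", "Lending-Dev"),
    ("LendingDevelopers", "DeveloperPowerUser", "Lending-Staging"),
    ("LendingDevelopers", "DeveloperReadOnly", "Lending-Prod"),
]


def portal_view(user_groups: set, assignments: list) -> dict:
    """Return {account: [permission sets]} visible to a user's groups."""
    view = {}
    for group, permission_set, account in assignments:
        if group in user_groups:
            view.setdefault(account, []).append(permission_set)
    return {account: sorted(sets) for account, sets in sorted(view.items())}


priya_view = portal_view({"PaymentsDevelopers"}, ASSIGNMENTS)
for account, permission_sets in priya_view.items():
    print(account, permission_sets)
```

Because access is derived entirely from group membership, removing a user from the group (or from the directory) empties their view without touching any assignment.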

Key design decisions:

  1. Groups, not individual users. Never assign permission sets to individual users. Map Azure AD security groups to permission sets. When someone joins/leaves, update the Azure AD group — AWS access follows automatically via SCIM.

  2. Permission sets per role, not per team. A DeveloperPowerUser set works for all teams. Team-specific access comes from assignment (which accounts the group is assigned to), not from the permission set itself. This means you maintain 5-6 permission sets, not 50.

  3. Separate dev/staging vs prod access. Developers get PowerUser in dev/staging but only ReadOnly in production. Deployments to prod happen through CI/CD (OIDC federation), not through human access.

  4. SCPs as guardrails. Even if a permission set grants broad access, SCPs on the Production OU prevent destructive actions:

    • Deny ec2:TerminateInstances without an approved tag
    • Deny regions outside me-south-1 and us-east-1
    • Deny disabling CloudTrail or GuardDuty
    • Deny creating IAM Users or access keys
  5. Break-glass access. A dedicated BreakGlass permission set with AdministratorAccess, assigned only to an emergency Azure AD group that requires:

    • MFA step-up authentication
    • Approval workflow (PIM / Privileged Identity Management in Azure AD)
    • Auto-revocation after 4 hours
    • All break-glass sessions trigger a PagerDuty alert via CloudTrail → EventBridge → SNS
  6. Permission boundaries. All tenant-created IAM roles must include the platform team’s permission boundary (as shown in the Permission Boundaries section above), preventing privilege escalation even if a developer creates a role with AdministratorAccess.
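As an illustration of the SCP guardrails listed in design decision 4, the sketch below assembles a policy document for three of them (region restriction, protecting audit services, blocking IAM users and access keys). It is a starting point under stated assumptions, not a vetted production SCP: the list of exempted global services in `NotAction` and the function name `production_guardrail_scp` are illustrative, and the tag-based `ec2:TerminateInstances` rule is left out.

```python
import json

ALLOWED_REGIONS = ["me-south-1", "us-east-1"]


def production_guardrail_scp(allowed_regions: list) -> dict:
    """Build an SCP document as a plain dict (serialize with json.dumps)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyOutsideApprovedRegions",
                "Effect": "Deny",
                # Global services are exempted; this list is illustrative.
                "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
                "Resource": "*",
                "Condition": {
                    "StringNotEquals": {"aws:RequestedRegion": allowed_regions}
                },
            },
            {
                "Sid": "DenyDisablingAuditServices",
                "Effect": "Deny",
                "Action": [
                    "cloudtrail:StopLogging",
                    "cloudtrail:DeleteTrail",
                    "guardduty:DeleteDetector",
                ],
                "Resource": "*",
            },
            {
                "Sid": "DenyIamUsersAndAccessKeys",
                "Effect": "Deny",
                "Action": ["iam:CreateUser", "iam:CreateAccessKey"],
                "Resource": "*",
            },
        ],
    }


scp = production_guardrail_scp(ALLOWED_REGIONS)
print(json.dumps(scp, indent=2))
```

Every statement is a Deny, which reflects how SCPs act as guardrails: they never grant access, they only cap what permission sets can do.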


Scenario 4: GCP Equivalent of AWS Role Assumption

Q: “What’s the GCP equivalent of AWS role assumption? Walk through the flow.”

Model Answer:

The GCP equivalent is service account impersonation. Here is the complete mapping:

| AWS | GCP |
| --- | --- |
| sts:AssumeRole | generateAccessToken() on a SA |
| Trust policy (principal) | roles/iam.serviceAccountTokenCreator |
| Temporary credentials | Short-lived OAuth2 access token |
| External ID | No direct equivalent (not needed; impersonation is SA-to-SA, not account-to-account) |
| Role ARN | Service Account email |
| Session duration (1-12h) | Token lifetime (default 1h, max 12h) |

Flow:

  1. A CI/CD pipeline runs as cicd-runner@shared-services.iam.gserviceaccount.com
  2. It needs to deploy to the production project as deploy-sa@prod-project.iam.gserviceaccount.com
  3. Platform team grants roles/iam.serviceAccountTokenCreator on deploy-sa to cicd-runner
  4. At runtime, cicd-runner calls generateAccessToken(deploy-sa)
  5. IAM service validates the permission and returns a short-lived OAuth2 token
  6. CI/CD pipeline uses this token to make API calls as deploy-sa
  7. Cloud Audit Logs show: caller = cicd-runner, acting as = deploy-sa
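Step 4 of the flow corresponds to a single call to the IAM Service Account Credentials API. The sketch below only builds the request URL and JSON body for that `generateAccessToken` call and sends nothing over the network; in a real pipeline the caller's own access token would go in the `Authorization` header. The helper name `build_impersonation_request` and the 12-hour cap check (taken from the table above) are assumptions for illustration.

```python
import json

MAX_LIFETIME_SECONDS = 12 * 60 * 60  # 12h ceiling, per the mapping table


def build_impersonation_request(
    target_sa_email: str,
    lifetime_seconds: int = 3600,
    scopes: tuple = ("https://www.googleapis.com/auth/cloud-platform",),
) -> tuple:
    """Return (url, body) for a generateAccessToken request."""
    if not 0 < lifetime_seconds <= MAX_LIFETIME_SECONDS:
        raise ValueError("lifetime must be between 1 second and 12 hours")
    url = (
        "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
        f"{target_sa_email}:generateAccessToken"
    )
    body = {"scope": list(scopes), "lifetime": f"{lifetime_seconds}s"}
    return url, body


url, body = build_impersonation_request(
    "deploy-sa@prod-project.iam.gserviceaccount.com"
)
print(url)
print(json.dumps(body))
```

The request names only the target service account; whether it succeeds depends entirely on the `roles/iam.serviceAccountTokenCreator` binding from step 3.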

Key differences from AWS:

  • No concept of “trust policy” — instead, you bind serviceAccountTokenCreator on the target SA
  • Impersonation works across projects without needing a shared identity account
  • Can chain impersonation with explicit delegation chains
  • In Terraform, use impersonate_service_account in the provider block

Scenario 5: Debugging S3 Access Denied

Q: “A developer says they can’t access an S3 bucket. Walk through debugging.”

Model Answer:

I follow a systematic approach through the policy evaluation chain:

Step 1: Identify the principal and action

# Who is making the request?
aws sts get-caller-identity
# What exactly are they trying to do?
# s3:GetObject on data-bucket/reports/q1.csv

Step 2: Check explicit denies

  • SCPs on the account’s OU: is there an SCP denying S3 or restricting regions?
  • VPC endpoint policy (if accessing via VPC endpoint): does it allow the bucket?
  • Bucket policy explicit denies: does the bucket have a Deny statement matching this principal?

Step 3: Check allow path (identity-based)

  • Does the user/role have an identity policy allowing s3:GetObject on this specific resource ARN?
  • If there is a permission boundary, does it also allow s3:GetObject?

Step 4: Check allow path (resource-based)

  • Does the bucket policy allow this principal?
  • For cross-account: BOTH identity AND resource policies must allow; within a single account, an Allow from either one is enough

Step 5: Check conditions

  • aws:SourceVpc or aws:SourceVpce condition — is the request coming from the expected VPC/endpoint?
  • s3:prefix condition — is the request path matching the allowed prefix?
  • MFA condition — does the policy require MFA and the user does not have an MFA session?
  • Encryption condition — does the policy require s3:x-amz-server-side-encryption?
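Steps 2 through 4 can be condensed into a small decision function. This is a simplified sketch of the evaluation order, assuming each layer has already been reduced to a yes/no answer; it ignores session policies and treats a permission boundary as always limiting, which is stricter than AWS in some resource-policy cases. `AccessCheck` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AccessCheck:
    """Pre-computed answers for one request (all names are illustrative)."""
    explicit_deny: bool = False     # any SCP, bucket, or identity policy Deny
    scp_allows: bool = True
    boundary_allows: bool = True    # True when no boundary is attached
    identity_allows: bool = False
    resource_allows: bool = False
    cross_account: bool = False


def evaluate(check: AccessCheck) -> str:
    """Walk the layers in the order used by the debugging steps."""
    if check.explicit_deny:
        return "deny: explicit deny"
    if not check.scp_allows:
        return "deny: SCP does not allow"
    if not check.boundary_allows:
        return "deny: permission boundary does not allow"
    if check.cross_account:
        granted = check.identity_allows and check.resource_allows
    else:
        granted = check.identity_allows or check.resource_allows
    return "allow" if granted else "deny: no matching allow (implicit deny)"


# Same account: a bucket policy allow is enough on its own.
print(evaluate(AccessCheck(resource_allows=True)))
# Cross-account: an identity policy allow alone is not enough.
print(evaluate(AccessCheck(identity_allows=True, cross_account=True)))
```

Working through the function top to bottom mirrors the debugging order: rule out denies first, then guardrails, then look for the missing allow.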

Step 6: Tools to use

# IAM Access Analyzer: preview the findings a proposed resource policy change (for example, a new bucket policy) would produce
aws accessanalyzer create-access-preview ...
# IAM Policy Simulator: test policies without making real calls
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::111:role/dev-role \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::data-bucket/reports/q1.csv
# CloudTrail: find the actual denied request
# Caveat: lookup-events searches management events only. Object-level calls such as
# GetObject are S3 data events, so they appear only if a trail (or CloudTrail Lake)
# logs S3 data events, and must be queried from those logs (for example with Athena).
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=GetObject \
  --start-time "2026-03-15T00:00:00Z"
# Look for errorCode: AccessDenied, errorMessage tells you which policy denied

Scenario 6: GCP Default Service Accounts Danger

Q: “What are GCP default service accounts and why are they dangerous?”

Model Answer:

When you enable certain GCP APIs (Compute Engine, App Engine, Cloud Functions), GCP automatically creates default service accounts in the project. These are dangerous because:

  1. They get the Editor role by default. Editor grants read/write access to almost every GCP service — storage, databases, Pub/Sub, compute, networking. This violates least privilege.

  2. Workloads use them automatically. If you create a GCE VM or GKE node pool without specifying a service account, it runs as the default compute SA — meaning it has Editor-level access to everything in the project.

  3. Key sprawl risk. Developers may create JSON keys for the default SA without realizing its scope.

Enterprise remediation:

Step 1: Set org policy to disable automatic role grants
  constraints/iam.automaticIamGrantsForDefaultServiceAccounts = enforced

Step 2: Disable or delete existing default SAs in all projects
  google_project_default_service_accounts { action = "DISABLE" }

Step 3: Create dedicated SAs per workload with minimal permissions
  etl-pipeline-sa → roles/storage.objectViewer + roles/bigquery.dataEditor
  web-app-sa → roles/cloudsql.client + roles/secretmanager.secretAccessor

Step 4: Specify SA explicitly on every resource
  GKE node pool → service_account = "gke-nodes@project.iam.gserviceaccount.com"
  Compute VM → service_account { email = "vm-sa@project.iam.gserviceaccount.com" }
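Before remediating, it helps to find where default service accounts still hold Editor. The sketch below scans a project IAM policy in the JSON shape that `gcloud projects get-iam-policy --format=json` returns and flags default service accounts bound to `roles/editor`. The function name `find_default_sa_editor_bindings` and the sample policy are hypothetical; the email patterns cover the Compute Engine and App Engine default accounts only.

```python
import re

# Default SA email formats: PROJECT_NUMBER-compute@developer... and
# PROJECT_ID@appspot...
DEFAULT_SA_PATTERNS = (
    re.compile(r"^\d+-compute@developer\.gserviceaccount\.com$"),
    re.compile(r"^[a-z0-9.:\-]+@appspot\.gserviceaccount\.com$"),
)


def find_default_sa_editor_bindings(policy: dict) -> list:
    """Return default service account emails that hold roles/editor."""
    risky = []
    for binding in policy.get("bindings", []):
        if binding.get("role") != "roles/editor":
            continue
        for member in binding.get("members", []):
            kind, _, email = member.partition(":")
            if kind != "serviceAccount":
                continue
            if any(pattern.match(email) for pattern in DEFAULT_SA_PATTERNS):
                risky.append(email)
    return sorted(risky)


# Hypothetical policy for illustration.
sample_policy = {
    "bindings": [
        {
            "role": "roles/editor",
            "members": [
                "serviceAccount:123456789012-compute@developer.gserviceaccount.com",
                "user:priya@finserv.com",
            ],
        },
        {
            "role": "roles/storage.objectViewer",
            "members": [
                "serviceAccount:etl-pipeline-sa@my-proj.iam.gserviceaccount.com"
            ],
        },
    ]
}
print(find_default_sa_editor_bindings(sample_policy))
```

Running a check like this across all projects gives a worklist for Step 2 and a regression test once the org policy from Step 1 is in force.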

Scenario 7: Google-Managed Agents vs User-Managed SAs

Q: “What are Google-managed service agents vs user-managed service accounts?”

Model Answer:

| Aspect | User-Managed SA | Google-Managed Agent |
| --- | --- | --- |
| Created by | You (or Terraform) | Google automatically |
| Email format | name@PROJECT_ID.iam.gserviceaccount.com | service-PROJECT_NUM@SERVICE.iam.gserviceaccount.com |
| Purpose | Your applications and workloads | Internal GCP service-to-service operations |
| You can delete? | Yes | No (managed by Google) |
| You manage keys? | Yes (but avoid keys, use WIF) | No |
| Example | etl-pipeline@my-proj.iam.gserviceaccount.com | service-12345@compute-system.iam.gserviceaccount.com |

When you interact with Google-managed agents:

  1. CMEK (Customer-Managed Encryption Keys): When using Cloud KMS to encrypt Compute Engine disks, you must grant roles/cloudkms.cryptoKeyEncrypterDecrypter to the Compute Engine service agent so it can encrypt/decrypt on your behalf.

  2. Shared VPC: The GKE service agent of a service project needs roles/container.hostServiceAgentUser on the host project.

  3. Cross-project access: Service agents sometimes need IAM roles in other projects for features like cross-project Pub/Sub delivery or cross-project BigQuery reads.

Common Google-managed agents:

Compute Engine: service-PROJECT_NUM@compute-system.iam.gserviceaccount.com
GKE: service-PROJECT_NUM@container-engine-robot.iam.gserviceaccount.com
Cloud Build: service-PROJECT_NUM@gcp-sa-cloudbuild.iam.gserviceaccount.com
Pub/Sub: service-PROJECT_NUM@gcp-sa-pubsub.iam.gserviceaccount.com
Dataflow: service-PROJECT_NUM@dataflow-service-producer-prod.iam.gserviceaccount.com
Cloud Composer: service-PROJECT_NUM@cloudcomposer-accounts.iam.gserviceaccount.com

Scenario 8: Minimal-Privilege CI/CD Pipeline Design

Q: “Design a Terraform pipeline where the CI/CD tool has minimal permissions but can deploy to prod.”

Model Answer:

The key principle: the CI runner itself has almost no permissions — it only has the ability to assume/impersonate a deploy role that has the actual permissions.

CI/CD Pipeline — AWS OIDC to Deploy Role to EKS

Key security controls:

  • GitHub Actions uses OIDC, no stored AWS credentials
  • Trust policy scoped to specific GitHub repo and branch: repo:bank/infra:ref:refs/heads/main
  • Deploy role has 1-hour session max
  • CloudTrail logs every role assumption; passing the GitHub run ID as the role session name ties each session to a specific pipeline run
  • SCP on production OU prevents the deploy role from modifying IAM or networking
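The repo-and-branch scoping in the second control comes down to two conditions on the OIDC token's claims. Below is a minimal sketch that builds such a trust policy for GitHub's OIDC provider (`token.actions.githubusercontent.com`); the helper name `github_oidc_trust_policy` and the account ID are assumptions for illustration.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Trust policy that admits only workflows from one repo and branch."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        f"{GITHUB_OIDC_HOST}:sub": (
                            f"repo:{repo}:ref:refs/heads/{branch}"
                        ),
                    }
                },
            }
        ],
    }


# Account ID is a placeholder; repo and branch come from the control above.
trust = github_oidc_trust_policy("111111111111", "bank/infra", "main")
print(json.dumps(trust, indent=2))
```

Using `StringEquals` rather than a wildcard match on `sub` is the important design choice: a pull request branch or a fork produces a different subject and cannot assume the role.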

| AWS Concept | GCP Equivalent | Notes |
| --- | --- | --- |
| IAM Role | Service Account | Both provide temporary identity for workloads |
| sts:AssumeRole | generateAccessToken() (SA impersonation) | Both return short-lived credentials |
| Trust Policy | roles/iam.serviceAccountTokenCreator binding | Who can assume/impersonate |
| Permission Policy | IAM role bindings on resources | What the identity can do |
| External ID | No direct equivalent | GCP uses SA-level binding instead of account-level trust |
| AssumeRoleWithWebIdentity | Workload Identity Federation | Both federate external OIDC tokens |
| IRSA (EKS) | GKE Workload Identity | Both map K8s SAs to cloud IAM |
| STS temporary credentials | OAuth2 access token | Both expire, both auto-refresh |
| SCP (Organization) | Organization Policy | Both are account/project-level guardrails |
| Permission Boundary | No direct equivalent | GCP uses deny policies + org policies instead |
| IAM Access Analyzer | IAM Recommender | Both analyze unused permissions |
| IAM Policy Simulator | Policy Troubleshooter | Both test policy evaluation |
| Service-Linked Role | Google-Managed Service Agent | Both are managed by the cloud provider |
| IAM User + Access Keys | SA + JSON Key (AVOID both) | Both are long-lived credentials; avoid |
| IAM Identity Center (SSO) | Cloud Identity + Google Groups | Both centralize human access |
| CloudTrail (API logging) | Cloud Audit Logs | Both log every API call with caller identity |
| Resource-based policy (S3, SQS) | IAM binding on resource | GCP applies all IAM at resource hierarchy level |

AWS IAM Policy Evaluation Cheatsheet

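Quick reference: Trace any API call through this flowchart. Start at the top: is there an explicit deny? If yes, stop: access denied. Then check each layer (SCP → resource-based → identity-based → permission boundary → session policy). Every guardrail layer that applies (SCP, permission boundary, session policy) must allow the request, and at least one grant must exist: within a single account an Allow in either the identity-based or the resource-based policy is enough, while cross-account requests need both.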

| Mistake | Risk | Fix |
| --- | --- | --- |
| Using IAM Users with access keys | Key leaks, no expiry, no rotation | Use IAM roles + OIDC federation everywhere |
| Wildcard Resource: "*" | Overly broad access | Specify exact ARNs or use conditions |
| No External ID in cross-org trust | Confused deputy attacks | Always require External ID for third-party access |
| GCP default SA with Editor | Near-admin on every service | Disable default SAs, use dedicated SAs |
| Inline policies instead of managed | Hard to audit, cannot reuse | Use managed policies attached to roles |
| Not using permission boundaries | Tenant teams can escalate privileges | Enforce boundaries on all tenant-created roles |
| SA JSON key files in repos | Credential exposure | Use Workload Identity Federation (keyless) |
| IRSA without namespace scoping | Any pod can assume the role | Scope trust policy to namespace:serviceaccount |
| GCP google_project_iam_policy in TF | Removes all other bindings | Use google_project_iam_member (additive) |
| No MFA on break-glass roles | Unauthorized emergency access | Require MFA condition in trust policy |
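Several of these mistakes can be caught mechanically in code review. As one example, the sketch below flags Allow statements whose Resource is a bare wildcard. It is a deliberately small check, assuming the policy is already parsed into a dict; `find_wildcard_statements` and the sample policy are hypothetical, and a real linter would also consider conditions and partial wildcards.

```python
def find_wildcard_statements(policy: dict) -> list:
    """Return the Sid (or index label) of each Allow with Resource "*"."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        resources = stmt.get("Resource", [])
        if isinstance(resources, str):
            resources = [resources]
        if "*" in resources:
            findings.append(stmt.get("Sid", f"statement[{index}]"))
    return findings


# Hypothetical policy for illustration.
sample = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "BroadRead", "Effect": "Allow",
         "Action": "s3:GetObject", "Resource": "*"},
        {"Sid": "ScopedRead", "Effect": "Allow",
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::shared-artifacts-prod/*"},
        {"Effect": "Deny", "Action": "iam:CreateUser", "Resource": "*"},
    ],
}
print(find_wildcard_statements(sample))
```

Deny statements with a wildcard resource are intentionally ignored, since broad denies tighten access rather than widen it.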

GCP Service Account Types — Quick Reference

GCP Service Account Types — Quick Reference

Interview tip: When asked “What types of service accounts exist in GCP?”, name all three: (1) User-managed — you create and control these, the only type you should use for workloads. (2) Default — auto-created with the dangerous Editor role, disable via org policy. (3) Google-managed agents — internal service-to-service communication, you may need to grant them KMS access for CMEK.