
Network Security — Firewall, IPS/IDS, WAF

Network security lives in the Network Hub Account. Every byte of traffic (inbound, outbound, and east-west) flows through the centralized inspection VPC before reaching workload accounts. The central infra team owns firewall rules, IPS signatures, and WAF policies. Workload teams manage only their application-level security groups.

Network security context — Hub Account, Workloads, Security OU


Enterprise network security uses multiple layers. No single control is sufficient — each layer catches what the previous one missed.

Defense in depth — network security layers


NACLs vs Security Groups vs Network Firewall


This is one of the most frequently asked interview questions. Understanding the differences — and when each is appropriate — is essential.

NACLs (Network Access Control Lists):

  • Layer: subnet boundary (Layer 3/4)
  • Stateless: you must define inbound and outbound rules separately. A request coming in on port 443 needs an outbound rule for ephemeral ports (1024-65535) so the response can leave.
  • Rules: numbered and processed in order (lowest number first). First match wins.
  • Default: the default NACL allows all inbound and outbound traffic; a newly created custom NACL denies everything until you add rules
  • Scope: applies to ALL traffic entering or leaving the subnet; it cannot target specific instances
  • Use case: broad subnet-level controls (e.g., block a known-bad CIDR range)

NACL example — data subnet rules
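To make the first-match and stateless behavior concrete, here is a minimal Python sketch of NACL evaluation. The rule numbers, CIDRs, and the `NaclRule` and `evaluate` names are illustrative assumptions, not the bank's actual rule set or an AWS API.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int        # lower numbers are evaluated first
    cidr: str          # peer CIDR (source for inbound, destination for outbound)
    port_range: tuple  # (low, high), inclusive
    action: str        # "allow" or "deny"


def evaluate(rules, peer_ip, port):
    """Return the action of the first matching rule, else the implicit deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        low, high = rule.port_range
        if ip_address(peer_ip) in ip_network(rule.cidr) and low <= port <= high:
            return rule.action
    return "deny"  # the catch-all '*' rule at the end of every NACL


# Hypothetical inbound rules for a data subnet
inbound = [
    NaclRule(90, "198.51.100.0/24", (0, 65535), "deny"),    # known-bad range
    NaclRule(100, "10.10.0.0/16", (5432, 5432), "allow"),   # app subnets to PostgreSQL
]

# Hypothetical outbound rules: responses leave on the client's ephemeral port
outbound = [
    NaclRule(100, "10.10.0.0/16", (1024, 65535), "allow"),
]

print(evaluate(inbound, "10.10.1.20", 5432))    # request from the app subnet
print(evaluate(outbound, "10.10.1.20", 50000))  # response to an ephemeral port
```

Because the NACL keeps no connection state, removing the outbound ephemeral-port rule would drop the database response even though the inbound request was allowed.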

Security Groups:

  • Layer: ENI (Elastic Network Interface) level, attached to instances, ALBs, RDS, Lambda, etc.
  • Stateful: if you allow inbound TCP 443, the response is automatically allowed outbound (no need for ephemeral port rules)
  • Allow-only: you can only create ALLOW rules. There are no DENY rules. Anything not explicitly allowed is denied.
  • Reference other SGs: rules can reference security group IDs instead of CIDRs — “allow port 5432 from sg-app” — this is far more maintainable than CIDR-based rules
  • Default: deny all inbound, allow all outbound (can be restricted)
  • Scope: per-ENI — different instances in the same subnet can have different security groups

Security Group Example (3-Tier Application):

sg-alb:
  Inbound:  443 from 0.0.0.0/0 (or from CloudFront prefix list)
  Outbound: 8080 to sg-app
sg-app (EKS pods / EC2):
  Inbound:  8080 from sg-alb
  Outbound: 5432 to sg-rds
            6379 to sg-redis
            443 to 0.0.0.0/0 (for external API calls, via NAT)
sg-rds:
  Inbound:  5432 from sg-app
  Outbound: (none needed, stateful handles responses)
sg-redis:
  Inbound:  6379 from sg-app
  Outbound: (none needed)
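
The chain above can be checked with a small Python sketch. It models only the inbound side, treats "internet" as a stand-in for 0.0.0.0/0, and uses a hypothetical `is_allowed` helper; it is an illustration of allow-only evaluation, not how AWS implements it.

```python
# Inbound rules per security group: a set of (port, source) pairs.
# "internet" is a placeholder for 0.0.0.0/0 in this sketch.
INBOUND = {
    "sg-alb": {(443, "internet")},
    "sg-app": {(8080, "sg-alb")},
    "sg-rds": {(5432, "sg-app")},
    "sg-redis": {(6379, "sg-app")},
}


def is_allowed(source, dest_sg, port):
    """Allow-only model: traffic passes only if some inbound rule matches."""
    return (port, source) in INBOUND.get(dest_sg, set())


print(is_allowed("sg-app", "sg-rds", 5432))  # app tier reaching the database
print(is_allowed("sg-alb", "sg-rds", 5432))  # ALB trying to skip the app tier
```

Referencing security group IDs means the database rule stays valid when app instances are replaced or rescaled, which a CIDR-based rule would not guarantee.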

AWS Network Firewall:

  • Layer: VPC level, deployed in a dedicated subnet within the inspection VPC
  • Deep packet inspection: examines packet headers AND payload
  • Stateful + stateless rule groups: stateless rules for simple allow/deny, stateful rules for protocol-aware inspection
  • Suricata-compatible: IPS/IDS rules written in Suricata syntax to detect malware, C2, and exploits
  • Managed rule groups: AWS provides pre-built threat intelligence rules
  • Logging: flow logs, alert logs, and TLS logs, sent to S3, CloudWatch Logs, or Kinesis Data Firehose
  • Scope: centralized in the inspection VPC, processes all traffic routed through it via TGW

| Control | Use When | Example |
| --- | --- | --- |
| NACL | Broad subnet isolation, emergency IP blocking | Block a CIDR during incident response |
| Security Group | Application-level access control (primary tool) | Allow app → database on port 5432 |
| Network Firewall | Deep inspection, IPS/IDS, domain filtering, compliance | Detect malware C2 callbacks, block non-TLS egress |

AWS Security Groups vs GCP Firewall Rules — Comparison


This is a frequent interview question because the two models differ significantly in philosophy. AWS uses a purely allow-based model scoped to individual network interfaces, while GCP uses a priority-based allow/deny model scoped to the entire VPC with hierarchical policy inheritance. Understanding these differences is essential for multi-cloud architects and for answering “which model is better and why” questions.

| Aspect | AWS Security Groups | GCP VPC Firewall Rules |
| --- | --- | --- |
| Statefulness | Stateful | Stateful |
| Default behavior | Deny all inbound, allow all outbound | Default network: allow-internal + SSH/RDP/ICMP; custom VPC: implied deny ingress, implied allow egress |
| Actions | Allow only (implicit deny) | Allow AND Deny |
| Scope | Per ENI (network interface) | Per VPC (via targets) |
| Rule targets | SG ID or CIDR | Network tags, service accounts, or all instances |
| Priority | No priority (all rules evaluated, union of allows) | Priority 0-65535 (lower number = higher priority) |
| Cross-reference | Reference other SG IDs | Reference service accounts |
| Hierarchy | No hierarchy (flat per-VPC) | Hierarchical: Org Policy → Folder Policy → VPC Rules |
| Limits | 60 inbound + 60 outbound rules per SG, 5 SGs per ENI (default quotas) | 500 rules per project (quota, can be increased) |
| Best practice | Reference SG IDs instead of CIDRs | Use service accounts instead of network tags |
| Deny rules | Not possible (must remove allow rule) | Explicit deny with higher priority than allow |
| Logging | VPC Flow Logs (separate feature) | Firewall Rules Logging (per-rule toggle) |

Key architectural implications:

AWS’s allow-only model is simpler but less flexible. If you need to block a specific IP that was previously allowed by a broad CIDR rule, you cannot add a deny rule to the security group — you must narrow the CIDR range or use NACLs (which are stateless and operate at the subnet level, adding complexity). In practice, emergency IP blocking in AWS requires NACLs or AWS Network Firewall, not security groups.

GCP’s priority-based allow/deny model is more powerful. You can create a high-priority deny rule (e.g., priority 100: deny traffic from 198.51.100.0/24) that overrides a lower-priority allow rule (e.g., priority 1000: allow TCP 443 from 0.0.0.0/0). This makes incident response easier — block a bad actor without touching existing allow rules.
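
The difference between the two evaluation models can be sketched in a few lines of Python. This is a simplified illustration that ignores ports and protocols; the function names and rule tuples are assumptions made for the example, not either cloud's API.

```python
from ipaddress import ip_address, ip_network


def aws_sg_allows(allow_cidrs, src_ip):
    """AWS security group: union of allow rules, no deny, no ordering."""
    return any(ip_address(src_ip) in ip_network(cidr) for cidr in allow_cidrs)


def gcp_firewall_allows(rules, src_ip):
    """GCP VPC firewall: the matching rule with the lowest priority number wins.

    Each rule is (priority, action, cidr). At equal priority a deny beats an
    allow, and ingress with no matching rule hits the implied deny.
    """
    matching = [r for r in rules if ip_address(src_ip) in ip_network(r[2])]
    if not matching:
        return False
    winner = min(matching, key=lambda r: (r[0], r[1] != "deny"))
    return winner[1] == "allow"


gcp_rules = [
    (100, "deny", "198.51.100.0/24"),  # incident-response block
    (1000, "allow", "0.0.0.0/0"),      # existing public HTTPS rule
]

print(gcp_firewall_allows(gcp_rules, "198.51.100.9"))  # bad actor is blocked
print(gcp_firewall_allows(gcp_rules, "203.0.113.5"))   # everyone else still allowed
print(aws_sg_allows(["0.0.0.0/0"], "198.51.100.9"))    # SG alone cannot carve out the block
```

The last call shows why the AWS side needs a NACL or Network Firewall rule for the same incident: the security group has no way to express the exception.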

GCP’s hierarchical firewall policies are a significant enterprise advantage. The central security team can create organization-level policies (e.g., “deny all ingress on port 22 except from bastion subnet”) that CANNOT be overridden by project-level rules. In AWS, there is no equivalent hierarchy — each account’s security groups are independent, and you rely on SCPs (Service Control Policies) to restrict what security groups can be created, which is less granular.

GCP’s service account targeting is more secure than network tags. Tags are just strings: anyone with the compute.instances.setTags IAM permission can add a tag to a VM and potentially match firewall rules they should not. Service accounts are IAM-controlled: attaching one to a VM requires permission to act as that service account (iam.serviceAccounts.actAs), so firewall targeting follows an identity that IAM governs rather than a free-form label.


AWS Network Firewall — Deep Dive with IPS/IDS


AWS Network Firewall is a managed stateful firewall service powered by Suricata. It inspects traffic at the VPC level — including headers, payloads, and protocol behavior.

Inspection VPC architecture — TGW, Firewall, NAT, IGW subnets

Network Firewall uses Suricata syntax for stateful rules. Suricata is an open-source IDS/IPS engine. You write rules that match specific traffic patterns — protocol, source/dest, content, keywords — and take actions (pass, drop, alert, reject).

Suricata Rule Syntax:

action protocol source_ip source_port -> dest_ip dest_port (options;)

Example Rules for Enterprise Bank:

# Block traffic to known C2 (command-and-control) servers
drop tls $HOME_NET any -> $EXTERNAL_NET any \
    (tls.sni; content:"malware-c2.example.com"; \
    msg:"C2 callback blocked"; sid:1000001; rev:1;)

# Alert on SQL injection attempts in HTTP traffic
alert http $EXTERNAL_NET any -> $HOME_NET any \
    (http.uri; content:"UNION"; nocase; content:"SELECT"; nocase; \
    msg:"Possible SQL injection in URI"; sid:1000002; rev:1;)

# Block unauthorized DNS-over-HTTPS (DoH) to enforce internal DNS
drop tls $HOME_NET any -> $EXTERNAL_NET 443 \
    (tls.sni; content:"dns.google"; \
    msg:"DNS-over-HTTPS blocked - use internal DNS"; sid:1000003; rev:1;)

# Detect outbound SSH tunneling (data exfiltration risk)
alert tcp $HOME_NET any -> $EXTERNAL_NET 22 \
    (msg:"Outbound SSH detected - review for tunneling"; \
    flow:established,to_server; sid:1000004; rev:1;)

# Block known bad TLS certificate fingerprints
drop tls $HOME_NET any -> $EXTERNAL_NET any \
    (tls.cert_fingerprint; content:"ab:cd:ef:..."; \
    msg:"Known malicious TLS certificate"; sid:1000005; rev:1;)

# Allow only HTTPS egress (block HTTP, non-standard ports).
# flow:established on the drop rule lets the TCP handshake complete, so the
# session can be identified as TLS and matched by the pass rule first.
pass tls $HOME_NET any -> $EXTERNAL_NET 443 \
    (msg:"HTTPS egress allowed"; sid:1000010; rev:1;)
drop tcp $HOME_NET any -> $EXTERNAL_NET any \
    (msg:"Non-HTTPS egress blocked"; flow:established,to_server; \
    sid:1000011; rev:1;)
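
As a reading aid for the rule syntax above, the following Python sketch splits a single-line rule into its header fields and options. The `parse_rule` helper and its regular expression are illustrative assumptions; the option split is deliberately naive and would break on a `;` inside quoted content, so this is not a substitute for Suricata's own parser.

```python
import re

# action protocol source_ip source_port -> dest_ip dest_port (options;)
RULE_RE = re.compile(
    r"^(?P<action>pass|drop|reject|alert)\s+"
    r"(?P<protocol>\S+)\s+"
    r"(?P<src_ip>\S+)\s+(?P<src_port>\S+)\s+"
    r"(?P<direction>->|<>)\s+"
    r"(?P<dst_ip>\S+)\s+(?P<dst_port>\S+)\s+"
    r"\((?P<options>.*)\)\s*$"
)


def parse_rule(line):
    """Split a single-line Suricata rule into header fields and an options list."""
    match = RULE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a recognisable rule: {line!r}")
    fields = match.groupdict()
    # Naive split: assumes no ';' appears inside quoted option values.
    fields["options"] = [o.strip() for o in fields["options"].split(";") if o.strip()]
    return fields


rule = parse_rule(
    'alert tcp $HOME_NET any -> $EXTERNAL_NET 22 '
    '(msg:"Outbound SSH detected - review for tunneling"; '
    'flow:established,to_server; sid:1000004; rev:1;)'
)
print(rule["action"], rule["protocol"], rule["dst_port"])
print(rule["options"])
```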

AWS provides pre-built managed rule groups updated by AWS threat intelligence:

| Managed Rule Group | What It Detects |
| --- | --- |
| AbusedLegitMalwareDomainsActionOrder | Domains hosting malware on legitimate services |
| MalwareDomainsActionOrder | Known malware distribution domains |
| BotNetCommandAndControlDomainsActionOrder | Known botnet C2 domains |
| ThreatSignaturesDoSActionOrder | Denial of service attack patterns |
| ThreatSignaturesExploitsActionOrder | Known exploit signatures |
| ThreatSignaturesMalwareActionOrder | Malware traffic signatures |
| ThreatSignaturesWebAttacksActionOrder | Web application attack patterns |

  • IPS (Intrusion Prevention System): inline inspection — traffic passes THROUGH the firewall. Can DROP malicious packets before they reach the workload. This is our bank’s configuration.
  • IDS (Intrusion Detection System): passive monitoring — traffic is mirrored to the firewall. Can ALERT but cannot block. Useful for initial deployment to evaluate false positives before switching to IPS mode.

In AWS Network Firewall, the mode is controlled by the rule action:

  • drop = IPS (blocks traffic)
  • alert = IDS (logs but allows traffic)
  • reject = IPS + sends TCP RST or ICMP unreachable

Recommended rollout: deploy with alert actions first (IDS mode) for 2-4 weeks. Review alerts. Tune rules to eliminate false positives. Then change actions to drop (IPS mode).
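
One way to make the alert-then-drop rollout repeatable is to promote rules by sid once they have been tuned. The sketch below is a hypothetical helper (`promote_to_drop` is not part of any AWS or Suricata tooling) and assumes one rule per line.

```python
import re


def promote_to_drop(rules_text, tuned_sids):
    """Switch 'alert' to 'drop' only for rules whose sid is in tuned_sids."""
    promoted = []
    for line in rules_text.splitlines():
        sid = re.search(r"\bsid:(\d+);", line)
        if line.startswith("alert ") and sid and int(sid.group(1)) in tuned_sids:
            line = "drop " + line[len("alert "):]
        promoted.append(line)
    return "\n".join(promoted)


ids_rules = "\n".join([
    'alert tcp $HOME_NET any -> $EXTERNAL_NET 22 (msg:"Outbound SSH"; sid:1000004; rev:1;)',
    'alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"Possible SQLi"; sid:1000002; rev:1;)',
])

# Only sid 1000004 has been reviewed for false positives so far.
print(promote_to_drop(ids_rules, {1000004}))
```

Promoting rule by rule keeps noisy signatures in IDS mode while the reviewed ones start blocking.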


WAF protects web applications at Layer 7 (HTTP/HTTPS). It inspects request bodies, headers, URIs, and query strings for attack patterns.

AWS WAF attaches to CloudFront, ALB, API Gateway, or AppSync. It evaluates Web ACL rules against HTTP requests.

Key concepts:

  • Web ACL: collection of rules with a default action (allow or block)
  • Rule Group: reusable set of rules (managed or custom)
  • Managed Rule Groups: pre-built by AWS or marketplace vendors
  • Custom Rules: match on IP, geo, rate, string match, regex, size, SQL injection, XSS
  • Rule actions: ALLOW, BLOCK, COUNT (monitor without blocking), CAPTCHA, CHALLENGE

Essential Managed Rule Groups for Enterprise:

| Rule Group | Purpose |
| --- | --- |
| AWSManagedRulesCommonRuleSet | OWASP Top 10: SQLi, XSS, LFI, RFI, path traversal |
| AWSManagedRulesKnownBadInputsRuleSet | Log4j, Spring4Shell, known bad patterns |
| AWSManagedRulesSQLiRuleSet | SQL injection (dedicated, deeper than Common) |
| AWSManagedRulesLinuxRuleSet | Linux-specific exploits (for EC2/EKS workloads) |
| AWSManagedRulesBotControlRuleSet | Bot detection: scrapers, scanners, credential stuffers |
| AWSManagedRulesATPRuleSet | Account Takeover Prevention: credential stuffing detection |
| AWSManagedRulesAmazonIpReputationList | Known bad IPs from Amazon threat intelligence (bots, botnets) |
| AWSManagedRulesAnonymousIpList | VPN, proxy, Tor node, and hosting provider IPs |

Rate-Based Rules:

  • Automatically block IPs that exceed a request threshold (e.g., 2000 requests in 5 minutes)
  • Essential for DDoS mitigation at the application layer
  • Can scope by URI, header, or query string (e.g., rate limit /api/login separately)
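
The idea behind a rate-based rule can be sketched as a sliding-window counter per aggregation key. This is a conceptual model only: the `RateBasedRule` class is hypothetical, and AWS WAF's own counting is managed by the service and does not necessarily behave exactly like this.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Block a key (for example a client IP) once it exceeds `limit`
    requests within the trailing `window_seconds`."""

    def __init__(self, limit, window_seconds=300):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def should_block(self, key, now):
        recent = self.hits[key]
        recent.append(now)
        # Evict requests that have fallen out of the window.
        while recent and recent[0] <= now - self.window:
            recent.popleft()
        return len(recent) > self.limit


# Tiny limit so the behaviour is visible; the text's example uses 2000 per 5 minutes.
login_rule = RateBasedRule(limit=3, window_seconds=300)
decisions = [login_rule.should_block("203.0.113.5", t) for t in (0, 1, 2, 3)]
print(decisions)
```

Keeping a separate rule instance for a sensitive path such as /api/login mirrors the scoped rate limit mentioned above.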

AWS WAF architecture — CloudFront edge WAF + regional ALB WAF


Shield Standard (free, automatic):

  • Protects against L3/L4 volumetric attacks (SYN floods, UDP reflection, amplification)
  • Active on ALL AWS accounts by default — no configuration needed
  • Protects CloudFront, Route 53, Global Accelerator, ALB, NLB, EC2 Elastic IPs

Shield Advanced ($3,000/month per organization):

  • Everything in Standard plus:
  • L7 DDoS protection (requires WAF)
  • Shield Response Team (SRT, formerly the DDoS Response Team or DRT): AWS experts help during active attacks
  • Cost protection: AWS credits for scale-up costs during DDoS
  • Real-time metrics and attack forensics
  • Automatic application-layer mitigations (creates WAF rules based on attack patterns)
  • Health-based detection (uses Route 53 health checks to detect impact)
  • Proactive engagement: the SRT contacts you when it detects an attack targeting your resources

When to use Shield Advanced: regulated workloads (banks, healthcare), internet-facing applications with revenue impact from downtime, compliance requirements mandating DDoS mitigation documentation.


Centralized Inspection Architecture — Full Design


This is the complete network security architecture for our enterprise bank. All traffic — egress, ingress, and east-west — is inspected at the Network Hub Account.

Complete enterprise network security architecture — all layers

Egress (workload → internet):

  1. Pod in payments-prod VPC sends HTTPS to api.stripe.com
  2. Private subnet route table: 0.0.0.0/0 → TGW
  3. TGW prod-rt: 0.0.0.0/0 → inspection VPC attachment
  4. Inspection VPC TGW subnet RT: 0.0.0.0/0 → Network Firewall endpoint
  5. Network Firewall inspects:
    • Stateless: check against deny-list CIDRs
    • Stateful: verify TLS to allowed domain (api.stripe.com in allowlist)
    • IPS: scan for C2 patterns, malware signatures
    • Result: PASS
  6. Firewall subnet RT: 0.0.0.0/0 → NAT GW
  7. NAT GW translates private IP → Elastic IP → Internet → Stripe
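
The domain allowlist check in step 5 can be sketched as follows. The leading-dot wildcard behavior (".stripe.com" covering stripe.com and its subdomains) reflects my understanding of Network Firewall domain lists and should be treated as an assumption; `sni_allowed` is a hypothetical helper.

```python
def sni_allowed(sni, targets):
    """Return True if the TLS SNI hostname matches an allowlist target."""
    host = sni.lower().rstrip(".")
    for target in targets:
        target = target.lower()
        if target.startswith("."):
            # ".stripe.com" covers stripe.com itself and any subdomain.
            if host == target[1:] or host.endswith(target):
                return True
        elif host == target:
            return True
    return False


allowlist = [".stripe.com", ".amazonaws.com", ".bank.com"]

print(sni_allowed("api.stripe.com", allowlist))               # step 5 result: PASS
print(sni_allowed("evilstripe.com", allowlist))               # look-alike domain
print(sni_allowed("api.stripe.com.attacker.net", allowlist))  # suffix trick
```

Matching on the dot-prefixed suffix is what stops look-alike names from slipping through a plain substring check.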

Ingress (internet → workload):

  1. Customer hits payments.bank.com
  2. Route 53 → CloudFront (Shield absorbs L3/L4 DDoS)
  3. CloudFront → WAF inspects (OWASP rules, bot check, rate limit)
  4. CloudFront → ALB in inspection VPC public subnet (origin)
  5. ALB → Network Firewall inspects the request (and the response on the return path)
  6. After inspection → TGW → workload VPC → internal ALB → EKS pod

East-west (workload → workload):

  1. payments-prod pod calls trading-prod API at 10.11.1.50:8080
  2. payments VPC route table: 10.11.0.0/16 matches 10.0.0.0/8 → TGW
  3. TGW prod-rt: 10.11.0.0/16 → trading-prod attachment
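  4. If east-west inspection is required: add a static route 10.0.0.0/8 → inspection VPC in prod-rt, and make sure the more specific 10.11.0.0/16 route from step 3 is not also present there, because the TGW prefers the longest matching prefix and would bypass inspection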
    • Traffic goes through Network Firewall before reaching trading-prod
    • Significant latency impact — only enable for high-sensitivity workloads
  5. trading-prod VPC: security group allows 8080 from 10.10.0.0/16
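
Route selection in the east-west path comes down to longest-prefix matching, which the following Python sketch models. The route-table contents and the `next_hop` helper are illustrative assumptions that mirror the CIDRs used in the steps above.

```python
from ipaddress import ip_address, ip_network


def next_hop(route_table, dest_ip):
    """Pick the target of the most specific route that contains dest_ip."""
    dest = ip_address(dest_ip)
    matches = [
        (ip_network(cidr), target)
        for cidr, target in route_table.items()
        if dest in ip_network(cidr)
    ]
    if not matches:
        return None
    _, target = max(matches, key=lambda m: m[0].prefixlen)
    return target


prod_rt = {
    "10.10.0.0/16": "payments-prod",
    "10.11.0.0/16": "trading-prod",
    "0.0.0.0/0": "inspection-vpc",
}
print(next_hop(prod_rt, "10.11.1.50"))  # direct, no inspection

# Adding the /8 static route alone changes nothing: the /16 is more specific.
with_static = {**prod_rt, "10.0.0.0/8": "inspection-vpc"}
print(next_hop(with_static, "10.11.1.50"))

# Only without the /16 route does east-west traffic reach the firewall.
inspected = {k: v for k, v in with_static.items() if k != "10.11.0.0/16"}
print(next_hop(inspected, "10.11.1.50"))
```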

network-hub-account/firewall.tf
# ─── Firewall Policy ───────────────────────────────
resource "aws_networkfirewall_firewall_policy" "main" {
  name = "bank-inspection-policy"

  firewall_policy {
    stateless_default_actions          = ["aws:forward_to_sfe"]
    stateless_fragment_default_actions = ["aws:drop"]

    # Stateless rule group: fast path deny-lists
    stateless_rule_group_reference {
      priority     = 1
      resource_arn = aws_networkfirewall_rule_group.stateless_deny.arn
    }

    # Stateful rule groups: IPS/IDS
    stateful_engine_options {
      rule_order = "STRICT_ORDER"
    }

    stateful_rule_group_reference {
      priority     = 1
      resource_arn = aws_networkfirewall_rule_group.ips_custom.arn
    }

    stateful_rule_group_reference {
      priority     = 2
      resource_arn = aws_networkfirewall_rule_group.domain_allowlist.arn
    }

    # AWS managed threat intelligence rules.
    # A STRICT_ORDER policy must reference the StrictOrder variants of the
    # managed rule groups rather than the ActionOrder ones.
    stateful_rule_group_reference {
      priority     = 10
      resource_arn = "arn:aws:network-firewall:eu-west-1:aws-managed:stateful-rulegroup/AbusedLegitMalwareDomainsStrictOrder"
    }

    stateful_rule_group_reference {
      priority     = 11
      resource_arn = "arn:aws:network-firewall:eu-west-1:aws-managed:stateful-rulegroup/BotNetCommandAndControlDomainsStrictOrder"
    }

    stateful_rule_group_reference {
      priority     = 12
      resource_arn = "arn:aws:network-firewall:eu-west-1:aws-managed:stateful-rulegroup/ThreatSignaturesMalwareStrictOrder"
    }

    stateful_rule_group_reference {
      priority     = 13
      resource_arn = "arn:aws:network-firewall:eu-west-1:aws-managed:stateful-rulegroup/ThreatSignaturesExploitsStrictOrder"
    }
  }
}

# ─── Stateless Deny-List Rule Group ─────────────────
resource "aws_networkfirewall_rule_group" "stateless_deny" {
  name     = "stateless-deny-list"
  capacity = 100
  type     = "STATELESS"

  rule_group {
    rules_source {
      stateless_rules_and_custom_actions {
        # Drop traffic from known bad CIDRs
        stateless_rule {
          priority = 1
          rule_definition {
            actions = ["aws:drop"]
            match_attributes {
              source {
                address_definition = "198.51.100.0/24" # Example bad range
              }
            }
          }
        }

        # Drop all inbound ICMP from internet (anti-reconnaissance)
        stateless_rule {
          priority = 2
          rule_definition {
            actions = ["aws:drop"]
            match_attributes {
              source {
                address_definition = "0.0.0.0/0"
              }
              destination {
                address_definition = "10.0.0.0/8"
              }
              protocols = [1] # ICMP
            }
          }
        }
      }
    }
  }
}

# ─── Custom IPS Rules (Suricata) ────────────────────
resource "aws_networkfirewall_rule_group" "ips_custom" {
  name     = "bank-ips-rules"
  capacity = 500
  type     = "STATEFUL"

  rule_group {
    rule_variables {
      ip_sets {
        key = "HOME_NET"
        ip_set {
          definition = ["10.0.0.0/8"]
        }
      }
      ip_sets {
        key = "EXTERNAL_NET"
        ip_set {
          definition = ["0.0.0.0/0"]
        }
      }
    }

    rules_source {
      rules_string = <<-RULES
        # Block C2 callbacks to known threat domains
        drop tls $HOME_NET any -> $EXTERNAL_NET any (tls.sni; content:"malicious-domain.com"; nocase; msg:"C2 callback blocked"; sid:2000001; rev:1;)
        # Detect outbound SSH (potential tunneling/exfiltration)
        alert tcp $HOME_NET any -> $EXTERNAL_NET 22 (msg:"Outbound SSH - review for tunneling"; flow:established,to_server; sid:2000002; rev:1;)
        # Block DNS over HTTPS (enforce internal DNS)
        drop tls $HOME_NET any -> $EXTERNAL_NET 443 (tls.sni; content:"dns.google"; msg:"DoH blocked"; sid:2000003; rev:1;)
        drop tls $HOME_NET any -> $EXTERNAL_NET 443 (tls.sni; content:"cloudflare-dns.com"; msg:"DoH blocked"; sid:2000004; rev:1;)
        drop tls $HOME_NET any -> $EXTERNAL_NET 443 (tls.sni; content:"dns.quad9.net"; msg:"DoH blocked"; sid:2000005; rev:1;)
        # Alert on large outbound data transfers (exfiltration detection).
        # stream_size counts bytes sent by the client over the whole TCP stream;
        # dsize only measures a single packet's payload, so it can never reach 10MB.
        alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"Large outbound transfer >10MB"; flow:established,to_server; stream_size:client,>,10000000; sid:2000006; rev:1;)
        # Block non-TLS HTTP egress (enforce encryption)
        drop tcp $HOME_NET any -> $EXTERNAL_NET 80 (msg:"Unencrypted HTTP egress blocked"; flow:established,to_server; sid:2000007; rev:1;)
        # HTTPS egress is deliberately not passed here: in STRICT_ORDER a pass
        # rule ends evaluation, which would skip the domain allowlist group
        # (priority 2). Allowed HTTPS destinations are decided there.
      RULES
    }

    # Must match the STRICT_ORDER engine option in the firewall policy
    stateful_rule_options {
      rule_order = "STRICT_ORDER"
    }
  }
}

# ─── Domain Allowlist ───────────────────────────────
resource "aws_networkfirewall_rule_group" "domain_allowlist" {
  name     = "domain-allowlist"
  capacity = 200
  type     = "STATEFUL"

  rule_group {
    rule_variables {
      ip_sets {
        key = "HOME_NET"
        ip_set {
          definition = ["10.0.0.0/8"]
        }
      }
    }

    rules_source {
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets = [
          ".amazonaws.com",         # AWS services
          ".aws.amazon.com",        # AWS console
          ".docker.io",             # Container images
          ".docker.com",
          ".github.com",            # Source control
          ".githubusercontent.com",
          ".stripe.com",            # Payment processor
          ".twilio.com",            # Communications
          ".datadoghq.com",         # Monitoring (if used)
          ".grafana.net",           # Grafana Cloud
          ".bank.com",              # Our own domains
        ]
      }
    }

    # Must match the STRICT_ORDER engine option in the firewall policy
    stateful_rule_options {
      rule_order = "STRICT_ORDER"
    }
  }
}

# ─── Firewall Instance ──────────────────────────────
resource "aws_networkfirewall_firewall" "main" {
  name                = "bank-inspection-firewall"
  firewall_policy_arn = aws_networkfirewall_firewall_policy.main.arn
  vpc_id              = aws_vpc.inspection.id

  dynamic "subnet_mapping" {
    for_each = aws_subnet.firewall[*].id
    content {
      subnet_id = subnet_mapping.value
    }
  }

  tags = { Name = "bank-inspection-firewall" }
}

# ─── Logging Configuration ──────────────────────────
resource "aws_networkfirewall_logging_configuration" "main" {
  firewall_arn = aws_networkfirewall_firewall.main.arn

  logging_configuration {
    # Alert logs → CloudWatch (for real-time alerting)
    log_destination_config {
      log_destination = {
        logGroup = aws_cloudwatch_log_group.fw_alerts.name
      }
      log_destination_type = "CloudWatchLogs"
      log_type             = "ALERT"
    }

    # Flow logs → S3 (for compliance and forensics)
    log_destination_config {
      log_destination = {
        bucketName = aws_s3_bucket.fw_logs.id
        prefix     = "network-firewall/flow"
      }
      log_destination_type = "S3"
      log_type             = "FLOW"
    }
  }
}

resource "aws_cloudwatch_log_group" "fw_alerts" {
  name              = "/aws/network-firewall/alerts"
  retention_in_days = 90
}

# CloudWatch alarm for firewall alerts
resource "aws_cloudwatch_metric_alarm" "fw_alerts" {
  alarm_name          = "network-firewall-ips-alerts"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "DroppedPackets"
  namespace           = "AWS/NetworkFirewall"
  period              = 300
  statistic           = "Sum"
  threshold           = 100
  alarm_description   = "Network Firewall IPS dropped >100 packets in 5 min"
  alarm_actions       = [aws_sns_topic.security_alerts.arn]

  dimensions = {
    FirewallName = aws_networkfirewall_firewall.main.name
  }
}

WAF — Enterprise Architecture & Terraform


Beyond the WAF basics covered above, this section focuses on enterprise WAF architecture patterns, Terraform implementation, and interview-ready scenarios for designing WAF rules at scale.

Why WAF — OWASP Top 10 Protection at Layer 7


WAF inspects HTTP/HTTPS requests at Layer 7, examining request bodies, headers, URIs, query strings, and cookies for attack patterns. Without WAF, your ALB and application are directly exposed to SQL injection, XSS, SSRF, path traversal, and other OWASP Top 10 attacks. Network-layer controls cannot fill this gap for HTTPS traffic: unless the traffic is decrypted, they see the TLS payload as opaque bytes and can only act on metadata such as IPs, ports, and the SNI.

AWS WAF Architecture — Web ACL, Rules, Rule Groups


AWS WAF Evaluation Flow

Integration points: CloudFront (edge — recommended for internet-facing), ALB (regional), API Gateway (REST APIs), AppSync (GraphQL), Cognito (user pools).

Best practice: Deploy WAF at CloudFront (edge) for internet-facing apps. This blocks attacks before they reach your region, reducing load on ALBs and origin servers. For internal APIs exposed via ALB, attach WAF directly to the ALB.

Enterprise WAF Architecture

In this architecture:

  • CloudFront terminates TLS at the edge, caches static content, absorbs volumetric attacks
  • AWS WAF (attached to CloudFront) inspects all HTTP requests against rule groups
  • Shield Standard protects CloudFront from L3/L4 DDoS automatically
  • ALB performs target group routing to EKS pods
  • API Gateway handles REST API requests with its own throttling and auth

Logging: WAF logs to CloudWatch Logs (real-time, expensive at scale), S3 (cost-effective, query with Athena), or Kinesis Data Firehose (streaming to SIEM/Splunk/Datadog).

Terraform — AWS WAF with Managed + Custom Rules

resource "aws_wafv2_web_acl" "main" {
  name        = "enterprise-waf"
  description = "Enterprise WAF for production workloads"
  # CLOUDFRONT-scoped Web ACLs must be created in us-east-1;
  # use "REGIONAL" for ALB
  scope = "CLOUDFRONT"

  default_action {
    allow {}
  }

  # Rule 1: Rate-based, block IPs exceeding 2000 req/5min
  rule {
    name     = "rate-limit-per-ip"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimitPerIP"
    }
  }

  # Rule 2: AWS Managed, IP Reputation
  rule {
    name     = "aws-ip-reputation"
    priority = 2

    override_action {
      none {} # Use managed rule actions (BLOCK)
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesAmazonIpReputationList"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "AWSIPReputation"
    }
  }

  # Rule 3: AWS Managed, Core Rule Set (OWASP Top 10)
  rule {
    name     = "aws-common-rules"
    priority = 3

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"

        # Override specific rules that cause false positives
        rule_action_override {
          name = "SizeRestrictions_BODY"
          action_to_use {
            count {} # Count instead of block for large file uploads
          }
        }
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "AWSCommonRules"
    }
  }

  # Rule 4: AWS Managed, SQL Injection
  rule {
    name     = "aws-sqli"
    priority = 4

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesSQLiRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "AWSSQLi"
    }
  }

  # Rule 5: AWS Managed, Bot Control
  rule {
    name     = "aws-bot-control"
    priority = 5

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesBotControlRuleSet"
        vendor_name = "AWS"

        managed_rule_group_configs {
          aws_managed_rules_bot_control_rule_set {
            inspection_level = "COMMON" # or "TARGETED" for advanced
          }
        }
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "AWSBotControl"
    }
  }

  # Rule 6: Custom, geo-blocking sanctioned countries
  rule {
    name     = "geo-block-sanctioned"
    priority = 6

    action {
      block {}
    }

    statement {
      geo_match_statement {
        country_codes = ["KP", "IR", "CU", "SY"]
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "GeoBlockSanctioned"
    }
  }

  # Rule 7: Custom, rate limit the login endpoint specifically
  rule {
    name     = "rate-limit-login"
    priority = 7

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 100
        aggregate_key_type = "IP"

        scope_down_statement {
          byte_match_statement {
            search_string         = "/api/login"
            positional_constraint = "STARTS_WITH"

            field_to_match {
              uri_path {}
            }

            text_transformation {
              priority = 0
              type     = "LOWERCASE"
            }
          }
        }
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimitLogin"
    }
  }

  visibility_config {
    sampled_requests_enabled   = true
    cloudwatch_metrics_enabled = true
    metric_name                = "EnterpriseWAF"
  }
}

# Associate WAF with the CloudFront distribution.
# aws_wafv2_web_acl_association only works for regional resources (ALB,
# API Gateway, AppSync). For CloudFront, set web_acl_id on the distribution:
#
#   resource "aws_cloudfront_distribution" "main" {
#     # ... existing distribution settings ...
#     web_acl_id = aws_wafv2_web_acl.main.arn
#   }

# WAF logging to S3 via Kinesis Firehose.
# The delivery stream name must start with "aws-waf-logs-".
resource "aws_wafv2_web_acl_logging_configuration" "main" {
  log_destination_configs = [aws_kinesis_firehose_delivery_stream.waf_logs.arn]
  resource_arn            = aws_wafv2_web_acl.main.arn

  redacted_fields {
    single_header {
      name = "authorization"
    }
  }
}

Interview — “Design WAF rules for a banking API exposed to partners”

Answer:

  1. Layer the defenses: CloudFront → WAF → ALB → API Gateway → EKS.
  2. IP allowlisting: Create a custom rule allowing ONLY partner IP ranges (known CIDRs). Block all other source IPs. This is the first rule (highest priority).
  3. Mutual TLS (mTLS): Require client certificates, so partners must present a valid cert. This is handled at ALB or API Gateway, not WAF.
  4. Rate limiting per partner: Different rate limits per partner IP range or API key. High-value partners get higher limits.
  5. OWASP protection: Even trusted partners can send malformed requests (compromised systems, bugs). Enable SQLi, XSS, LFI rules.
  6. Request validation: Custom WAF rules to enforce required headers (API key, content-type), maximum body size, allowed HTTP methods (POST only for sensitive endpoints).
  7. Logging: All WAF decisions logged to S3 + streamed to SIEM. Alert on blocked requests from partner IPs (may indicate their systems are compromised).
  8. Deployment: Start in COUNT mode for 2 weeks to identify false positives, then switch to BLOCK.

Interview — “How do you handle WAF false positives without reducing security?”

Answer:

  1. Identify the false positive: Check WAF logs to see which rule blocked which request. Look at the request URI, headers, body.
  2. Surgical exclusion: Do NOT disable the entire rule. Instead, create a rule exclusion for the specific condition. For example, if SizeRestrictions_BODY blocks file uploads on /api/upload, override ONLY that rule to COUNT mode and add a scope-down statement for the /api/upload path.
  3. Custom rule alternative: If a managed rule is too broad, switch it to COUNT and write a custom rule that covers the same attack pattern but with tighter scope.
  4. Label + custom rule pattern: Set the managed rule to COUNT with a label, then create a custom rule that blocks based on the label AND excludes the false-positive path.
  5. Testing pipeline: Always test WAF changes in a staging environment with production-like traffic before deploying to production.
  6. Continuous tuning: Review WAF COUNT metrics weekly. New application features may trigger new false positives.


DDoS protection requires defense at every layer — not just one product. A volumetric L3/L4 flood is handled differently than a slow L7 application-layer attack. This section covers the multi-layer defense strategy.

Layer 3/4 (Network/Transport):
  AWS: Shield Standard (free, always-on on CloudFront/ALB/NLB/Route53/EIP)
  GCP: Cloud Armor built-in (Global LB absorbs at Google edge)
Layer 7 (Application):
  AWS: Shield Advanced ($3K/mo) + WAF rate-based rules + Bot Control
  GCP: Cloud Armor Adaptive Protection (ML-based anomaly detection)
Edge (Content Delivery):
  AWS: CloudFront absorbs volumetric attacks at 450+ edge locations
  GCP: Global LB anycast absorbs at Google's edge (one of largest networks globally)
Application Layer:
  API Gateway throttling (per-client rate limits)
  K8s HPA auto-scaling (scale pods to absorb legitimate traffic spikes)
  Circuit breakers (Envoy/Istio, prevent cascading failures)
  Connection draining (graceful handling of dropped connections)

Shield Standard (free, automatic):

  • Active on ALL AWS accounts by default — zero configuration
  • Protects against L3/L4 volumetric attacks: SYN floods, UDP reflection, DNS amplification
  • Covers CloudFront, Route 53, Global Accelerator, ALB, NLB, Elastic IPs
  • Cannot see attack metrics or get notifications (happens silently)

Shield Advanced ($3,000/month per organization):

  • Everything in Standard, plus:
  • 24/7 Shield Response Team (SRT, formerly the DDoS Response Team or DRT): AWS experts assist during active attacks. They can modify WAF rules on your behalf.
  • Cost protection: AWS credits your account for scaling costs incurred during a DDoS attack (e.g., Auto Scaling adding instances, data transfer spikes)
  • Advanced metrics: real-time attack visibility in CloudWatch, including attack vectors, volume, and duration
  • Proactive engagement: the SRT contacts you when it detects an attack targeting your resources (requires Route 53 health checks configured)
  • Automatic application-layer mitigations — Shield Advanced can automatically create WAF rules based on observed attack patterns
  • Health-based detection — uses Route 53 health check status to detect when an attack is impacting your application (not just traffic volume)
  • Group protection — protect up to 1,000 resources under one subscription

When Shield Advanced is worth $3K/month:

  • Revenue-generating internet-facing applications (if the expected annual cost of DDoS-related downtime exceeds the $36K/year subscription, Shield Advanced pays for itself)
  • Regulated industries requiring documented DDoS mitigation (banking: PCI DSS, healthcare: HIPAA)
  • Applications with SLA commitments to customers
  • When you need cost protection — a major DDoS attack can cause thousands in Auto Scaling and data transfer costs that Shield Advanced refunds

AWS Full DDoS Defense Architecture

Terraform for Shield Advanced:

resource "aws_shield_protection" "cloudfront" {
  name         = "cloudfront-shield"
  resource_arn = aws_cloudfront_distribution.main.arn
}

resource "aws_shield_protection" "alb" {
  name         = "alb-shield"
  resource_arn = aws_lb.main.arn
}

# Shield Advanced requires subscription (done once via console or CLI)
# aws shield create-subscription

# Proactive engagement requires Route53 health check
resource "aws_route53_health_check" "primary" {
  fqdn              = "api.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 10

  tags = {
    Name = "primary-api-health"
  }
}

# Health-based detection only uses the check once it is associated
# with a protected resource
resource "aws_shield_protection_health_check_association" "alb" {
  health_check_arn     = aws_route53_health_check.primary.arn
  shield_protection_id = aws_shield_protection.alb.id
}

resource "aws_shield_proactive_engagement" "main" {
  enabled = true

  emergency_contact {
    email_address = "security@example.com"
    phone_number  = "+971501234567"
    contact_notes = "24/7 Security Operations Center"
  }
}

“When is $3K/month for Shield Advanced worth it?”

| Factor | Without Shield Advanced | With Shield Advanced |
| --- | --- | --- |
| Attack response | Your team manually investigates and mitigates. 4-8 hours of senior engineer time at $200/hr = $800-$1,600 per incident | The Shield Response Team (SRT) handles mitigation. Your team monitors. |
| Scaling costs | You pay for Auto Scaling during the attack. A major attack can cost $5K-$50K+ in compute/data transfer | AWS credits scaling costs (cost protection) |
| Revenue loss | Application degradation during the attack. 1 hour of downtime for a $10M/yr app = $1,140/hr | Faster mitigation = less downtime |
| Annual cost | Incident costs are unpredictable, potentially > $36K/yr | Fixed $36K/yr with predictable outcomes |
| Compliance | Must demonstrate DDoS mitigation capability (PCI DSS, SOC 2) | Shield Advanced provides compliance documentation |
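
As a rough worked example using only the figures in the table above (a $10M/yr application and the $36K/yr subscription), the break-even point on avoided downtime alone can be sketched as follows. It deliberately ignores scaling credits and engineer time, so treat it as a back-of-the-envelope check rather than a pricing model.

```latex
\[
\text{revenue per hour} = \frac{\$10{,}000{,}000}{8{,}760\ \text{h}} \approx \$1{,}140/\text{h},
\qquad
\text{break-even downtime} = \frac{\$36{,}000}{\$1{,}140/\text{h}} \approx 31.6\ \text{h per year}
\]
```

On lost revenue alone, the subscription breaks even at roughly 32 hours of prevented downtime per year for this application. Credited scaling costs and saved engineer time lower that threshold.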

Rule of thumb: If your internet-facing application generates >$500K/year in revenue OR operates in a regulated industry, Shield Advanced pays for itself.

Interview — “Your banking app is under DDoS attack. Walk through your response plan.”

Answer:

  1. Detection (0-5 min): CloudWatch alarms fire on Shield Advanced attack metrics (or Adaptive Protection alerts on GCP). PagerDuty pages the on-call engineer. Route 53 health checks may start failing, triggering failover routing.
  2. Triage (5-15 min): Check the WAF and Shield dashboards. Is this L3/L4 (volumetric) or L7 (application layer)? Check attack source IPs, geographic distribution, and attack vectors. If L3/L4, Shield Standard/Cloud Armor is already mitigating, so verify that it is. If L7, proceed to step 3.
  3. L7 mitigation (15-30 min): Review WAF logs for attack patterns (common URI, user-agent, headers). Add targeted WAF rules: block offending user-agents, rate-limit the specific paths under attack, and geo-block if the attack originates from specific countries. If you have Shield Advanced, engage the Shield Response Team (SRT), who can push WAF rules on your behalf.
  4. Scale (in parallel): Verify HPA is scaling pods. Verify Karpenter/ASG is adding nodes. Check ALB connection counts. If API Gateway is in the path, verify throttling is in place.
  5. Communication: Notify stakeholders (status page update). If the application is customer-facing, activate the incident response communication plan.
  6. Post-incident (24-48 hrs): Review attack forensics. Update WAF rules permanently if the attack pattern is novel. Request the Shield Advanced cost protection credit. Update the runbook with lessons learned. Conduct a blameless post-mortem.
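
The detection step can be sketched in Terraform as a CloudWatch alarm on the `DDoSDetected` metric that Shield Advanced publishes in the `AWS/DDoSProtection` namespace. The two variables and the alarm name are hypothetical placeholders, not part of the original configuration, and the SNS topic is assumed to forward to your paging tool.

```hcl
# Hypothetical inputs: neither variable appears in the original configuration.
variable "protected_resource_arn" {
  type        = string
  description = "ARN of a resource covered by an aws_shield_protection"
}

variable "oncall_sns_topic_arn" {
  type        = string
  description = "SNS topic that forwards to the paging tool"
}

# Fires when Shield Advanced reports an active DDoS event for the resource.
resource "aws_cloudwatch_metric_alarm" "ddos_detected" {
  alarm_name          = "shield-ddos-detected" # illustrative name
  namespace           = "AWS/DDoSProtection"
  metric_name         = "DDoSDetected"
  statistic           = "Sum"
  period              = 60
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 0
  treat_missing_data  = "notBreaching"
  alarm_actions       = [var.oncall_sns_topic_arn]

  dimensions = {
    ResourceArn = var.protected_resource_arn
  }
}
```

For global resources such as CloudFront distributions, the alarm would most likely need to be created in us-east-1, where those metrics are published.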


Scenario 1: “Design a centralized network inspection architecture for all internet egress across 50 workload accounts”

Answer:

I would implement a hub-spoke model with a dedicated inspection VPC in the Network Hub Account:

Architecture: Centralized egress architecture

Key design decisions:

  1. No IGW in workload VPCs — enforced via SCP that denies ec2:CreateInternetGateway in workload accounts
  2. Appliance mode ON on the inspection VPC TGW attachment — ensures symmetric routing for stateful inspection
  3. Multi-AZ deployment — Network Firewall endpoints in all 3 AZs, NAT GWs in all 3 AZs
  4. TGW route table segmentation — prod and non-prod have separate route tables. Both route 0.0.0.0/0 to inspection, but they cannot reach each other’s VPCs unless explicitly propagated
  5. Logging — all firewall alerts go to CloudWatch for real-time monitoring + S3 for long-term compliance. SIEM integration (Splunk/Sentinel) consumes from S3
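
Decisions 2 and 4 could look roughly like this in Terraform. This is a minimal sketch: every variable is a hypothetical input, and the route table stands in for either the prod or the non-prod spoke route table.

```hcl
# Hypothetical inputs, not taken from the original configuration.
variable "transit_gateway_id" { type = string }
variable "inspection_vpc_id" { type = string }
variable "inspection_tgw_subnet_ids" { type = list(string) } # one subnet per AZ
variable "spoke_route_table_id" { type = string }            # prod or non-prod TGW route table

# Inspection VPC attachment with appliance mode on, so both directions of a
# flow are pinned to the same AZ and reach the same firewall endpoint.
resource "aws_ec2_transit_gateway_vpc_attachment" "inspection" {
  transit_gateway_id     = var.transit_gateway_id
  vpc_id                 = var.inspection_vpc_id
  subnet_ids             = var.inspection_tgw_subnet_ids
  appliance_mode_support = "enable"

  # Keep the attachment out of the default route table so segmentation
  # is always explicit.
  transit_gateway_default_route_table_association = false
  transit_gateway_default_route_table_propagation = false
}

# Spoke route table: send all internet-bound traffic to inspection.
resource "aws_ec2_transit_gateway_route" "spoke_default" {
  destination_cidr_block         = "0.0.0.0/0"
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.inspection.id
  transit_gateway_route_table_id = var.spoke_route_table_id
}
```

Repeating the route resource once per spoke route table keeps prod and non-prod isolated from each other while both still egress through inspection.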

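Scaling considerations: AWS Network Firewall scales automatically with traffic. A transit gateway supports up to 5,000 attachments, so 50 VPCs are well within quota. Data processing is the main cost driver: TGW charges about $0.02/GB, and Network Firewall bills its own per-GB processing fee on top (check current regional pricing). Estimate monthly egress and budget accordingly.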


Scenario 2: “How do you implement IPS/IDS in the cloud?”

Answer:

IPS/IDS in the cloud replaces the traditional on-prem intrusion detection appliances with cloud-native services:

AWS — Network Firewall with Suricata:

  • Deploy Network Firewall in the centralized inspection VPC
  • Write rules in Suricata syntax (open-source IDS/IPS engine)
  • Start in IDS mode (alert action) for 2-4 weeks to baseline traffic and identify false positives
  • Switch to IPS mode (drop action) for blocking after tuning
  • Use AWS managed threat intelligence rule groups for immediate coverage (malware domains, C2 servers, exploit signatures)
  • Custom rules for bank-specific threats: block DNS-over-HTTPS (enforce internal DNS), detect large outbound transfers (exfiltration), enforce TLS-only egress
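
A minimal sketch of a custom Suricata rule group for the IDS phase described above. The rule group name, capacity, SIDs, the `10.0.0.0/8` range, and the `dns.google` target are illustrative assumptions. Switching from IDS to IPS amounts to changing `alert` to `drop` once the rules are tuned.

```hcl
resource "aws_networkfirewall_rule_group" "egress_ids" {
  name     = "egress-custom-ids" # illustrative name
  type     = "STATEFUL"
  capacity = 100

  rule_group {
    # In a centralized inspection VPC, HOME_NET usually has to be widened
    # to cover the spoke VPC ranges (the CIDR here is a placeholder).
    rule_variables {
      ip_sets {
        key = "HOME_NET"
        ip_set {
          definition = ["10.0.0.0/8"]
        }
      }
    }

    rules_source {
      # IDS mode: both rules only alert.
      rules_string = <<-EOT
        alert tls $HOME_NET any -> $EXTERNAL_NET any (msg:"DoH resolver dns.google"; tls.sni; content:"dns.google"; nocase; endswith; sid:1000001; rev:1;)
        alert tcp $HOME_NET any -> $EXTERNAL_NET 80 (msg:"Cleartext HTTP egress"; flow:to_server,established; sid:1000002; rev:1;)
      EOT
    }
  }
}
```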

GCP — Cloud NGFW Enterprise:

  • Deploy firewall endpoints in each zone where inspection is needed
  • Create security profiles with threat prevention enabled
  • Powered by Palo Alto Networks threat intelligence — auto-updated signatures
  • Supports TLS inspection (decrypt → inspect → re-encrypt) with your CA certificate
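  • Configure severity-based actions: CRITICAL/HIGH = DENY, MEDIUM = ALERT, LOW = ALLOW (the override actions are ALLOW, ALERT, and DENY; use ALERT if low-severity matches should still be logged)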
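
The severity-based actions could be expressed with the Google provider's security profile resource roughly as follows. This is a sketch based on the provider's documented schema as I understand it. The profile name and the `org_id` variable are hypothetical.

```hcl
# Hypothetical input: numeric organization ID.
variable "org_id" { type = string }

resource "google_network_security_security_profile" "threat_prevention" {
  name   = "bank-threat-prevention" # illustrative name
  parent = "organizations/${var.org_id}"
  type   = "THREAT_PREVENTION"

  threat_prevention_profile {
    # Block the most severe signatures outright.
    severity_overrides {
      severity = "CRITICAL"
      action   = "DENY"
    }
    severity_overrides {
      severity = "HIGH"
      action   = "DENY"
    }
    # Log medium-severity matches without blocking.
    severity_overrides {
      severity = "MEDIUM"
      action   = "ALERT"
    }
  }
}
```

The profile only takes effect once it is referenced from a security profile group and a firewall policy rule that sends traffic to the firewall endpoints.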

Alerting and SIEM integration:

  • AWS: Network Firewall alert logs → CloudWatch Logs → CloudWatch Alarm → SNS → PagerDuty. Long-term: S3 → Splunk/Sentinel
  • GCP: Firewall logs → Cloud Logging → Log-based metrics → alerting policy → PagerDuty. Export to BigQuery or SIEM via Pub/Sub
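
The AWS half of that pipeline might be wired up like this. It is a sketch that assumes alert logs already land in a CloudWatch log group. The variables, metric name, namespace, and the threshold of 10 alerts per 5 minutes are all placeholders to tune.

```hcl
# Hypothetical inputs, not taken from the original configuration.
variable "firewall_alert_log_group" { type = string }
variable "oncall_sns_topic_arn" { type = string }

# Count every Network Firewall alert event written to the log group.
resource "aws_cloudwatch_log_metric_filter" "ips_alerts" {
  name           = "network-firewall-alerts"
  log_group_name = var.firewall_alert_log_group
  pattern        = "{ $.event.event_type = \"alert\" }"

  metric_transformation {
    name      = "NetworkFirewallAlertCount" # custom metric, illustrative
    namespace = "Custom/NetworkFirewall"
    value     = "1"
  }
}

# Page when the alert rate exceeds the (placeholder) threshold.
resource "aws_cloudwatch_metric_alarm" "ips_alerts" {
  alarm_name          = "network-firewall-alert-spike"
  namespace           = "Custom/NetworkFirewall"
  metric_name         = "NetworkFirewallAlertCount"
  statistic           = "Sum"
  period              = 300
  evaluation_periods  = 1
  comparison_operator = "GreaterThanThreshold"
  threshold           = 10
  treat_missing_data  = "notBreaching"
  alarm_actions       = [var.oncall_sns_topic_arn]
}
```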

Key difference from on-prem: Cloud IPS does not require you to manage hardware, update Suricata/Snort versions, or maintain rule feeds. AWS and GCP managed rule groups are continuously updated. You focus on custom rules specific to your environment and tuning false positives.


Scenario 3: “How do you protect a public-facing API from DDoS and bot traffic?”

Answer:

I would implement a multi-layer defense using edge services:

DNS resolution layers

For the banking API specifically:

  • Enable Shield Advanced ($3K/month): provides access to the Shield Response Team (SRT), cost protection during attacks, and proactive engagement
  • WAF bot control in Targeted mode — uses browser fingerprinting and behavioral analysis
  • Rate-based rules scoped to sensitive endpoints: login (100 req/5min), password reset (10 req/5min), account creation (5 req/5min)
  • Enable WAF logging and pipe to SIEM for SOC visibility
  • API key + mutual TLS for B2B API consumers — bots cannot obtain valid certificates
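
As an illustration of the rate-based rule for the login endpoint, here is a minimal `aws_wafv2_web_acl` sketch. The ACL name, the `/login` path prefix, and the metric names are assumptions. The limit of 100 applies over WAF's default 5-minute evaluation window.

```hcl
resource "aws_wafv2_web_acl" "api" {
  name  = "banking-api" # illustrative name
  scope = "REGIONAL"    # use "CLOUDFRONT" when attaching to a distribution

  default_action {
    allow {}
  }

  rule {
    name     = "rate-limit-login"
    priority = 1

    action {
      block {}
    }

    statement {
      # Block an IP once it exceeds 100 login requests in the window.
      rate_based_statement {
        limit              = 100
        aggregate_key_type = "IP"

        # Only count requests whose path starts with /login (assumed path).
        scope_down_statement {
          byte_match_statement {
            search_string         = "/login"
            positional_constraint = "STARTS_WITH"

            field_to_match {
              uri_path {}
            }

            text_transformation {
              priority = 0
              type     = "NONE"
            }
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = true
      metric_name                = "rate-limit-login"
      sampled_requests_enabled   = true
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "banking-api"
    sampled_requests_enabled   = true
  }
}
```

AWS WAF enforces a minimum value for `limit`, so very low thresholds such as a handful of requests per window may need to be enforced in the application or API gateway instead.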

Scenario 4: “Explain the difference between NACLs, Security Groups, and Network Firewall — when to use each”

Answer:

These operate at different layers of the network stack:

| Aspect | NACL | Security Group | Network Firewall |
| --- | --- | --- | --- |
| Layer | Subnet boundary | ENI (instance) | VPC (inspection point) |
| Statefulness | Stateless (must allow both directions) | Stateful (responses auto-allowed) | Both (stateless fast-path + stateful deep inspect) |
| Rules | Allow + Deny, numbered order | Allow only, all rules evaluated | Allow + Deny + Alert, Suricata syntax |
| Protocol awareness | IP/port only | IP/port only | Deep packet inspection (payload, TLS SNI, HTTP headers) |
| Scope | Applies to all traffic entering or leaving the subnet | Per-instance, can reference other SGs | All traffic routed through firewall |
| Use case | Emergency blocking, broad isolation | Primary app-level access control | IPS/IDS, domain filtering, compliance |

Layered approach at our bank:

  1. Network Firewall (inspection VPC): catches threats, enforces domain allowlists, IPS/IDS — this is the perimeter
  2. NACLs (workload subnets): enforce tier isolation, so data subnets only accept traffic from private subnets. Used as an additional barrier in case of SG misconfiguration
  3. Security Groups (per resource): primary access control — sg-app allows 8080 from sg-alb, sg-rds allows 5432 from sg-app

A common mistake I see is relying solely on security groups. If a developer accidentally opens a security group to 0.0.0.0/0, the NACL and Network Firewall still protect the workload. Defense in depth means no single control failure exposes the workload.
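
To make layers 2 and 3 concrete, here is a short Terraform sketch of the `sg-rds allows 5432 from sg-app` rule next to the matching stateless NACL pair. All variables are hypothetical inputs, and the rule numbers are arbitrary.

```hcl
# Hypothetical inputs, not taken from the original configuration.
variable "sg_app_id" { type = string }
variable "sg_rds_id" { type = string }
variable "data_nacl_id" { type = string }
variable "private_subnet_cidr" { type = string }

# Security group (stateful): reference sg-app instead of a CIDR.
# Return traffic is allowed automatically.
resource "aws_vpc_security_group_ingress_rule" "rds_from_app" {
  security_group_id            = var.sg_rds_id
  referenced_security_group_id = var.sg_app_id
  ip_protocol                  = "tcp"
  from_port                    = 5432
  to_port                      = 5432
}

# NACL (stateless): inbound PostgreSQL from the private subnets...
resource "aws_network_acl_rule" "data_in_postgres" {
  network_acl_id = var.data_nacl_id
  rule_number    = 100
  egress         = false
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.private_subnet_cidr
  from_port      = 5432
  to_port        = 5432
}

# ...and an explicit outbound rule for the responses on ephemeral ports.
resource "aws_network_acl_rule" "data_out_ephemeral" {
  network_acl_id = var.data_nacl_id
  rule_number    = 100
  egress         = true
  protocol       = "tcp"
  rule_action    = "allow"
  cidr_block     = var.private_subnet_cidr
  from_port      = 1024
  to_port        = 65535
}
```

The extra egress rule is the practical cost of statelessness: forget it and the database accepts connections but its replies never leave the subnet.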


Scenario 5: “Design GCP firewall architecture for an org with 100+ projects using Shared VPC”

Answer:

GCP’s hierarchical firewall model is purpose-built for this:

Security rule evaluation order

Key design decisions for 100+ projects:

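  1. Organization-level policy takes precedence: hierarchical firewall policies are evaluated top-down (organization, then folder, then VPC), so project owners cannot override org-level DENY rules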
  2. GOTO_NEXT — delegates to the next level, allowing folder and VPC policies to add rules without duplicating org rules
  3. Service account targets (not tags) — prevents privilege escalation where a user adds a permissive tag to their VM
  4. Cloud NGFW Enterprise on the Shared VPC — IPS inspection without needing a separate inspection VPC (unlike AWS)
  5. Logging — firewall rule logging enabled on all rules. Logs exported to Cloud Logging → SIEM
  6. Terraform modules — firewall policies defined as Terraform modules in the infra repo. Changes go through PR review. No one manually edits firewall rules in the console
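
A minimal sketch of decisions 1 and 2 as an organization-level firewall policy module. The policy name, priorities, the RDP example, and the `10.0.0.0/8` range are illustrative assumptions, and `org_id` is a hypothetical input.

```hcl
# Hypothetical input: numeric organization ID.
variable "org_id" { type = string }

resource "google_compute_firewall_policy" "org" {
  parent      = "organizations/${var.org_id}"
  short_name  = "org-baseline" # illustrative name
  description = "Organization-wide baseline rules"
}

# Org-level deny: lower levels cannot override this.
resource "google_compute_firewall_policy_rule" "deny_rdp_ingress" {
  firewall_policy = google_compute_firewall_policy.org.name
  priority        = 1000
  direction       = "INGRESS"
  action          = "deny"
  enable_logging  = true

  match {
    src_ip_ranges = ["0.0.0.0/0"]
    layer4_configs {
      ip_protocol = "tcp"
      ports       = ["3389"]
    }
  }
}

# Delegate internal traffic decisions to folder and VPC policies.
resource "google_compute_firewall_policy_rule" "delegate_internal" {
  firewall_policy = google_compute_firewall_policy.org.name
  priority        = 2000
  direction       = "INGRESS"
  action          = "goto_next"

  match {
    src_ip_ranges = ["10.0.0.0/8"]
    layer4_configs {
      ip_protocol = "all"
    }
  }
}

# Attach the policy to the organization node.
resource "google_compute_firewall_policy_association" "org" {
  name              = "org-baseline-association"
  firewall_policy   = google_compute_firewall_policy.org.id
  attachment_target = "organizations/${var.org_id}"
}
```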

Scenario 6: “A workload account team says they need direct internet access. How do you handle this?”

Answer:

Short answer: deny the request and provide an alternative.

The whole point of centralized network security is that no workload VPC has direct internet access. Allowing it undermines the architecture and creates a security gap that cannot be monitored or controlled.

Step 1 — Understand the requirement: Ask WHY they need internet access. Common reasons:

  • “We need to call a third-party API” → route through centralized NAT + inspection
  • “We need to pull container images” → use VPC endpoint for ECR / Private Google Access for Artifact Registry
  • “We need to install packages” → use internal package mirror in Shared Services or S3 endpoint
  • “We need to receive webhooks” → use centralized ingress via ALB in Network Hub DMZ
  • “We need to debug connectivity issues” → provide CloudShell or SSM Session Manager (no SSH, no public IP)
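
For the container image and package cases, the private alternative on AWS could be sketched as VPC endpoints in the workload VPC. All variables are hypothetical inputs. The S3 gateway endpoint is included because ECR serves image layers from S3.

```hcl
# Hypothetical inputs, not taken from the original configuration.
variable "region" { type = string }
variable "vpc_id" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "private_route_table_ids" { type = list(string) }
variable "endpoint_sg_id" { type = string } # must allow 443 from workloads

# Interface endpoints for the ECR API and the Docker registry.
resource "aws_vpc_endpoint" "ecr" {
  for_each = toset(["ecr.api", "ecr.dkr"])

  vpc_id              = var.vpc_id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = var.private_subnet_ids
  security_group_ids  = [var.endpoint_sg_id]
  private_dns_enabled = true
}

# Gateway endpoint for S3 (image layers, package mirrors in buckets).
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
```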

Step 2 — Provide the approved solution:

  • Egress: all outbound traffic goes through TGW → inspection VPC → Network Firewall (IPS) → NAT GW. If they need to reach a specific external API, add the domain to the firewall allowlist. This is a change request, not a direct internet grant.
  • Ingress: internet-facing load balancers live in the Network Hub DMZ VPC. Workload teams provide their target group; the central team configures the ALB listener rule routing traffic to the workload VPC via TGW.
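
The egress change request can then be a one-line pull request against a domain allowlist. Below is a minimal sketch using a Network Firewall domain-list rule group. The rule group name, capacity, and the example domain are placeholders.

```hcl
# Hypothetical list: each approved change request appends one domain.
variable "approved_egress_domains" {
  type    = list(string)
  default = ["api.partner.example.com"]
}

resource "aws_networkfirewall_rule_group" "egress_allowlist" {
  name     = "egress-domain-allowlist" # illustrative name
  type     = "STATEFUL"
  capacity = 100

  rule_group {
    rules_source {
      # Allow only the listed domains, matched on TLS SNI and HTTP Host.
      rules_source_list {
        generated_rules_type = "ALLOWLIST"
        target_types         = ["TLS_SNI", "HTTP_HOST"]
        targets              = var.approved_egress_domains
      }
    }
  }
}
```

Keeping the list in a variable means the review diff shows exactly which destination was opened and for whom.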

Step 3 — Enforce via policy:

  • SCP (AWS): deny ec2:CreateInternetGateway, ec2:AttachInternetGateway, ec2:CreateNatGateway in all workload accounts
  • Organization Policy (GCP): constraints/compute.vmExternalIpAccess to deny external IPs on VMs. Shared VPC firewall policies at org level deny direct internet egress.
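
The AWS side of this enforcement might look like the following SCP, attached to the OU that holds the workload accounts. This is a sketch: the policy name and the `workloads_ou_id` variable are hypothetical, and the action list mirrors the bullet above.

```hcl
# Hypothetical input: ID of the OU containing all workload accounts.
variable "workloads_ou_id" { type = string }

resource "aws_organizations_policy" "deny_direct_internet" {
  name = "deny-direct-internet" # illustrative name
  type = "SERVICE_CONTROL_POLICY"

  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyDirectInternet"
      Effect = "Deny"
      Action = [
        "ec2:CreateInternetGateway",
        "ec2:AttachInternetGateway",
        "ec2:CreateNatGateway",
      ]
      Resource = "*"
    }]
  })
}

# Apply the policy to every account under the workloads OU.
resource "aws_organizations_policy_attachment" "workloads" {
  policy_id = aws_organizations_policy.deny_direct_internet.id
  target_id = var.workloads_ou_id
}
```

Attaching at the OU level rather than per account means new workload accounts inherit the restriction as soon as they are placed in the OU.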

Step 4 — Document and communicate: Publish an internal wiki page: “How to get internet access for your workload” — explains the centralized egress architecture, how to request domain allowlisting, and SLA for request processing (e.g., domain allowlist changes processed within 4 business hours).