Core Cloud Differences
AWS
• Identity: IAM Role / Instance Profile
• Storage: S3
• Storage access model: IAM Role
• Region definition: VPC
• Networking: VPC + Subnets
• Query federation: Redshift (default port 5439)
• Workspace setup: Bucket + IAM Role
Azure
• Identity: Service Principal / Managed Identity
• Storage: ADLS Gen2
• Storage access model: Access Connector
• Region definition: Workspace configuration
• Networking: VNet + Subnets
• Query federation: Synapse / SQL DB
• Workspace setup: Resource Group + Region
GCP
• Identity: Service Account
• Storage: GCS Bucket
• Storage access model: Service Account
• Region definition: Subnet
• Networking: VPC + Subnets
• Query federation: BigQuery
• Workspace setup: VPC + Principal
Identity and Access
• AWS uses IAM Role with assume-role and cross-account access
• Azure uses Service Principal or Managed Identity
• GCP uses Service Account with direct binding
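The AWS assume-role model rests on a trust policy attached to the role, naming who may assume it. The following is a minimal sketch of such a policy for cross-account access. The account ID and external ID are hypothetical placeholders, and the helper name `build_trust_policy` is an illustration only, not part of any SDK.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy letting another AWS account assume this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is permitted to call sts:AssumeRole on this role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # An external ID is a common guard against the confused-deputy problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical placeholder values, not real identifiers.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

Azure and GCP have no equivalent trust document in this sense: the identity (Managed Identity or Service Account) is granted access directly, which is why the notes call them identity and direct-binding models.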
Architect Tip
• AWS = role assumption model
• Azure = Microsoft Entra ID (formerly Azure AD) identity model
• GCP = simple service account model
Exam Traps
• AWS uses IAM roles, not service accounts
• GCP uses service accounts, not IAM roles
• Azure uses Azure AD identity, not AWS-style IAM
Networking and Region
• AWS region is tied to the VPC
• Azure region is defined during workspace setup
• GCP region is determined by subnet
Memory Trick
• If the question says “subnet determines region”, the answer is GCP
Exam Mapping
• AWS = VPC
• Azure = Workspace configuration
• GCP = Subnet
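One way to see why "subnet determines region" points to GCP: a GCP subnet's resource path carries the region, while the VPC itself is global. The sketch below extracts the region from such a path. The project and subnet names are hypothetical, and `region_from_subnet` is an illustrative helper, not a Google Cloud API.

```python
import re

# GCP subnet resource paths have the form:
#   projects/<project>/regions/<region>/subnetworks/<name>
_SUBNET_PATH = re.compile(
    r"projects/(?P<project>[^/]+)/regions/(?P<region>[^/]+)/subnetworks/(?P<name>[^/]+)$"
)


def region_from_subnet(subnet_path: str) -> str:
    """Extract the region embedded in a GCP subnet resource path or self-link."""
    match = _SUBNET_PATH.search(subnet_path)
    if match is None:
        raise ValueError(f"Not a GCP subnet path: {subnet_path!r}")
    return match.group("region")


# Hypothetical project and subnet names.
example = "projects/example-project/regions/us-central1/subnetworks/databricks-subnet"
print(region_from_subnet(example))  # us-central1
```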
Storage Access Pattern
• AWS uses IAM Role to access S3
• Azure uses Access Connector to access ADLS
• GCP uses Service Account to access GCS
Memory Hack
• AWS = Role
• Azure = Connector
• GCP = Service Account
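The same mapping can be read off a storage URI: the scheme identifies the cloud, and the cloud identifies the access mechanism. This is a small sketch under that assumption. The bucket, container, and account names are hypothetical, and `access_pattern` is an illustrative helper.

```python
from typing import Tuple
from urllib.parse import urlparse

# URI scheme -> (cloud, storage access mechanism from the list above)
ACCESS_BY_SCHEME = {
    "s3": ("AWS", "IAM Role"),
    "abfss": ("Azure", "Access Connector"),
    "gs": ("GCP", "Service Account"),
}


def access_pattern(uri: str) -> Tuple[str, str]:
    """Return (cloud, access mechanism) for a cloud storage URI."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in ACCESS_BY_SCHEME:
        raise ValueError(f"Unrecognized storage scheme: {scheme!r}")
    return ACCESS_BY_SCHEME[scheme]


# Hypothetical storage paths, one per cloud.
for uri in (
    "s3://example-bucket/data",
    "abfss://container@exampleaccount.dfs.core.windows.net/data",
    "gs://example-bucket/data",
):
    cloud, mechanism = access_pattern(uri)
    print(f"{uri} -> {cloud}: {mechanism}")
```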
Permission Mapping
• AWS grants access using IAM policy
• Azure grants access using RBAC role
• GCP grants access using IAM binding
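To make the three grant styles concrete, the sketch below shows roughly what each one looks like as data. The AWS and GCP entries follow the general shape of an IAM policy statement and an IAM binding. The Azure entry is a simplified illustration of a role assignment rather than the actual ARM schema. All resource and principal names are hypothetical.

```python
GRANTS = {
    "AWS": {
        "mechanism": "IAM policy",
        # A policy statement attached to the role: actions allowed on resources.
        "example": {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        },
    },
    "Azure": {
        "mechanism": "RBAC role",
        # A role assignment: a built-in role given to an identity at a scope.
        "example": {
            "role": "Storage Blob Data Contributor",
            "assignee": "example-access-connector",
            "scope": "example storage account",
        },
    },
    "GCP": {
        "mechanism": "IAM binding",
        # A binding: a role tied to a list of members on a resource.
        "example": {
            "role": "roles/storage.objectAdmin",
            "members": [
                "serviceAccount:example-sa@example-project.iam.gserviceaccount.com"
            ],
        },
    },
}


def grant_mechanism(cloud: str) -> str:
    """Return the name of the permission mechanism for a cloud."""
    return GRANTS[cloud]["mechanism"]


for cloud in GRANTS:
    print(f"{cloud}: {grant_mechanism(cloud)}")
```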
Query Federation
• Query federation is read-only
• Data stays in the source system
• No ETL is required
• Federation means querying without moving data
Cloud Mapping
• AWS = Redshift
• Azure = Synapse / SQL DB
• GCP = BigQuery
Exam Traps
• Federation is not ETL
• Federation is not write-enabled
• Federation does not copy data into Databricks
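A sketch of how federation is typically wired up: a connection holds the source system's address and credentials, and a foreign catalog exposes the remote database for read-only queries. The helpers below only build SQL text. The statement shapes are assumed to follow Databricks SQL `CREATE CONNECTION` and `CREATE FOREIGN CATALOG`, and every name, host, and secret reference is hypothetical, so check the current documentation before relying on them.

```python
def create_connection_sql(name: str, host: str, port: int, user: str,
                          secret_scope: str, secret_key: str) -> str:
    """Build a CREATE CONNECTION statement for a Redshift source (assumed syntax)."""
    return (
        f"CREATE CONNECTION {name} TYPE redshift\n"
        f"OPTIONS (\n"
        f"  host '{host}',\n"
        f"  port '{port}',\n"
        f"  user '{user}',\n"
        # The password is referenced from a secret scope, never written inline.
        f"  password secret('{secret_scope}', '{secret_key}')\n"
        f")"
    )


def create_foreign_catalog_sql(catalog: str, connection: str, database: str) -> str:
    """Build a CREATE FOREIGN CATALOG statement that mirrors a remote database."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS {catalog}\n"
        f"USING CONNECTION {connection}\n"
        f"OPTIONS (database '{database}')"
    )


# Hypothetical names and host; 5439 is the Redshift default port.
print(create_connection_sql(
    "redshift_conn", "example-cluster.example.com", 5439,
    "reader", "example-scope", "redshift-password",
))
print(create_foreign_catalog_sql("redshift_cat", "redshift_conn", "analytics"))
```

Nothing here moves data: queries against the foreign catalog are sent to the source system, which matches the read-only, no-ETL points above.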
Unity Catalog
• Unity Catalog is cloud-agnostic
• The object hierarchy is Metastore → Catalog → Schema → Table
• The metastore is the top-level container; tables are addressed as catalog.schema.table
Unity Catalog Functions
• Governance
• Permissions
• Lineage
• Cross-workspace sharing
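The hierarchy and the permission model meet in the fully qualified table name: a grant targets catalog.schema.table. The sketch below builds that name and a `GRANT SELECT` statement as text. The catalog, schema, table, and group names are hypothetical, and the helpers do not handle quoted identifiers that contain dots.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Join the three namespace levels into catalog.schema.table."""
    for part in (catalog, schema, table):
        if not part or "." in part:
            raise ValueError(f"Invalid name part: {part!r}")
    return f"{catalog}.{schema}.{table}"


def grant_select_sql(full_name: str, principal: str) -> str:
    """Build a GRANT statement giving a principal read access to one table."""
    return f"GRANT SELECT ON TABLE {full_name} TO `{principal}`"


# Hypothetical catalog, schema, table, and group names.
name = qualified_name("main", "sales", "orders")
print(name)                                # main.sales.orders
print(grant_select_sql(name, "analysts"))  # GRANT SELECT ON TABLE main.sales.orders TO `analysts`
```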
Cloud-Specific Difference
• AWS uses IAM roles
• Azure uses Azure AD + Access Connector
• GCP uses Service Account
External Location Setup
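• Create Storage Credential (wraps the cloud identity: IAM role, Access Connector, or Service Account)
• Create External Location (storage path + storage credential)
• Grant privileges on the external location, then create catalogs or external tables that use it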
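As a sketch of what registering an external location involves, the helpers below build the SQL text for an external location that references an existing storage credential, plus a grant on it. The statement shapes are assumed to follow Databricks SQL `CREATE EXTERNAL LOCATION` and `GRANT ... ON EXTERNAL LOCATION`. The location, bucket, credential, and group names are hypothetical.

```python
def external_location_sql(name: str, url: str, credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement (assumed syntax)."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name}\n"
        f"URL '{url}'\n"
        # The storage credential must already exist; it holds the cloud identity.
        f"WITH (STORAGE CREDENTIAL {credential})"
    )


def grant_read_files_sql(location: str, principal: str) -> str:
    """Build a GRANT statement allowing a principal to read files at the location."""
    return f"GRANT READ FILES ON EXTERNAL LOCATION {location} TO `{principal}`"


# Hypothetical names; the URL scheme would be abfss:// on Azure or gs:// on GCP.
print(external_location_sql("raw_zone", "s3://example-bucket/raw", "example_credential"))
print(grant_read_files_sql("raw_zone", "data_engineers"))
```

The statement is the same on all three clouds; only the storage credential behind it differs.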
Cost Optimization
• Use autoscaling
• Select the right instance type
• Apply tagging
• Use serverless where available
• Use job clusters for ephemeral workloads
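Several of these levers show up together in a job cluster definition. The sketch below builds one as a plain dictionary with autoscaling bounds and cost-attribution tags. The field names are modeled on the Databricks cluster specification, while the runtime version, instance type, and tag values are example assumptions to replace with your own.

```python
import json


def job_cluster_spec(min_workers: int, max_workers: int, tags: dict) -> dict:
    """Build a job cluster spec with autoscaling and cost-attribution tags."""
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("Require 1 <= min_workers <= max_workers")
    return {
        "spark_version": "15.4.x-scala2.12",  # example runtime, an assumption
        "node_type_id": "i3.xlarge",          # example AWS instance type, an assumption
        # Autoscaling: the cluster grows and shrinks between these bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Tags flow through to cloud billing so spend can be attributed.
        "custom_tags": dict(tags),
    }


# Hypothetical tag values.
spec = job_cluster_spec(2, 8, {"team": "data-eng", "cost_center": "1234"})
print(json.dumps(spec, indent=2))
```

Because a job cluster exists only for the duration of the run, it stops incurring cost as soon as the job finishes.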
Cloud Cost Lens
• AWS = instance profile and billing visibility
• Azure = budget policies and cost management
• GCP = project-level billing and labels (GCP's term for cost-attribution tags)
Databricks Data Platform Stack
• Ingestion = Lakeflow Connect
• Transformation = Spark Declarative Pipelines
• Orchestration = Lakeflow Jobs
• Governance = Unity Catalog
• Performance = Photon + Caching + Z-Order
Workspace Infrastructure Requirements
AWS
• Requires S3 bucket
• Requires cross-account IAM role
• Uses bucket + IAM role pattern
Azure
• Requires workspace name
• Requires resource group
• Requires region
• VNet is optional: Databricks manages one by default, and you supply your own only for VNet injection
GCP
• Requires VPC
• Requires IAM principal
• Uses VPC + principal pattern
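The requirements above can be treated as a per-cloud checklist. The sketch below reports which required inputs are still missing for a planned workspace. The key names are illustrative labels for the items listed in these notes, not parameters of any Databricks API.

```python
# Required inputs per cloud, taken from the lists above (key names are illustrative).
REQUIRED = {
    "AWS": {"s3_bucket", "cross_account_iam_role"},
    "Azure": {"workspace_name", "resource_group", "region"},
    "GCP": {"vpc", "iam_principal"},
}


def missing_requirements(cloud: str, provided: dict) -> list:
    """Return the required inputs that have not been provided, sorted by name."""
    if cloud not in REQUIRED:
        raise ValueError(f"Unknown cloud: {cloud!r}")
    return sorted(REQUIRED[cloud] - set(provided))


# Hypothetical Azure plan that has not yet chosen a resource group.
plan = {"workspace_name": "example-ws", "region": "eastus"}
print(missing_requirements("Azure", plan))  # ['resource_group']
```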
Architect Decision Model
Step 1: Identify the cloud
• AWS
• Azure
• GCP
Step 2: Map the problem
• Access data → check identity model
• Deploy workspace → check infrastructure requirement
• Networking → check where region is defined
• Governance → apply Unity Catalog rules
• Integration → identify the required connector or service
Step 3: Apply the cloud-specific answer
Example: How to access storage?
• AWS → IAM Role
• Azure → Access Connector
• GCP → Service Account
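The three steps amount to a two-level lookup: map the problem to an aspect, then read the cloud-specific answer. The following is a minimal sketch of that model using the answers from these notes. The problem labels and the `decide` helper are illustrative choices.

```python
# Step 2: which aspect each kind of problem points to.
PROBLEM_TO_ASPECT = {
    "access data": "identity",
    "access storage": "storage access",
    "deploy workspace": "infrastructure",
    "networking": "region",
    "integration": "federation",
}

# Step 3: the cloud-specific answer for each aspect.
CLOUD_ANSWERS = {
    "AWS": {
        "identity": "IAM Role",
        "storage access": "IAM Role",
        "infrastructure": "Bucket + IAM Role",
        "region": "VPC",
        "federation": "Redshift",
    },
    "Azure": {
        "identity": "Service Principal / Managed Identity",
        "storage access": "Access Connector",
        "infrastructure": "Resource Group + Region",
        "region": "Workspace configuration",
        "federation": "Synapse / SQL DB",
    },
    "GCP": {
        "identity": "Service Account",
        "storage access": "Service Account",
        "infrastructure": "VPC + Principal",
        "region": "Subnet",
        "federation": "BigQuery",
    },
}


def decide(cloud: str, problem: str) -> str:
    """Apply the three-step model: identify cloud, map problem, return answer."""
    if cloud not in CLOUD_ANSWERS:
        raise ValueError(f"Unknown cloud: {cloud!r}")
    if problem == "governance":
        return "Unity Catalog"  # governance is cloud-agnostic
    return CLOUD_ANSWERS[cloud][PROBLEM_TO_ASPECT[problem]]


print(decide("Azure", "access storage"))  # Access Connector
print(decide("GCP", "networking"))        # Subnet
print(decide("AWS", "governance"))        # Unity Catalog
```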
Golden Memory Block
• AWS = IAM Role + VPC + S3
• Azure = Service Principal + Region + Access Connector
• GCP = Service Account + Subnet + Bucket
Final Exam Hack
• AWS thinks in roles
• Azure thinks in identities and connectors
• GCP thinks in service accounts and subnets