🔐 Identity & Access
Flashcard 1
Q: What enables Databricks to access GCP services?
A: Service account attached to clusters
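In practice, the service account is attached through the cluster definition. The sketch below builds a minimal Clusters API payload and assumes the `gcp_attributes.google_service_account` field. The email address, runtime version string, and machine type are hypothetical placeholders.

```python
import json

# Hypothetical service account email; replace with your own.
SERVICE_ACCOUNT = "databricks-cluster@my-project.iam.gserviceaccount.com"


def cluster_spec_with_service_account(name: str, service_account: str) -> dict:
    """Build a minimal Clusters API payload that attaches a Google service account."""
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",  # hypothetical runtime version string
        "node_type_id": "n2-standard-4",       # hypothetical GCE machine type
        "num_workers": 2,
        # The cluster authenticates to GCP services (e.g. GCS) as this account.
        "gcp_attributes": {"google_service_account": service_account},
    }


spec = cluster_spec_with_service_account("etl-cluster", SERVICE_ACCOUNT)
print(json.dumps(spec, indent=2))
```

The service account still needs IAM roles on whatever GCP resources the cluster accesses. Attaching it only determines which identity the cluster uses.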
Flashcard 2
Q: What is required for connecting to Google-managed services?
A: Enable the service's API in the GCP project + attach a service account with the required IAM roles
Flashcard 3
Q: What is identity federation used for?
A: Managing users and groups once at the account level (typically synced from an IdP such as Microsoft Entra ID or Okta) and assigning them to workspaces
Flashcard 4
Q: Who grants privileges on data objects?
A: The data object's owner (metastore admins can also grant)
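Owners grant privileges with an ordinary SQL `GRANT` statement. The sketch below is a small Python helper that assembles such a statement and backtick-quotes each identifier. The table and group names are hypothetical, and the helper assumes that no part of the dotted name contains a dot itself.

```python
def quote_ident(name: str) -> str:
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, securable_type: str, full_name: str, principal: str) -> str:
    """Build a GRANT statement for a dotted object name such as catalog.schema.table."""
    qualified = ".".join(quote_ident(part) for part in full_name.split("."))
    return f"GRANT {privilege} ON {securable_type} {qualified} TO {quote_ident(principal)}"


# Hypothetical table and group names.
stmt = grant_statement("SELECT", "TABLE", "main.sales.orders", "data-analysts")
print(stmt)  # GRANT SELECT ON TABLE `main`.`sales`.`orders` TO `data-analysts`
```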
🗂️ Storage & Data Access
Flashcard 5
Q: Where are storage credentials created?
A: In the workspace, in Catalog Explorer (formerly Data Explorer)
Flashcard 6
Q: What is required to create a metastore in GCP?
A: Cloud storage bucket
Flashcard 7
Q: Who gets permission when granting access to external storage?
A: The Google service account that Databricks generates for the storage credential
Flashcard 8
Q: What is an external location?
A: A Unity Catalog object that pairs a storage path (e.g., a prefix within a bucket) with a storage credential to control access to that path
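The key idea is that an external location governs one path and everything beneath it. The sketch below is a simplified model of that scoping, not how Databricks implements it. It compares normalized path prefixes and ignores credentials, privileges, and overlapping locations. The bucket and prefix are hypothetical.

```python
def covers(location_url: str, path: str) -> bool:
    """Return True if `path` equals the external location URL or lies beneath it."""
    base = location_url.rstrip("/")
    target = path.rstrip("/")
    # Match on a "/" boundary so "landing-archive" is not treated as inside "landing".
    return target == base or target.startswith(base + "/")


# Hypothetical bucket and prefix.
landing = "gs://acme-raw/landing"
print(covers(landing, "gs://acme-raw/landing/2024/orders.parquet"))  # True
print(covers(landing, "gs://acme-raw/landing-archive/orders.parquet"))  # False
```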
🌐 Networking & Architecture
Flashcard 9
Q: What defines regionality in GCP Databricks?
A: Subnet (GCP VPCs are global; subnets are regional)
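Because the region is part of a subnet's resource name, you can read it directly from that name. The sketch below parses either the path form `projects/.../regions/.../subnetworks/...` or the Compute Engine self-link URL. The project and subnet names are hypothetical.

```python
import re

# Subnetwork resource name, optionally prefixed with the Compute Engine v1 API URL
# (the "self link" form).
_SUBNET_RE = re.compile(
    r"(?:https://www\.googleapis\.com/compute/v1/)?"
    r"projects/(?P<project>[^/]+)/regions/(?P<region>[^/]+)/subnetworks/(?P<subnet>[^/]+)"
)


def subnet_region(subnet_ref: str) -> str:
    """Return the region encoded in a GCP subnetwork reference."""
    match = _SUBNET_RE.fullmatch(subnet_ref)
    if match is None:
        raise ValueError(f"not a subnetwork resource name: {subnet_ref!r}")
    return match.group("region")


# Hypothetical project and subnet names.
print(subnet_region("projects/my-proj/regions/us-central1/subnetworks/dbx-subnet"))  # us-central1
```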
Flashcard 10
Q: What is a standalone VPC?
A: A VPC in the same GCP project as the workspace resources (unlike a Shared VPC, whose host project is separate)
Flashcard 11
Q: What registers a VPC in Databricks?
A: Network configuration
Flashcard 12
Q: What are prerequisites for workspace creation in a custom VPC?
A: An existing VPC with a subnet in the workspace region + a principal with the required IAM permissions
🔐 Security & Encryption
Flashcard 13
Q: What does encryption key configuration do?
A: Registers a Cloud KMS key with Databricks for use as a customer-managed key
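Before you register a key, you might run a quick pre-check. The check confirms that the string is a Cloud KMS crypto key resource name rather than a key version, and it extracts the key's location so you can compare it with the workspace region. This sketch assumes the key is supplied by its full resource name. The project, key ring, and key names are hypothetical.

```python
import re
from typing import NamedTuple


class KmsKeyName(NamedTuple):
    project: str
    location: str
    key_ring: str
    crypto_key: str


# Cloud KMS crypto key resource name (a key, not a specific key version).
_KEY_RE = re.compile(
    r"projects/([^/]+)/locations/([^/]+)/keyRings/([^/]+)/cryptoKeys/([^/]+)"
)


def parse_kms_key(name: str) -> KmsKeyName:
    """Split a crypto key resource name into its parts, rejecting anything else."""
    match = _KEY_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a Cloud KMS crypto key name: {name!r}")
    return KmsKeyName(*match.groups())


# Hypothetical project, key ring, and key names.
key = parse_kms_key("projects/my-proj/locations/us-central1/keyRings/dbx-ring/cryptoKeys/dbx-key")
print(key.location)  # us-central1
```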
Flashcard 14
Q: Can encryption keys be rotated?
A: Yes
Flashcard 15
Q: What resources can be encrypted with a customer-managed key?
A: Root bucket, system bucket, cluster disks
🔄 Data Federation & Integration
Flashcard 16
Q: What is query federation?
A: Query external systems without moving data
Flashcard 17
Q: Is query federation read/write or read-only?
A: Read-only
Flashcard 18
Q: What is required for BigQuery federation?
A: Connection to BigQuery + foreign catalog
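BigQuery federation comes down to two SQL statements: one creates the connection and one creates a foreign catalog from it. The Python helper below assembles both. The `CREATE CONNECTION ... TYPE bigquery` and `CREATE FOREIGN CATALOG ... USING CONNECTION` forms reflect Databricks SQL as I understand it. The option key in the example (`GoogleServiceAccountKeyJson`) is an assumption you should verify against the current docs. All names are hypothetical.

```python
def bigquery_federation_sql(connection: str, catalog: str, options: dict) -> list:
    """Return the two statements that set up BigQuery federation.

    Option values are inlined as single-quoted literals, so values containing
    a single quote are rejected rather than escaped.
    """
    for key, value in options.items():
        if "'" in value:
            raise ValueError(f"option {key!r} contains a single quote")
    opts = ",\n  ".join(f"{key} '{value}'" for key, value in options.items())
    return [
        f"CREATE CONNECTION {connection} TYPE bigquery\nOPTIONS (\n  {opts}\n)",
        f"CREATE FOREIGN CATALOG {catalog} USING CONNECTION {connection}",
    ]


# Hypothetical names; the option key is an assumption, and a real key should
# come from a secret store rather than source code.
stmts = bigquery_federation_sql(
    "bq_conn", "bq_catalog", {"GoogleServiceAccountKeyJson": "<service-account-key-json>"}
)
for s in stmts:
    print(s + ";\n")
```

After the foreign catalog exists, BigQuery tables can be queried through the usual `catalog.schema.table` names, read-only.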
⚙️ Compute & Cost Optimization
Flashcard 19
Q: How can you reduce compute cost?
A: Autoscaling + right-sized instance types + tagging (to attribute costs)
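All three cost levers can be set in a single cluster definition. The sketch below builds a Clusters API payload with an `autoscale` range, a chosen `node_type_id`, and `custom_tags` for cost attribution. The runtime version, machine type, and tag values are hypothetical, and the validation is only a basic sanity check.

```python
def cost_aware_cluster_spec(name: str, node_type: str, min_workers: int,
                            max_workers: int, tags: dict) -> dict:
    """Build a Clusters API payload that autoscales and carries cost-attribution tags."""
    # Basic sanity checks only; the Clusters API enforces its own limits.
    if min_workers < 0 or min_workers > max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",  # hypothetical runtime version
        "node_type_id": node_type,             # choose a type sized to the workload
        # Scale between these bounds instead of paying for a fixed peak size.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "custom_tags": tags,                   # e.g. team / cost center to attribute usage
    }


# Hypothetical machine type and tags.
spec = cost_aware_cluster_spec(
    "nightly-etl", "n2-highmem-4", 2, 8, {"team": "data-eng", "cost_center": "cc-1234"}
)
```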
Flashcard 20
Q: What helps optimize query performance?
A: Photon engine + caching + clustering (Z-order, liquid clustering)
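Z-ordering and liquid clustering are applied with different statements, and a given table uses one or the other because liquid clustering is not compatible with `ZORDER`. The sketch below picks the matching Delta SQL statement. The table and column names are hypothetical.

```python
def layout_sql(table: str, columns: list, liquid: bool) -> str:
    """Return a data-layout statement for a Delta table."""
    cols = ", ".join(columns)
    if liquid:
        # Liquid clustering: declare clustering keys on the table; later OPTIMIZE
        # runs cluster data incrementally by these keys.
        return f"ALTER TABLE {table} CLUSTER BY ({cols})"
    # Z-ordering: rewrite files so rows with similar values in these columns
    # are co-located, which improves data skipping.
    return f"OPTIMIZE {table} ZORDER BY ({cols})"


# Hypothetical table and columns.
print(layout_sql("main.sales.orders", ["customer_id", "order_date"], liquid=False))
print(layout_sql("main.sales.orders", ["customer_id"], liquid=True))
```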
🔗 Unity Catalog & Governance
Flashcard 21
Q: What does Unity Catalog provide?
A: Centralized governance, permissions, lineage
Flashcard 22
Q: What is the Unity Catalog hierarchy?
A: Catalog → Schema → Table
Flashcard 23
Q: What sits at the top of the data hierarchy?
A: Metastore
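Together, the last two cards describe a nested namespace: a metastore contains catalogs, each catalog contains schemas, and each schema contains tables. The sketch below models that nesting as plain dictionaries and resolves a three-level `catalog.schema.table` name. It is a toy model with hypothetical object names, not a Unity Catalog client.

```python
# Toy metastore: catalog -> schema -> table -> table kind (hypothetical names).
metastore = {
    "main": {
        "sales": {"orders": "managed table", "customers": "managed table"},
    },
    "bq_catalog": {
        "analytics": {"events": "foreign table"},
    },
}


def resolve(ms: dict, full_name: str) -> str:
    """Look up a catalog.schema.table name in the toy metastore."""
    parts = full_name.split(".")
    if len(parts) != 3:
        raise ValueError("expected catalog.schema.table")
    catalog, schema, table = parts
    try:
        return ms[catalog][schema][table]
    except KeyError:
        raise LookupError(f"{full_name} not found in metastore") from None


print(resolve(metastore, "main.sales.orders"))  # managed table
```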
🚀 Data Engineering & Pipelines
Flashcard 24
Q: What tool handles ingestion in Databricks?
A: Lakeflow Connect
Flashcard 25
Q: What is Spark Declarative Pipelines (SDP)?
A: Declarative ETL framework for batch and streaming
Flashcard 26
Q: What handles orchestration in Databricks?
A: Lakeflow Jobs
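A job is a graph of tasks, and each task lists the tasks it depends on. The sketch below uses a job definition shaped like the Jobs API task list (`task_key`, `depends_on`) and derives a valid execution order with Python's `graphlib`. The job and task names are hypothetical. This sketch only computes one valid ordering, whereas the real orchestrator can run independent tasks in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical job: task keys and dependencies are illustrative.
job = {
    "name": "daily-sales",
    "tasks": [
        {"task_key": "ingest"},
        {"task_key": "transform", "depends_on": [{"task_key": "ingest"}]},
        {"task_key": "publish", "depends_on": [{"task_key": "transform"}]},
    ],
}


def run_order(job_spec: dict) -> list:
    """Return task keys in an order that respects every depends_on edge."""
    # Map each task to the set of tasks that must finish before it starts.
    graph = {
        task["task_key"]: {dep["task_key"] for dep in task.get("depends_on", [])}
        for task in job_spec["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


print(run_order(job))  # ['ingest', 'transform', 'publish']
```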