Get Certified: GCP Databricks Platform Architect — Lakehouse Design, Governance & Real-Time Pipelines on Google Cloud

Hong Kong Databricks FSI group

Jun 13, 4:00 – 7:00 AM (UTC)

7 RSVPs

About this event

This hands-on meetup is tailored for data engineers and architects preparing for the GCP Databricks Platform Architect Accreditation, with a focus on deploying scalable, secure, and high-performance Lakehouse architectures on Google Cloud. The session emphasizes real-world implementation patterns using the Databricks Lakehouse Platform integrated with GCS, BigQuery, and GCP networking services.

Participants will walk through practical scenarios such as provisioning a workspace within a custom VPC and subnet (the subnet is the layer that determines the region), configuring service accounts for secure access to Google-managed services, and enabling the APIs required for integrations. We will demonstrate how to design governed data platforms using Unity Catalog, including setting up external locations, storage credentials, and fine-grained access policies backed by metastore-generated service identities.

The session also explores query federation with BigQuery and compares it with ETL-based ingestion strategies, alongside performance tuning with the Photon engine, Delta caching, and clustering techniques. Attendees will build and orchestrate pipelines using Lakeflow Connect, Spark Declarative Pipelines, and Lakeflow Jobs, covering both batch and streaming workloads.

By the end, participants will gain architect-level insights into identity, networking, cost optimization, and observability, enabling them to design production-grade Databricks solutions aligned with certification requirements and enterprise best practices.



Question 1: Storage credentials creation

Where are storage credentials created?

A Workspace/Data explorer
B Workspace/Admin console
C Account console/Data page
D SQL

Answer: A

Rationale:
A is correct. Storage credentials are created in the Workspace Data Explorer.
B is incorrect. Admin console is for workspace administration, not credential creation.
C is incorrect. Account console does not manage workspace-level storage credentials.
D is incorrect. SQL is not used to create storage credentials.

Question 2: Workspace prerequisites (VPC)

What are two prerequisites for creating a Databricks workspace in a VPC that you manage?

A A VPC and a principal with appropriate permissions
B A bucket and a service account
C A service account and a VPC
D A bucket and a VPC

Answer: A

Rationale:
A is correct. You must have a VPC and a properly permissioned principal to create a workspace.
B is incorrect. A bucket and service account are not the required pair here.
C is incorrect. A service account and a VPC do not satisfy the requirement for a principal with appropriate permissions.
D is incorrect. A bucket is not listed as a prerequisite for this step.

Question 3: Connecting to Google-managed services

Which two tasks are part of the general pattern involved in connecting Databricks to Google-managed services?

A Enable the appropriate API and attach a privileged service account
B Define a custom role and enable account console access
C Create a storage credential and enable APIs
D Attach service account and create a storage credential

Answer: A

Rationale:
A is correct. Enabling APIs and attaching a service account are key required steps.
B is incorrect. Custom roles are optional, and account console access is not required here.
C is incorrect. Storage credentials are not part of the general initial pattern.
D is incorrect. Attaching a service account is part of the pattern, but creating a storage credential is not, and API enablement is missing.

Question 4: Encryption key configuration scope

Which three elements can encryption key configurations be applied to?

A Root storage bucket, system storage bucket, and cluster disk volumes
B External storage bucket, metastore bucket, cluster volumes
C Root bucket, external bucket, cluster volumes
D System bucket, metastore bucket, external bucket

Answer: A

Rationale:
A is correct. These are supported targets for encryption key configurations.
B is incorrect. External and metastore buckets are not valid targets in this context.
C is incorrect. External bucket is not part of the correct combination.
D is incorrect. Metastore bucket is not included in the correct set.

Question 5: External storage permission recipient

Who is the recipient of the permission grant when granting permissions for an external storage bucket?

A The service account generated by the metastore creation
B The service account provisioning the workspace
C The service account generated during workspace creation
D The Databricks account admin identity

Answer: A

Rationale:
A is correct. The metastore-generated service account is granted access.
B is incorrect. Workspace provisioning account is not the recipient.
C is incorrect. Workspace service account is not used here.
D is incorrect. Admin identity is not directly used for storage permissions.
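The grant described in this question can be pictured as an IAM policy binding on the external bucket whose member is the metastore-generated service account. The following is a minimal sketch that only manipulates a policy dictionary in the shape used by Google Cloud IAM policies; it does not call any Google or Databricks API. The service account email is hypothetical, and the two roles shown are an assumption about what a typical setup grants, so check the current Databricks documentation before relying on them.

```python
import copy


def grant_bucket_roles(policy, service_account_email, roles):
    """Return a copy of an IAM-style policy with the service account bound to each role."""
    member = f"serviceAccount:{service_account_email}"
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    bindings = updated.setdefault("bindings", [])
    for role in roles:
        # Reuse an existing binding for the role if there is one.
        binding = next((b for b in bindings if b["role"] == role), None)
        if binding is None:
            binding = {"role": role, "members": []}
            bindings.append(binding)
        if member not in binding["members"]:
            binding["members"].append(member)
    return updated


# Hypothetical values for illustration only.
METASTORE_SA = "metastore-sa@example-project.iam.gserviceaccount.com"
# Assumed roles; verify against the current documentation.
ROLES = ["roles/storage.legacyBucketReader", "roles/storage.objectAdmin"]

policy = grant_bucket_roles({"bindings": []}, METASTORE_SA, ROLES)
```

The point of the sketch is the recipient: the member added to the bucket policy is the service account created with the metastore, not a workspace or admin identity.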

Question 6: Cost optimization strategies

Which three steps reduce cost on the GCP Databricks platform?

A Create cluster tags, choose correct instance type, and enable autoscaling
B Choose VPC architecture and enable autoscaling
C Create cluster tags and select VPC
D Choose instance type only

Answer: A

Rationale:
A is correct. Tagging, right sizing, and autoscaling optimize cost.
B is incorrect. VPC architecture does not directly reduce cost.
C is incorrect. VPC selection is not a cost reduction strategy.
D is incorrect. Instance type alone is insufficient.
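The three levers in the correct answer map onto fields of a cluster definition. The sketch below builds such a definition as a plain dictionary, assuming the field names of the Databricks Clusters API (cluster_name, spark_version, node_type_id, autoscale, custom_tags). The helper name, the runtime version string, the node type, and the tag values are all hypothetical, and nothing is sent to a workspace.

```python
def build_cluster_spec(name, spark_version, node_type_id, min_workers, max_workers, tags):
    """Build a cluster definition that applies tagging, right-sizing, and autoscaling."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        # Right-sizing: choose an instance type that matches the workload.
        "node_type_id": node_type_id,
        # Autoscaling: let the worker count follow demand within bounds.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Tagging: attribute spend to a team or cost center.
        "custom_tags": dict(tags),
    }


# Hypothetical values for illustration only.
spec = build_cluster_spec(
    name="fsi-risk-batch",
    spark_version="example-runtime-version",
    node_type_id="n2-standard-4",
    min_workers=1,
    max_workers=4,
    tags={"team": "risk", "cost-center": "hk-fsi"},
)
```

Tags do not lower the bill by themselves; they make spend attributable, which is what allows the other two levers to be tuned per team or workload.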

Question 7: Workspace bucket requirements

What are the workspace bucket requirements?

A Two buckets, one for system and one for DBFS
B One bucket for system and DBFS
C Three buckets including metastore
D Workspaces can share a bucket

Answer: A

Rationale:
A is correct. Two separate buckets are required.
B is incorrect. A single bucket is insufficient.
C is incorrect. A metastore bucket is not required here.
D is incorrect. Buckets are not shared across workspaces.

Question 8: External location capability

What is true of an external location?

A It provides the ability to control access to a portion of an external storage bucket
B It provides access to the entire storage bucket only
C It requires account admin privileges
D It requires metastore admin privileges

Answer: A

Rationale:
A is correct. External locations allow fine-grained access control.
B is incorrect. It is not limited to whole bucket access.
C is incorrect. Account admin is not strictly required.
D is incorrect. Metastore admin is not always required.
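One way to picture "a portion of an external storage bucket" is as a path prefix: an external location is registered for a URL such as a folder inside a bucket, and only paths under that URL fall within it. The sketch below is a conceptual illustration of that prefix relationship, not Unity Catalog's actual implementation; the function name and the bucket paths are hypothetical.

```python
def is_covered(path, external_location_url):
    """Return True if path is the external location itself or lies beneath it."""
    base = external_location_url.rstrip("/")
    # Compare against "base/" so that a sibling such as "finance-archive"
    # is not mistaken for a child of "finance".
    return path.rstrip("/") == base or path.startswith(base + "/")


# Hypothetical external location covering only the finance folder of a bucket.
LOCATION = "gs://example-bucket/finance"

inside = is_covered("gs://example-bucket/finance/trades/2024.parquet", LOCATION)
sibling = is_covered("gs://example-bucket/finance-archive/old.parquet", LOCATION)
other = is_covered("gs://example-bucket/hr/payroll.parquet", LOCATION)
```

Here only the first path falls inside the location; the other two live in the same bucket but outside the governed prefix.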

Question 9: Accessing Google-managed services

Which mechanism enables Databricks clusters to access Google-managed services?

A Service account
B OIDC token
C VPC peering
D API settings

Answer: A

Rationale:
A is correct. Service accounts provide access to Google services.
B is incorrect. OIDC is used for authentication; it is not the access mechanism here.
C is incorrect. VPC peering handles networking, not service access.
D is incorrect. API settings alone do not grant access.

Question 10: Programmatic authentication

What authentication scheme is required to query a list of workspaces programmatically?

A OIDC
B PAT
C Credentials passthrough
D Basic

Answer: A

Rationale:
A is correct. OIDC is required for programmatic workspace queries.
B is incorrect. PAT is not the required method here.
C is incorrect. Credentials passthrough applies to data access.
D is incorrect. Basic authentication is not used.
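As a rough illustration of what an OIDC-authenticated call looks like, the sketch below builds (but does not send) an HTTP request that carries an ID token as a bearer credential. It assumes the token has already been obtained from Google, and it assumes the Account API host and path shown here; both should be checked against the current Databricks documentation. The function name, account ID, and token value are hypothetical.

```python
import urllib.request

# Assumed Account API host for Databricks on Google Cloud; verify before use.
ACCOUNTS_HOST = "https://accounts.gcp.databricks.com"


def build_list_workspaces_request(account_id, id_token):
    """Build a GET request for the workspace list, authenticated with an OIDC ID token."""
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces"
    headers = {"Authorization": f"Bearer {id_token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


# Hypothetical values for illustration only; nothing is sent over the network.
request = build_list_workspaces_request("example-account-id", "example-id-token")
```

Sending the request would be a matter of passing it to urllib.request.urlopen once a real token and account ID are in place.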

Question 11: Encryption key configuration behavior

Which two statements are true of encryption key configurations?

A They register a KMS key and allow rotation of cryptographic material
B They must be created per workspace only
C They cannot be rotated
D Data is unencrypted without them

Answer: A

Rationale:
A is correct. Encryption configs register KMS keys and support rotation.
B is incorrect. They are not strictly per workspace only.
C is incorrect. Rotation is supported.
D is incorrect. Data is still encrypted by default.

Question 12: Standalone VPC definition

What does the term “standalone VPC” refer to?

A VPC resides in the same project as the workspace resources
B VPC is in a separate project
C VPC supports only one workspace
D Automatically allocated VPC

Answer: A

Rationale:
A is correct. Standalone VPC exists in the same project.
B is incorrect. This describes another architecture.
C is incorrect. It is not limited to one workspace.
D is incorrect. A standalone VPC is not automatically provisioned.

Question 13: VPC registration mechanism

Which mechanism registers a VPC into your Databricks account?

A Network configuration
B Service account
C Metastore
D VPC endpoint

Answer: A

Rationale:
A is correct. Network configuration registers the VPC in Databricks.
B is incorrect. Service accounts handle permissions, not networking.
C is incorrect. Metastore handles governance.
D is incorrect. VPC endpoints are not used here.

Question 14: Query federation properties

Which two statements are true in query federation?

A Data remains in place and it is a read-only connection
B SELECT is not allowed and UPSERT is optimized
C UPDATE is optimized and SELECT is restricted
D Full write capability allowed

Answer: A

Rationale:
A is correct. Query federation keeps data in place and is read-only.
B is incorrect. SELECT is allowed.
C is incorrect. Updates are not supported.
D is incorrect. Federation does not allow writes.

Question 15: Regionality determination

Where is regionality determined when setting up workspaces in your own VPC?

A Subnet
B VPC
C Workspace
D IP address ranges

Answer: A

Rationale:
A is correct. Region is determined by subnet location.
B is incorrect. VPC alone does not define region.
C is incorrect. Workspace inherits region from subnet.
D is incorrect. IP ranges do not define region.
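The reason the subnet determines the region is visible in Google Cloud's resource naming: a VPC network is a global resource, while a subnetwork's resource path includes a regions segment. The sketch below extracts the region from a subnet self-link of the usual form; the helper name, project, and subnet name are hypothetical.

```python
import re

# Matches the tail of a subnet self-link:
# .../projects/<project>/regions/<region>/subnetworks/<name>
_SUBNET_RE = re.compile(
    r"projects/(?P<project>[^/]+)/regions/(?P<region>[^/]+)/subnetworks/(?P<name>[^/]+)$"
)


def subnet_region(self_link):
    """Return the region encoded in a subnet self-link, or raise ValueError."""
    match = _SUBNET_RE.search(self_link)
    if match is None:
        raise ValueError(f"not a subnet self-link: {self_link!r}")
    return match.group("region")


# Hypothetical self-link for illustration only.
SUBNET = (
    "https://www.googleapis.com/compute/v1/projects/example-project"
    "/regions/asia-east2/subnetworks/databricks-subnet"
)

region = subnet_region(SUBNET)
```

A VPC self-link has no regions segment, so the same function raises ValueError for it, which mirrors why option B is not the answer.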

Question 16: Integration requirement

Which element do you have to create when integrating Databricks to Google-managed services?

A Service account
B VPC
C Bucket
D Role

Answer: A

Rationale:
A is correct. Service account is required for integration.
B is incorrect. VPC is unrelated to service integration.
C is incorrect. Bucket is not required for all integrations.
D is incorrect. Roles are secondary to service accounts.

Facilitators

  • Sandy Feng

    Databricks

    BDR

  • Alana Lam

    Frontier Capital

    Senior Asset Manager

  • D martin

    AWS Builder Center

    Trader Builder Group Leader (TBGL)

  • Kenny Chan

    Frontier Capital

    Macro Trader

  • Dan Chan

    Tech for Trading

    Dev

  • Andy Zhang

    AWS

    User Group Leader, Guangzhou

Partners

AWS community builder

AWS FSI Customer Acceleration Group

Databricks

Databricks for Financial Services

Databricks User Groups

Frontier Capital

MongoDB

MongoDB Creators Program

MongoDB for Financial Services

Organizer

  • Dan Chan

    Tech for Trading

    Dev