Get Certified: AWS Databricks Platform Architect Accreditation — Real-World Lakehouse Architecture & Hands-On Design

Hong Kong Databricks FSI group

May 30, 4:00 – 7:00 AM (UTC)

About this event

This hands-on session is designed for data engineers and architects preparing for the AWS Databricks Platform Architect Accreditation, focusing on real-world implementation patterns rather than theory alone. Participants will explore how to design and operate a production-grade Lakehouse architecture on AWS, leveraging core components such as S3 storage, IAM roles (instance profiles), Unity Catalog, and Databricks SQL.

We’ll walk through practical scenarios: configuring cross-account IAM roles for secure S3 access, setting up external locations and storage credentials in Unity Catalog, and understanding how metastore governance scales across multiple workspaces. Attendees will also build pipelines using job clusters and autoscaling compute, applying cost optimization strategies like instance type selection, tagging, and ephemeral workloads to control spend.

The session will compare query federation with Amazon Redshift (default port 5439) against ingesting the data into the Lakehouse, and will cover performance tuning with the Photon execution engine, disk caching (formerly Delta caching), and Z-order clustering. You’ll also troubleshoot real cases using query profiles, system tables, and workflow observability dashboards.

By the end, participants will gain architect-level clarity on security, networking (VPC + private subnets), data governance, and scalable pipeline design, fully aligned with certification objectives and enterprise deployment standards.


Question 1: Encryption key configurations

Which two statements are true of encryption key configurations?

A They register a KMS key into your Databricks account and allow rotation of cryptographic material
B They must be created for each workspace only
C They cannot be rotated once created
D Data is not encrypted if they are not used

Answer: A

Rationale:
A is correct. Encryption configurations integrate KMS keys and support rotation.
B is incorrect. Key configurations are registered at the account level and are not limited to a single workspace.
C is incorrect. Rotation is supported.
D is incorrect. Data is still encrypted by default without custom keys.

Question 2: Encryption key targets

Which three elements can encryption key configurations be applied to?

A Root storage bucket, system storage bucket, and cluster disk volumes
B External storage bucket, metastore bucket, cluster volumes
C Root bucket, external bucket, and metastore bucket
D Only cluster disk volumes

Answer: A

Rationale:
A is correct. These are supported encryption targets.
B is incorrect. External storage buckets and the metastore bucket are not targets of an encryption key configuration.
C is incorrect. It includes the external and metastore buckets, which are not supported targets.
D is incorrect. Encryption can apply to more than cluster disks.
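
As a rough illustration of Questions 1 and 2, the sketch below builds the kind of request body used to register a KMS key as an encryption key configuration through the Account API. The field names (`use_cases`, `aws_key_info`, `reuse_key_for_cluster_volumes`) are written from memory of that API and should be checked against the current reference; the key ARN and alias are hypothetical placeholders.

```python
import json


def build_key_configuration(key_arn: str, key_alias: str,
                            reuse_for_cluster_volumes: bool = True) -> dict:
    """Build a request body that registers a KMS key with a Databricks account.

    Assumption: the field names mirror the Account API's
    customer-managed-keys endpoint; verify them before use.
    """
    return {
        # One key can serve managed services and workspace storage.
        "use_cases": ["MANAGED_SERVICES", "STORAGE"],
        "aws_key_info": {
            "key_arn": key_arn,
            "key_alias": key_alias,
            # Also use the key for cluster disk (EBS) volumes.
            "reuse_key_for_cluster_volumes": reuse_for_cluster_volumes,
        },
    }


payload = build_key_configuration(
    "arn:aws:kms:ap-east-1:111122223333:key/EXAMPLE-KEY-ID",  # hypothetical ARN
    "alias/databricks-demo",                                   # hypothetical alias
)
print(json.dumps(payload, indent=2))
```

The body is only constructed and printed here; nothing is sent to Databricks.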

Question 3: Workspace authentication

Which two authentication schemes are supported for querying a list of workspaces programmatically?

A OAuth and Basic authentication
B PAT and credential passthrough
C OAuth and PAT
D Basic and credential passthrough

Answer: A

Rationale:
A is correct. OAuth and Basic authentication are supported.
B is incorrect. Credential passthrough is not used here.
C is incorrect. Personal access tokens (PATs) are scoped to a single workspace, whereas listing workspaces is an account-level operation.
D is incorrect. Credential passthrough is irrelevant.
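
To make Question 3 concrete, here is a minimal sketch of a "list workspaces" call against the account-level API using an OAuth bearer token. The request is only constructed, not sent. The host and path reflect the AWS Account API as I understand it, and the account ID and token are hypothetical placeholders.

```python
import urllib.request

# Assumption: account-level API host for Databricks on AWS.
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def build_list_workspaces_request(account_id: str,
                                  oauth_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request that lists account workspaces."""
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Authorization": f"Bearer {oauth_token}"},
    )


req = build_list_workspaces_request(
    "00000000-0000-0000-0000-000000000000",  # hypothetical account ID
    "example-token",                          # hypothetical OAuth access token
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the call, given a valid token.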

Question 4: Metastore prerequisites

What are two prerequisites for creating an AWS Databricks metastore?

A Bucket and IAM role with appropriate permissions
B VPC and subnet
C Workspace and service account
D Storage credential and cluster

Answer: A

Rationale:
A is correct. Metastore requires storage bucket and IAM role.
B is incorrect. VPC is not required.
C is incorrect. Workspace creation is separate.
D is incorrect. Storage credential is not a prerequisite.

Question 5: Storage credentials

Where are storage credentials created?

A Workspace/Data explorer
B Workspace/Admin console
C Account console/Data page
D SQL

Answer: A

Rationale:
A is correct. Storage credentials are created in the workspace's Data Explorer (now called Catalog Explorer).
B is incorrect. Admin console is for settings.
C is incorrect. Account console does not create workspace credentials.
D is incorrect. SQL is not used.

Question 6: Object storage access

What roles do you need to connect to object storage?

A Self-assuming cross-account role
B Unity Catalog workspace only
C Assume permissions
D Local IAM user

Answer: A

Rationale:
A is correct. Cross-account IAM role enables secure storage access.
B is incorrect. Workspace alone is insufficient.
C is incorrect. Not a defined role.
D is incorrect. IAM user is not scalable.

Question 7: Serverless compute security

Which statement confirms that serverless compute and cloud resources are secured?

A By communicating on the same IP
B By configuring serverless compute only
C By using cloud resources
D By enabling compute only

Answer: A

Rationale:
A is correct. Same-IP communication ensures secure boundary.
B is incorrect. Configuration alone is insufficient.
C is incorrect. Resource usage does not guarantee security.
D is incorrect. Compute alone is not security.

Question 8: External location setup

Which three steps are required to create an external location?

A Create a catalog, create connection, create storage credential
B Create workspace, create cluster, create connection
C Create IAM role, create VPC, create subnet
D Create notebook, create query, create dataset

Answer: A

Rationale:
A is correct. These steps define external access configuration.
B is incorrect. Workspace/cluster not required here.
C is incorrect. Infrastructure steps are unrelated.
D is incorrect. Analytical steps irrelevant.
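
For orientation, the final step of defining an external location in Unity Catalog is a SQL statement that binds a storage URL to an existing storage credential. The helper below is a hypothetical sketch that only renders that statement as text; the location name, bucket path, and credential name are placeholders.

```python
def external_location_sql(name: str, url: str, credential: str) -> str:
    """Render a CREATE EXTERNAL LOCATION statement for Unity Catalog.

    The statement assumes the named storage credential already exists.
    """
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}`\n"
        f"URL '{url}'\n"
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


statement = external_location_sql(
    "fsi_raw_zone",               # hypothetical location name
    "s3://example-bucket/raw",    # hypothetical bucket path
    "fsi_storage_credential",     # hypothetical storage credential
)
print(statement)
```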

Question 9: AWS service integration

Which element must be created when integrating Databricks with AWS-managed services?

A IAM role
B VPC
C Bucket
D Key

Answer: A

Rationale:
A is correct. IAM role enables secure service integration.
B is incorrect. VPC is not required for integration.
C is incorrect. Bucket is not always required.
D is incorrect. Key alone is insufficient.

Question 10: NCC firewall

Which statements confirm NCC firewall enablement?

A Serverless compute uses NCC and firewall allowlist is configured
B Using compute resources only
C Using one IP for communication
D Enabling SQL warehouse

Answer: A

Rationale:
A is correct. NCC usage and allowlisting confirm firewall setup.
B is incorrect. Compute usage alone is insufficient.
C is incorrect. IP communication does not confirm firewall.
D is incorrect. SQL warehouse unrelated.

Question 11: Query federation privileges

What privileges do you need to perform query federation?

A Metastore admin and account admin
B Read-only connection
C Create connection only
D Workspace admin

Answer: A

Rationale:
A is correct. Both admin roles are required.
B is incorrect. A read-only connection is insufficient.
C is incorrect. Permission to create a connection is insufficient on its own.
D is incorrect. The workspace admin role is insufficient.

Question 12: Workspace prerequisites (AWS)

Which two elements must be created before a Databricks workspace?

A Bucket and cross-account IAM role
B VPC and subnet
C Storage credential and metastore
D Workspace and cluster

Answer: A

Rationale:
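A is correct. A storage bucket and a cross-account IAM role are required.
B is incorrect. A customer-managed VPC is optional.
C is incorrect. A storage credential and a metastore are not prerequisites for a workspace.
D is incorrect. A workspace cannot be a prerequisite for itself.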

Question 13: Subnet requirements

What are the subnet requirements for each Databricks workspace?

A Two private subnets
B One public subnet only
C Two public subnets
D One public and one private subnet

Answer: A

Rationale:
A is correct. Two private subnets are required.
B is incorrect. Not enough.
C is incorrect. Public subnets not required.
D is incorrect. Architecture requires two private.
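
A pre-flight check for the subnet rule in Question 13 might look like the sketch below. It assumes the customer-managed VPC requirements as I recall them: at least two subnets, in different availability zones, each with a netmask between /17 and /26. The `Subnet` type and the CIDR values are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Subnet:
    cidr: str
    availability_zone: str


def validate_workspace_subnets(subnets: list) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if len(subnets) < 2:
        problems.append("at least two subnets are required")
    if len({s.availability_zone for s in subnets}) < 2:
        problems.append("subnets must span at least two availability zones")
    for s in subnets:
        prefix = ipaddress.ip_network(s.cidr).prefixlen
        # Assumption: netmask must be between /17 and /26 inclusive.
        if not 17 <= prefix <= 26:
            problems.append(f"{s.cidr}: netmask must be between /17 and /26")
    return problems


good = [Subnet("10.0.0.0/18", "ap-east-1a"), Subnet("10.0.64.0/18", "ap-east-1b")]
bad = [Subnet("10.0.0.0/28", "ap-east-1a")]
print(validate_workspace_subnets(good))
print(validate_workspace_subnets(bad))
```

The check does not confirm that the subnets are private; that depends on route tables, which are outside this sketch.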

Question 14: Foundation model service

Which service allows fine-tuning of foundational models?

A Amazon Bedrock
B SageMaker
C AWS Glue
D S3

Answer: A

Rationale:
A is correct. Bedrock supports model fine-tuning.
B is incorrect. SageMaker is a broader ML platform.
C is incorrect. Glue is an ETL tool.
D is incorrect. S3 is object storage.

Question 15: Regionality

Where is regionality determined when setting up workspaces?

A VPC
B Subnet
C Workspace
D IP range

Answer: A

Rationale:
A is correct. VPC determines regional deployment.
B is incorrect. Subnet is secondary.
C is incorrect. Workspace inherits region.
D is incorrect. IP range does not define region.

Question 16: AWS connection pattern

Which two steps are required to connect Databricks to AWS-managed services?

A Enable/configure service and create IAM role
B Create connection only
C Enable console access only
D Create storage credential only

Answer: A

Rationale:
A is correct. Both steps are required for integration.
B is incorrect. A connection alone is insufficient.
C is incorrect. Console access is irrelevant.
D is incorrect. A storage credential alone is insufficient.

Question 17: Workspace API

Which API is used to create a workspace?

A Account API
B Workspace API
C SCIM API
D Unity Catalog API

Answer: A

Rationale:
A is correct. Account API provisions workspaces.
B is incorrect. Workspace API operates within workspace.
C is incorrect. SCIM is for identity.
D is incorrect. UC handles governance.
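
The sketch below shows roughly what a workspace-creation body for the Account API looks like, and how it ties back to Question 12: the credentials ID stands for the registered cross-account IAM role and the storage configuration ID stands for the registered root bucket. Field names are from my recollection of the API; all IDs and names are hypothetical.

```python
import json
from typing import Optional


def build_create_workspace_body(workspace_name: str,
                                aws_region: str,
                                credentials_id: str,
                                storage_configuration_id: str,
                                network_id: Optional[str] = None) -> dict:
    """Build the JSON body for an account-level 'create workspace' call."""
    body = {
        "workspace_name": workspace_name,
        "aws_region": aws_region,
        # ID of the registered cross-account IAM role.
        "credentials_id": credentials_id,
        # ID of the registered root S3 bucket.
        "storage_configuration_id": storage_configuration_id,
    }
    if network_id is not None:
        # Only needed for a customer-managed VPC.
        body["network_id"] = network_id
    return body


body = build_create_workspace_body(
    "fsi-demo-workspace",   # hypothetical workspace name
    "ap-east-1",
    "cred-1234",            # hypothetical credentials ID
    "storage-5678",         # hypothetical storage configuration ID
)
print(json.dumps(body, indent=2))
```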

Question 18: Redshift federation

Which statement confirms federated query on Redshift?

A Default port is 5439
B Port is 5349
C All ports must be open
D No firewall needed

Answer: A

Rationale:
A is correct. Redshift uses port 5439.
B is incorrect. 5349 is a transposition of the default port, 5439.
C is incorrect. Only the Redshift port needs to be reachable.
D is incorrect. Firewall rules required.
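
As an illustration of where the port appears in practice, the helpers below render the two SQL statements typically used to federate Redshift through Unity Catalog: a connection, then a foreign catalog built on it. The host, names, and secret scope are hypothetical, and the statements are only printed, not executed.

```python
REDSHIFT_DEFAULT_PORT = 5439


def redshift_connection_sql(name: str, host: str, user: str,
                            secret_scope: str, secret_key: str,
                            port: int = REDSHIFT_DEFAULT_PORT) -> str:
    """Render a CREATE CONNECTION statement for a Redshift endpoint."""
    return (
        f"CREATE CONNECTION `{name}` TYPE redshift\n"
        "OPTIONS (\n"
        f"  host '{host}',\n"
        f"  port '{port}',\n"
        f"  user '{user}',\n"
        # The password is read from a secret scope rather than inlined.
        f"  password secret('{secret_scope}', '{secret_key}')\n"
        ")"
    )


def foreign_catalog_sql(catalog: str, connection: str, database: str) -> str:
    """Render a CREATE FOREIGN CATALOG statement that uses the connection."""
    return (
        f"CREATE FOREIGN CATALOG IF NOT EXISTS `{catalog}`\n"
        f"USING CONNECTION `{connection}`\n"
        f"OPTIONS (database '{database}')"
    )


conn_sql = redshift_connection_sql(
    "redshift_fsi",                                    # hypothetical connection name
    "example-cluster.ap-east-1.redshift.amazonaws.com",  # hypothetical host
    "federation_user", "fsi_scope", "redshift_password",
)
print(conn_sql)
print(foreign_catalog_sql("redshift_cat", "redshift_fsi", "dev"))
```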

Question 19: Trusted principal

Who is the trusted principal for external storage permissions?

A Static Unity Catalog IAM role
B Workspace IAM role
C Metastore IAM role
D User IAM role

Answer: A

Rationale:
A is correct. Static UC IAM role is used for trust relationship.
B is incorrect. Workspace role is not the principal.
C is incorrect. Metastore role is not correct in AWS context.
D is incorrect. User role is not used.
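
Questions 6 and 19 both come down to the trust policy on the IAM role used for storage access. A minimal sketch, assuming the usual pattern of trusting the static Unity Catalog role published by Databricks plus the role itself (self-assuming), gated by an external ID, is shown below. All three arguments are placeholders to be filled from your own account and the Databricks documentation.

```python
import json


def build_trust_policy(uc_role_arn: str, own_role_arn: str,
                       external_id: str) -> dict:
    """Build an IAM trust policy for a Unity Catalog storage role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    # The static Unity Catalog role, plus this role itself
                    # so that the role is self-assuming.
                    "AWS": [uc_role_arn, own_role_arn],
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


policy = build_trust_policy(
    "arn:aws:iam::<DATABRICKS-AWS-ACCOUNT>:role/<UNITY-CATALOG-ROLE>",  # placeholder
    "arn:aws:iam::111122223333:role/fsi-uc-storage-role",               # hypothetical
    "<EXTERNAL-ID>",                                                     # placeholder
)
print(json.dumps(policy, indent=2))
```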

Moderators

  • Sandy Feng

    Databricks

    BDR

  • Kenny Chan

    Frontier Capital

    Macro Trader

  • Andy Zhang

    AWS

    User Group Leader, Guangzhou

  • Alana Lam

    Frontier Capital

    Senior Asset Manager

  • Dan Chan

    Tech for Trading

    Dev

  • D martin

    AWS Builder Center

    Trader Builder Group Leader (TBGL)

Partners

AWS community builder

AWS FSI Customer Acceleration Group

Databricks

Databricks for Financial Services

Databricks User Groups

Frontier Capital

MongoDB

MongoDB Creators Program

MongoDB for Financial Services

Organizer

  • Dan Chan

    Tech for Trading

    Dev
