Get Certified: Azure Databricks Platform Architect — Secure Lakehouse Design, Networking & Governance Deep Dive

Hong Kong Databricks FSI group

Jun 6, 4:00 – 7:00 AM (UTC)

2 RSVPs

Data Engineering

About this event

Join this hands-on meetup designed for professionals preparing for the Azure Databricks Platform Architect Accreditation, with a strong focus on real-world deployment patterns, security architecture, and enterprise-grade governance. This session goes beyond theory to demonstrate how modern data teams design and operate scalable Lakehouse platforms on Azure.

We will walk through practical scenarios such as deploying a workspace using Azure Resource Manager (ARM), configuring VNet-injected architectures for private network control, and implementing Secure Cluster Connectivity (no public IP) for production environments. Participants will explore how to integrate Azure Data Lake Storage Gen2 using Access Connectors (managed identities), applying correct RBAC roles like Storage Blob Data Contributor for secure data access.

The session also covers architect-level governance using Unity Catalog, including managing identities via Microsoft Entra ID (formerly Azure Active Directory), setting up fine-grained permissions, and enabling cross-workspace data sharing. We will demonstrate Private Link setup, DNS configuration, and UDR-based traffic routing for secure hybrid connectivity.

Attendees will gain hands-on insights into building pipelines, controlling cost, and troubleshooting workloads using query profiles, monitoring tools, and network diagnostics, preparing them to confidently design, secure, and optimize enterprise Databricks solutions aligned with certification standards.

Question 1: First-party service benefits

What are two benefits of Azure Databricks being a first-party service to Microsoft?

A It is covered by Microsoft’s Trust and Compliance assurances and supported by Microsoft SLAs
B It reduces storage costs and removes Azure dependency
C It only supports open-source tooling without Azure integration
D It eliminates the need for governance tools

Answer: A

Rationale:
A is correct. As a first-party service, Azure Databricks inherits Microsoft compliance and SLA guarantees.
B is incorrect. It does not remove Azure dependency.
C is incorrect. It integrates deeply with Azure services.
D is incorrect. Governance is still required.

Question 2: Workspace creation requirements

Which three items are minimally required for creating an Azure Databricks workspace?

A Workspace name, resource group, and Azure region
B VNet, subnet, and NAT gateway
C Subscription ID, VNet, and storage account
D Workspace name, VNet, and cluster

Answer: A

Rationale:
A is correct. These are the minimum ARM deployment inputs.
B is incorrect. VNet is optional.
C is incorrect. Storage account is not required upfront.
D is incorrect. Cluster creation is not required initially.
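
To make the minimal inputs concrete, here is a rough sketch of an ARM template body for a workspace, built as a Python dictionary. The API version, the SKU name, and the `managed_rg_id` argument are assumptions chosen for illustration and should be checked against the current ARM reference. The resource group itself is not part of the template, because it is the target of the deployment.

```python
import json


def minimal_workspace_template(workspace_name: str, location: str, managed_rg_id: str) -> dict:
    """Return an illustrative ARM template with a single Databricks workspace resource."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [
            {
                "type": "Microsoft.Databricks/workspaces",
                "apiVersion": "2018-04-01",  # assumption: verify the version you deploy with
                "name": workspace_name,      # required input: workspace name
                "location": location,        # required input: Azure region
                "sku": {"name": "premium"},  # assumption: pricing tier picked for the example
                # The managed resource group is created by the service; this ID is hypothetical.
                "properties": {"managedResourceGroupId": managed_rg_id},
            }
        ],
    }


# Hypothetical values, for illustration only.
template = minimal_workspace_template(
    "ws-demo",
    "eastasia",
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/ws-demo-managed",
)
print(json.dumps(template, indent=2))
```

No VNet, storage account, or cluster appears in the sketch, which matches the rationale above.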

Question 3: Service principal

What is a service principal in Azure Databricks?

A A trusted identity object used for automation and authentication
B A storage account for managing data
C A compute resource for running clusters
D A governance policy object

Answer: A

Rationale:
A is correct. Service principals act as non-human identities for automation.
B is incorrect. Storage accounts store data.
C is incorrect. Compute is not related.
D is incorrect. Governance objects are separate.

Question 4: VNet injection

Why would you create a VNet-injected Azure Databricks workspace?

A To gain additional control over the network
B To reduce compute costs
C To eliminate cluster setup
D To enable automatic scaling

Answer: A

Rationale:
A is correct. VNet injection allows custom networking and security controls.
B is incorrect. It does not directly reduce costs.
C is incorrect. Clusters are still required.
D is incorrect. Autoscaling is unrelated.

Question 5: Secure Cluster Connectivity

What is Secure Cluster Connectivity?

A A configuration with no public IP for compute nodes
B A method to enable faster processing
C A storage optimization technique
D A monitoring tool

Answer: A

Rationale:
A is correct. It removes public IP exposure for improved security.
B is incorrect. It is not performance-related.
C is incorrect. It is not storage optimization.
D is incorrect. It is not a monitoring feature.
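
Questions 4 and 5 map to a handful of workspace parameters in an ARM deployment. The sketch below assembles them in Python; the function name and all argument values are hypothetical, and the parameter keys should be confirmed against the `Microsoft.Databricks/workspaces` reference before use.

```python
def vnet_injection_parameters(vnet_id: str, public_subnet: str, private_subnet: str,
                              no_public_ip: bool = True) -> dict:
    """Build the custom parameters block for a VNet-injected workspace (illustrative)."""
    return {
        # Resource ID of the customer-managed virtual network.
        "customVirtualNetworkId": {"value": vnet_id},
        # Names of the two subnets delegated to the workspace.
        "customPublicSubnetName": {"value": public_subnet},
        "customPrivateSubnetName": {"value": private_subnet},
        # True turns on Secure Cluster Connectivity (no public IP on compute nodes).
        "enableNoPublicIp": {"value": no_public_ip},
    }


# Hypothetical identifiers, for illustration only.
params = vnet_injection_parameters(
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/net-rg"
    "/providers/Microsoft.Network/virtualNetworks/dbx-vnet",
    "dbx-public",
    "dbx-private",
)
print(params["enableNoPublicIp"])
```

In a template, a block like this would typically sit under the workspace resource's `properties.parameters`.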

Question 6: Private Link

What is one use case for Azure Databricks Private Link?

A Private communication between the control plane and the data plane
B Public data sharing across clouds
C Automatic VNet creation
D Performance tuning

Answer: A

Rationale:
A is correct. Private Link keeps traffic on the Microsoft backbone network instead of the public internet.
B is incorrect. It is not for public sharing.
C is incorrect. It does not create VNets.
D is incorrect. It does not optimize queries.

Question 7: Deployment service

Which service enables Terraform and ARM deployments for Azure Databricks?

A Azure Resource Manager
B Azure Active Directory
C Unity Catalog
D Azure Monitor

Answer: A

Rationale:
A is correct. ARM provides infrastructure-as-code deployment.
B is incorrect. AAD handles identity.
C is incorrect. UC handles governance.
D is incorrect. Monitor handles observability.

Question 8: Data architecture layers

Azure Databricks is used in which layers of a data architecture?

A Ingest, Process, and Serve
B Storage, Security, and Monitoring
C Compute, Identity, Networking
D Catalog, Schema, Table

Answer: A

Rationale:
A is correct. Databricks operates across ingestion, processing, and serving layers.
B is incorrect. These are support components.
C is incorrect. These are platform layers.
D is incorrect. These are data hierarchy levels.

Question 9: Subnet purpose

What are the two designated subnets used for?

A Enabling communication between compute nodes and infrastructure
B Storing data permanently
C Running SQL queries
D Managing authentication

Answer: A

Rationale:
A is correct. Subnets enable communication between cluster components.
B is incorrect. Storage is separate.
C is incorrect. Queries are executed by compute.
D is incorrect. Authentication is handled elsewhere.

Question 10: VNet Peering

Where is local VNet peering used?

A Between cloud resources using private IP addresses
B Between external internet endpoints
C Between storage and compute logs
D Between SQL warehouses

Answer: A

Rationale:
A is correct. VNet peering connects networks via private IPs.
B is incorrect. Not used for public internet routing.
C is incorrect. Logs are not peered.
D is incorrect. SQL warehouses are not directly peered.

Question 11: UDR usage

For which two reasons would you use UDRs?

A To block public ingress and control routing through firewalls
B To increase compute performance
C To store data securely
D To enable autoscaling

Answer: A

Rationale:
A is correct. UDRs allow traffic control and security enforcement.
B is incorrect. Not performance-related.
C is incorrect. Storage is separate.
D is incorrect. Autoscaling is compute configuration.
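
As a rough illustration of how a UDR steers traffic, the sketch below picks a next hop by longest-prefix match, which is the rule Azure uses to choose between routes. The route table contents are hypothetical, and the model ignores tie-breaking between user-defined and system routes.

```python
import ipaddress

# Hypothetical route table: the default route is forced through a firewall appliance,
# while traffic inside the VNet address space stays local.
ROUTES = [
    ("0.0.0.0/0", "VirtualAppliance"),
    ("10.0.0.0/16", "VnetLocal"),
]


def next_hop(dest_ip: str, routes):
    """Return the next-hop type of the most specific route matching dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    matches = []
    for prefix, hop in routes:
        network = ipaddress.ip_network(prefix)
        if addr in network:
            matches.append((network.prefixlen, hop))
    if not matches:
        return None
    # The longest prefix (largest prefix length) wins.
    return max(matches)[1]


print(next_hop("10.0.1.4", ROUTES))     # address inside the VNet range
print(next_hop("52.160.0.10", ROUTES))  # any other address falls to the default route
```

With this table, outbound traffic to addresses outside the VNet is handed to the firewall rather than leaving directly.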

Question 12: IP access list

What is the purpose of an IP access list?

A To restrict access to approved IP addresses
B To store credentials
C To manage compute clusters
D To track query lineage

Answer: A

Rationale:
A is correct. It enforces network-level access control.
B is incorrect. Credentials are stored elsewhere.
C is incorrect. Cluster management is separate.
D is incorrect. Lineage is handled by governance tools.
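
The check an IP access list performs can be sketched with the standard `ipaddress` module. This is a simplified local model, not the service's implementation: it assumes block entries are evaluated first and that an empty allow list permits every address that is not blocked. The function name and sample ranges are made up for the example.

```python
import ipaddress


def is_allowed(client_ip: str, allow_list, block_list=()) -> bool:
    """Return True if client_ip passes the block list and then the allow list."""
    addr = ipaddress.ip_address(client_ip)

    def in_any(entries) -> bool:
        # Each entry may be a single address or a CIDR range.
        return any(addr in ipaddress.ip_network(entry, strict=False) for entry in entries)

    if in_any(block_list):
        return False
    # Assumption: with no allow entries configured, nothing further is restricted.
    return not allow_list or in_any(allow_list)


office = ["203.0.113.0/24"]  # documentation-only address range
print(is_allowed("203.0.113.7", office))
print(is_allowed("198.51.100.1", office))
```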

Question 13: Governance tools

Which two tools provide governance and access control?

A Azure Active Directory and Unity Catalog
B Azure Monitor and SQL Editor
C ARM and Storage Accounts
D VNet and Firewall

Answer: A

Rationale:
A is correct. AAD manages identity; Unity Catalog manages data governance.
B is incorrect. These are monitoring/query tools.
C is incorrect. These are infrastructure tools.
D is incorrect. These are networking tools.

Question 14: Private Link DNS

Which two DNS records are required for Private Link?

A A control plane record and an Azure Databricks A record
B Public DNS and external gateway record
C Storage and compute records
D Firewall and VNet records

Answer: A

Rationale:
A is correct. These ensure correct private resolution.
B is incorrect. Public DNS is not used.
C is incorrect. These records are not required.
D is incorrect. Firewall does not define DNS.

Question 15: External storage connection

What resource is needed to connect the control plane to external storage?

A Azure Databricks Access Connector
B VNet Peering
C SQL Warehouse
D Private Endpoint

Answer: A

Rationale:
A is correct. Access Connector provides managed identity.
B is incorrect. Networking only.
C is incorrect. SQL warehouse is compute.
D is incorrect. Not the required component.

Question 16: Service principal interaction

How does a service principal interact with Azure Databricks?

A Through the REST API
B Through UI dashboards
C Through SQL queries only
D Through cluster configuration

Answer: A

Rationale:
A is correct. Service principals use APIs for automation.
B is incorrect. UI is for human users.
C is incorrect. Not limited to SQL.
D is incorrect. Cluster configuration is not an interaction method.
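
A minimal sketch of what such an API call could look like from an automation script follows. It only builds the request and does not send it. The workspace URL and token are placeholders, and how the service principal obtains its token is outside the scope of the example.

```python
import urllib.request


def build_list_clusters_request(workspace_url: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the clusters list endpoint."""
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/clusters/list",
        # The service principal authenticates with a bearer token.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


# Placeholder workspace URL and token, for illustration only.
req = build_list_clusters_request(
    "https://adb-0000000000000000.0.azuredatabricks.net",
    "<access-token>",
)
print(req.get_method(), req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` once a real workspace URL and token are in place.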

Question 17: Unity Catalog objects

Which three items can Unity Catalog manage?

A Users, service principals, and tables
B Clusters, notebooks, pipelines
C Storage, VNet, compute
D Logs, metrics, dashboards

Answer: A

Rationale:
A is correct. UC governs identities and data assets.
B is incorrect. These are workspace assets.
C is incorrect. Infrastructure elements are not governed by UC.
D is incorrect. Monitoring elements are separate.

Question 18: Access connector role

Which role must an access connector have?

A Storage Blob Data Contributor
B Owner
C Reader
D Contributor

Answer: A

Rationale:
A is correct. This role allows read/write to storage.
B is incorrect. Owner is excessive.
C is incorrect. Reader is insufficient.
D is incorrect. Contributor is a management-plane role and does not grant access to blob data.
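
For reference, a role assignment that gives an Access Connector's managed identity this role could be described roughly as below. The function name, subscription ID, and principal ID are hypothetical, and the built-in role GUID is quoted from memory, so treat it as an assumption to verify in the Azure built-in roles list.

```python
# Assumption: built-in role definition GUID for Storage Blob Data Contributor.
STORAGE_BLOB_DATA_CONTRIBUTOR = "ba92f5b4-2d11-453d-a403-e96b0029c9fe"


def role_assignment_body(subscription_id: str, principal_id: str) -> dict:
    """Build an illustrative role assignment body for a managed identity."""
    role_definition_id = (
        f"/subscriptions/{subscription_id}"
        f"/providers/Microsoft.Authorization/roleDefinitions/{STORAGE_BLOB_DATA_CONTRIBUTOR}"
    )
    return {
        "properties": {
            "roleDefinitionId": role_definition_id,
            # Object ID of the Access Connector's managed identity (hypothetical here).
            "principalId": principal_id,
            # Managed identities are assigned roles as service principals.
            "principalType": "ServicePrincipal",
        }
    }


body = role_assignment_body(
    "00000000-0000-0000-0000-000000000000",
    "11111111-1111-1111-1111-111111111111",
)
print(body["properties"]["roleDefinitionId"])
```

Scoping the assignment to the storage account or a single container, rather than the whole subscription, keeps the grant as narrow as the rationale above suggests.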


Moderators

  • Sandy Feng

    Databricks

    BDR

  • Andy Zhang

    AWS

    User Group Leader, Guangzhou

  • Alana Lam

    Frontier Capital

    Senior Asset Manager

  • D martin

    AWS Builder Center

    Trader Builder Group Leader (TBGL)

  • Kenny Chan

    Frontier Capital

    Macro Trader

  • Dan Chan

    Tech for Trading

    Dev

Partners

AWS community builder

AWS FSI Customer Acceleration Group

Databricks

Databricks for Financial Services

Databricks User Groups

Frontier Capital

MongoDB

MongoDB Creators Program

MongoDB for Financial Services

Organizer

  • Dan Chan

    Tech for Trading

    Dev
