
Get Certified as a Data Analyst with Databricks: From Lakehouse Fundamentals to Real-World AI/BI Dashboards

Hong Kong Databricks FSI group

May 23, 4:00 – 7:00 AM (UTC)

About this event

This hands-on meetup walks participants through the Databricks Certified Data Analyst Learning Plan, focusing on practical workflows used in modern data teams. We will explore how analysts operate within the Lakehouse architecture, combining SQL analytics, governed data access via Unity Catalog, and interactive dashboards powered by Databricks AI/BI.

The session dives into real-world scenarios: building materialized views for high-volume transactional datasets, optimizing refresh schedules based on ingestion frequency (e.g., 4-hour batch pipelines), and reducing query latency using precomputed aggregates. Attendees will work with Databricks SQL Editor to execute analytical queries, investigate performance using query profiles, and apply data lineage tracking in Catalog Explorer to trace transformations end-to-end.
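The precomputed-aggregate idea can be tried locally with Python's built-in sqlite3 module. SQLite has no materialized views, so this sketch uses a CTAS summary table that is rebuilt on each refresh as a stand-in. The `trades` and `daily_desk_summary` tables and their columns are hypothetical. In Databricks SQL the equivalent would be declared with `CREATE MATERIALIZED VIEW`, whose reference documentation describes a schedule clause such as `SCHEDULE EVERY 4 HOURS`; verify the exact syntax for your workspace before relying on it.

```python
import sqlite3

# Hypothetical trade data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_date TEXT, desk TEXT, notional REAL)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [
        ("2024-05-01", "FX", 100.0),
        ("2024-05-01", "FX", 50.0),
        ("2024-05-01", "Rates", 200.0),
        ("2024-05-02", "FX", 75.0),
    ],
)


def refresh_summary(conn: sqlite3.Connection) -> None:
    """Rebuild the precomputed aggregate (a stand-in for a materialized view refresh)."""
    conn.execute("DROP TABLE IF EXISTS daily_desk_summary")
    # CTAS: aggregate once per refresh instead of on every dashboard query.
    conn.execute(
        """
        CREATE TABLE daily_desk_summary AS
        SELECT trade_date, desk, COUNT(*) AS trade_count, SUM(notional) AS total_notional
        FROM trades
        GROUP BY trade_date, desk
        """
    )
    conn.commit()


def desk_totals(conn: sqlite3.Connection, trade_date: str) -> list:
    """Dashboard-style read: hits the small summary table, not the raw trades."""
    return conn.execute(
        "SELECT desk, total_notional FROM daily_desk_summary "
        "WHERE trade_date = ? ORDER BY desk",
        (trade_date,),
    ).fetchall()


refresh_summary(conn)
print(desk_totals(conn, "2024-05-01"))  # [('FX', 150.0), ('Rates', 200.0)]

# A new batch lands; the summary stays stale until the next scheduled refresh.
conn.execute("INSERT INTO trades VALUES ('2024-05-01', 'FX', 25.0)")
print(desk_totals(conn, "2024-05-01"))  # [('FX', 150.0), ('Rates', 200.0)]
refresh_summary(conn)
print(desk_totals(conn, "2024-05-01"))  # [('FX', 175.0), ('Rates', 200.0)]
```

The middle print shows the trade-off: the summary is cheap to read but only as fresh as its last refresh, which is why the refresh cadence should follow the ingestion cadence.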

We’ll also demonstrate how to design production-ready dashboards: defining datasets with SQL, applying multi-level filters (global, page, and widget), and integrating AI/BI Genie for conversational analytics so business users can query data in natural language. Practical tips include managing permissions through Unity Catalog (UC), such as granting SELECT access; uploading raw files into Unity Catalog volumes; and transforming them into Delta tables with CREATE TABLE AS SELECT (CTAS) patterns.
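The volume-to-table and permission steps can be sketched as a small Python helper that only assembles the Databricks SQL text. All names (`main`, `sales`, `landing`, `orders.csv`, `orders_bronze`, `data-analysts`) are hypothetical. `read_files` is the Databricks SQL table-valued function for reading files, and the table is Delta because Delta is the default table format on Databricks; check the option names against the `read_files` reference for your runtime. Readers of the table also need `USE CATALOG` and `USE SCHEMA` on its parents, which this sketch does not grant.

```python
import re

# Simplified rule for unquoted identifiers (an assumption for this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _require_identifiers(*names: str) -> None:
    for name in names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")


def ctas_from_volume(catalog: str, schema: str, volume: str,
                     file_name: str, table: str) -> str:
    """Build a CTAS statement that loads one CSV file from a UC volume into a table."""
    _require_identifiers(catalog, schema, volume, table)
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", file_name):
        raise ValueError(f"unsupported file name: {file_name!r}")
    # Volume paths follow /Volumes/<catalog>/<schema>/<volume>/<path>.
    path = f"/Volumes/{catalog}/{schema}/{volume}/{file_name}"
    return (
        f"CREATE TABLE {catalog}.{schema}.{table} AS\n"
        f"SELECT * FROM read_files('{path}', format => 'csv', header => true)"
    )


def grant_select(catalog: str, schema: str, table: str, group: str) -> str:
    """Build a GRANT that lets one group read one table."""
    _require_identifiers(catalog, schema, table)
    if "`" in group:
        raise ValueError("group name must not contain backticks")
    return f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{group}`"


print(ctas_from_volume("main", "sales", "landing", "orders.csv", "orders_bronze"))
print(grant_select("main", "sales", "orders_bronze", "data-analysts"))
```

Validating identifiers before interpolating them keeps the generated SQL predictable; a production tool would more likely run parameterized statements through a SQL warehouse client.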

By the end, participants will have a strong, job-ready understanding of data modeling, governance, and dashboard delivery within Databricks, aligned directly with the certification and with real enterprise use cases.


Quiz: AI/BI for Data Analysts

Question 1: Counter widget

Which widget typically displays a single numerical summary statistic, such as a sales goal?

A Counter
B Combo Chart
C Text Box
D Pivot Table

Answer: A

Rationale:
A is correct. A Counter displays a single KPI or numerical summary.
B is incorrect. A Combo Chart compares data series using combined chart types.
C is incorrect. A Text Box displays text, not a calculated metric.
D is incorrect. A Pivot Table summarizes data in a tabular layout.

Question 2: AI/BI Genie interaction medium

AI/BI Genie empowers end users by allowing them to interact with data using what medium?

A Natural language chats
B Python scripting
C Predefined query execution
D Tableau integration

Answer: A

Rationale:
A is correct. AI/BI Genie allows users to ask questions using natural language.
B is incorrect. Python scripting is not the primary user interaction method for Genie.
C is incorrect. Genie is designed for conversational exploration, not only predefined queries.
D is incorrect. Tableau integration is not the medium described.

Question 3: Dashboard filter types

What filter types are available when building an AI/BI dashboard?

A Global, page level, and widget level filters
B Global and page level filters
C Page and widget level filters
D Global filters only

Answer: A

Rationale:
A is correct. AI/BI dashboards support global, page-level, and widget-level filters.
B is incorrect. It omits widget-level filters.
C is incorrect. It omits global filters.
D is incorrect. Dashboards are not limited to global filters only.
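The three filter scopes in Question 3 can be modeled conceptually in Python: a widget shows only the rows that pass every filter in scope for it. This is an assumed mental model of scoping, not Databricks' implementation; the row fields and filter values are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Filter = Callable[[Row], bool]


def visible_rows(rows: List[Row], global_filters: List[Filter],
                 page_filters: List[Filter], widget_filters: List[Filter]) -> List[Row]:
    """Rows one widget would show: every in-scope filter must pass (logical AND)."""
    in_scope = [*global_filters, *page_filters, *widget_filters]
    return [row for row in rows if all(f(row) for f in in_scope)]


rows = [
    {"year": 2024, "region": "APAC", "desk": "FX"},
    {"year": 2024, "region": "APAC", "desk": "Rates"},
    {"year": 2024, "region": "EMEA", "desk": "FX"},
    {"year": 2023, "region": "APAC", "desk": "FX"},
]

global_filters = [lambda r: r["year"] == 2024]         # applies on every page
apac_page_filters = [lambda r: r["region"] == "APAC"]  # applies only on the APAC page
fx_widget_filters = [lambda r: r["desk"] == "FX"]      # applies only to one chart

print(len(visible_rows(rows, global_filters, apac_page_filters, fx_widget_filters)))  # 1
print(len(visible_rows(rows, global_filters, apac_page_filters, [])))  # 2
print(len(visible_rows(rows, global_filters, [], [])))  # 3
```

Each narrower scope can only remove rows, never add them back, which is why option D (global filters only) would lose per-page and per-widget control.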

Question 4: Databricks Assistant for datasets

What is the function of the Databricks Assistant when defining a dataset for a dashboard?

A It helps compose SQL queries
B It publishes dashboards
C It manages data ingestion pipelines
D It creates visualizations automatically during dataset definition

Answer: A

Rationale:
A is correct. Databricks Assistant helps users write SQL queries.
B is incorrect. Publishing dashboards is a separate dashboard action.
C is incorrect. Managing ingestion pipelines is not its role in dashboard dataset definition.
D is incorrect. It does not automatically create visualizations during dataset definition.

Question 5: Genie Space compute resource

Which compute resource must be associated with a Genie Space during its creation?

A SQL warehouse
B Metastore
C All-purpose cluster
D External location

Answer: A

Rationale:
A is correct. A Genie Space must be associated with a SQL warehouse.
B is incorrect. A metastore is a governance layer, not the compute resource.
C is incorrect. An all-purpose cluster is not the required compute for Genie Space.
D is incorrect. An external location is used for storage governance, not Genie compute.

Question 6: Adding a new dataset to a dashboard

What is the first step when enhancing an existing dashboard with a new dataset in Databricks?

A Locate and explore the new dataset
B Create a Genie Space
C Add new visualizations
D Publish the dashboard

Answer: A

Rationale:
A is correct. You first need to find and understand the dataset.
B is incorrect. Creating a Genie Space is unrelated to enhancing a dashboard with a dataset.
C is incorrect. Visualizations come after the dataset is identified and added.
D is incorrect. Publishing happens after dashboard changes are completed.

Question 7: Sharing dashboards

What action is the prerequisite step for sharing a dashboard with stakeholders?

A Publishing the dashboard
B Cloning the draft version
C Using AI Assistant to verify queries
D Setting a refresh schedule

Answer: A

Rationale:
A is correct. A dashboard must be published before it can be shared with stakeholders.
B is incorrect. Cloning is not required for sharing.
C is incorrect. Query verification is helpful but not the sharing prerequisite.
D is incorrect. A refresh schedule is optional and not required for sharing.

Question 8: Dataset definition tab

When creating a dashboard in Databricks, which tab allows you to define a dataset using a SQL query?

A Data
B Genie
C Filters
D Visualizations

Answer: A

Rationale:
A is correct. The Data tab is used to define dashboard datasets using SQL.
B is incorrect. Genie is for conversational analytics.
C is incorrect. Filters are used to control displayed data, not define datasets.
D is incorrect. Visualizations are created after datasets are defined.

Question 9: Three-level namespace hierarchy

What is the correct order of the three-level namespace hierarchy in Databricks SQL?

A Catalog, Schema, Table
B Table, Schema, Catalog
C Workspace, Table, Schema
D Catalog, Table, Metastore

Answer: A

Rationale:
A is correct. Databricks SQL uses the hierarchy Catalog → Schema → Table.
B is incorrect. The order is reversed.
C is incorrect. Workspace is not part of the three-level namespace.
D is incorrect. Metastore is above catalogs, and table does not come before schema.
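The catalog → schema → table order also determines how shorter names are expanded: an unqualified name is resolved against the current catalog and current schema. The sketch below is a minimal Python model of that expansion; it assumes plain identifiers and does not handle backtick-quoted names that contain dots.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def resolve(name: str, current_catalog: str, current_schema: str) -> TableName:
    """Expand a one-, two-, or three-part table reference to catalog.schema.table."""
    parts = name.split(".")
    if len(parts) > 3 or any(not part for part in parts):
        raise ValueError(f"not a valid table reference: {name!r}")
    if len(parts) == 1:  # table -> current catalog and schema
        return TableName(current_catalog, current_schema, parts[0])
    if len(parts) == 2:  # schema.table -> current catalog
        return TableName(current_catalog, parts[0], parts[1])
    return TableName(*parts)  # catalog.schema.table is already fully qualified


print(resolve("trades", "main", "markets").qualified())  # main.markets.trades
print(resolve("risk.var_daily", "main", "markets").qualified())  # main.risk.var_daily
print(resolve("dev.risk.var_daily", "main", "markets").qualified())  # dev.risk.var_daily
```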

Question 10: Dashboard settings

What is the main purpose of customizing dashboard settings in Databricks AI/BI dashboards?

A To control the overall look, behavior, and formatting of the dashboard
B To change the underlying data in the datasets
C To define widget level filters for individual visualizations
D To edit SQL queries used by dashboard datasets

Answer: A

Rationale:
A is correct. Dashboard settings control appearance, behavior, and formatting.
B is incorrect. Dataset data is changed through data/query configuration, not dashboard settings.
C is incorrect. Widget-level filters are filter configurations, not the main purpose of dashboard settings.
D is incorrect. SQL queries are edited in dataset configuration.

Question 11: Unity Catalog permissions

How are permissions handled when adding an existing UC table as a dashboard dataset?

A Permissions are governed by Unity Catalog
B Permissions are managed at the dashboard level
C Permissions are ignored by the dashboard
D Permissions default to full access

Answer: A

Rationale:
A is correct. Permissions are governed by Unity Catalog.
B is incorrect. Dashboard-level permissions do not override UC data governance.
C is incorrect. Permissions are not ignored.
D is incorrect. Access does not default to full access.
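A toy Python model of the check Unity Catalog applies when a query reads a table. The privilege names `USE CATALOG`, `USE SCHEMA`, and `SELECT` are real UC privileges, and privileges granted on a catalog or schema are inherited by the objects beneath it. The function itself is a hypothetical simplification: it ignores ownership, admin roles, group membership, row filters, and column masks.

```python
from typing import Dict, Set, Tuple

# Privileges held by one principal, keyed by (securable type, full name).
Grants = Dict[Tuple[str, str], Set[str]]


def can_select(grants: Grants, catalog: str, schema: str, table: str) -> bool:
    """Simplified Unity Catalog-style check for reading one table."""
    cat = ("CATALOG", catalog)
    sch = ("SCHEMA", f"{catalog}.{schema}")
    tbl = ("TABLE", f"{catalog}.{schema}.{table}")

    def has(key: Tuple[str, str], privilege: str) -> bool:
        return privilege in grants.get(key, set())

    # USE CATALOG and USE SCHEMA let the principal reach the table at all.
    if not has(cat, "USE CATALOG"):
        return False
    if not (has(sch, "USE SCHEMA") or has(cat, "USE SCHEMA")):
        return False
    # SELECT granted on a parent catalog or schema is inherited by its tables.
    return any(has(key, "SELECT") for key in (tbl, sch, cat))


analyst: Grants = {
    ("CATALOG", "main"): {"USE CATALOG"},
    ("SCHEMA", "main.sales"): {"USE SCHEMA"},
    ("TABLE", "main.sales.orders"): {"SELECT"},
}
print(can_select(analyst, "main", "sales", "orders"))     # True
print(can_select(analyst, "main", "sales", "customers"))  # False: no SELECT
```

Because the check lives in the governance layer rather than in the dashboard, the same rules apply whether the table is read by a dashboard dataset, a Genie space, or an ad hoc query.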

Question 12: Dashboard data updates

What setting is configured to automatically update the data assets used by a dashboard?

A Refresh schedules
B Git push frequency
C Cross-filtering options
D Embedded credentials

Answer: A

Rationale:
A is correct. Refresh schedules automatically update dashboard data assets.
B is incorrect. Git push frequency is unrelated to dashboard data refresh.
C is incorrect. Cross-filtering affects interactions, not automatic data updates.
D is incorrect. Embedded credentials are not the refresh mechanism.
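Refresh schedules are configured in the dashboard UI, so the sketch below models only the arithmetic of a fixed-interval schedule. The `next_refreshes` function, the midnight-UTC anchor, and the 4-hour interval (matching a 4-hour batch pipeline) are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def next_refreshes(anchor: datetime, every: timedelta,
                   after: datetime, count: int) -> List[datetime]:
    """Next `count` runs of a fixed-interval schedule, strictly after `after`."""
    if every <= timedelta(0):
        raise ValueError("interval must be positive")
    if after < anchor:
        first = anchor
    else:
        # Whole intervals already elapsed since the anchor, then step to the next slot.
        elapsed = (after - anchor) // every
        first = anchor + (elapsed + 1) * every
    return [first + i * every for i in range(count)]


# Hypothetical: ingestion lands every 4 hours starting at midnight UTC,
# so the dashboard refresh uses the same 4-hour cadence.
anchor = datetime(2024, 5, 23, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 5, 23, 5, 30, tzinfo=timezone.utc)
for run in next_refreshes(anchor, timedelta(hours=4), now, 3):
    print(run.isoformat())
# 2024-05-23T08:00:00+00:00
# 2024-05-23T12:00:00+00:00
# 2024-05-23T16:00:00+00:00
```

Refreshing more often than new data arrives spends warehouse compute without changing results; in practice you might shift the anchor slightly past each ingestion run so the refresh picks up the new batch.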

Question 13: Catalog Explorer purpose

What is the primary purpose of Catalog Explorer in Databricks for data analysts?

A To discover datasets for analytics
B To manage clusters
C To create visualizations
D To publish dashboards

Answer: A

Rationale:
A is correct. Catalog Explorer helps analysts find and inspect datasets.
B is incorrect. Cluster management is handled elsewhere.
C is incorrect. Visualizations are created in dashboards or query tools.
D is incorrect. Publishing dashboards is not the main purpose of Catalog Explorer.

Question 14: Genie governance principle

All Genie interactions are governed by UC's security policies and data access controls, representing what principle?

A Unified security and governance
B Decentralized architecture
C Automatic optimization
D Transparent management

Answer: A

Rationale:
A is correct. Unity Catalog provides unified security and governance.
B is incorrect. The question describes centralized governance, not decentralization.
C is incorrect. It refers to access control, not optimization.
D is incorrect. Transparent management is not the principle described.

Question 15: Column descriptions for Genie

What feature in Catalog Explorer assists in populating column descriptions for Genie to utilize?

A AI Generate button
B Manual text input
C DESCRIBE TABLE
D ANALYZE TABLE

Answer: A

Rationale:
A is correct. The AI Generate button can help populate column descriptions.
B is incorrect. Manual input is possible but not the assisted feature.
C is incorrect. DESCRIBE TABLE shows metadata but does not auto-generate descriptions for Genie.
D is incorrect. ANALYZE TABLE gathers statistics, not descriptions.

Speakers

  • Kenny Chan

    Frontier Capital

    Macro Trader

  • Dan Chan

    Tech for Trading

    Dev

  • Alana Lam

    Frontier Capital

    Senior Asset Manager

  • Andy Zhang

    AWS

    User Group Leader, Guangzhou

  • D Martin

    AWS Builder Center

    Trader Builder Group Leader (TBGL)

Moderators

  • Sandy Feng

    Databricks

    BDR

  • Kenny Chan

    Frontier Capital

    Macro Trader

  • Andy Zhang

    AWS

    User Group Leader, Guangzhou

  • Alana Lam

    Frontier Capital

    Senior Asset Manager

  • Dan Chan

    Tech for Trading

    Dev

  • D Martin

    AWS Builder Center

    Trader Builder Group Leader (TBGL)

Partners

AWS community builder

AWS FSI Customer Acceleration Group

Databricks

Databricks for Financial Services

Databricks User Groups

Frontier Capital

MongoDB

MongoDB Creators Program

MongoDB for Financial Services

Organizer

  • Dan Chan

    Tech for Trading

    Dev
