From Signal to Execution: Building a Real-Time Alpha Factory on the Lakehouse

Hong Kong Databricks FSI group

Jun 20, 4:00 – 7:00 AM (UTC)

About this event

In this hands-on session, we walk through how an institutional multi-asset trading desk operationalizes alpha pipelines using Databricks on AWS. Starting from raw ingestion of tick data, options Greeks, and DeFi liquidity pool flows, we demonstrate how to construct a production-grade stack that feeds directly into execution and risk systems. Attendees will see how Delta Lake, streaming transformations, and feature pipelines enable sub-minute recalculation of IC, ICIR, and regime-shift detection.

We’ll dive into PnL explain, VaR drift, and intraday drawdown monitoring, showing how traders slice performance by book, instrument, and execution venue using AI/BI dashboards and dynamic filters. Expect real-world details on slippage attribution, impermanent loss tracking, and cross-regime performance comparison.

This meetup is not theoretical. We’ll share how desks survive volatile markets by coupling SQL Warehouses, Photon acceleration, and low-latency streaming pipelines to maintain signal freshness. You’ll leave understanding how to design a scalable, fault-tolerant alpha factory where data engineering decisions directly protect Sharpe ratio, reduce execution leakage, and keep your desk trading ahead of regime shifts—not reacting to them.

[S1] big-book-data-engineering

https://www.databricks.com/sites/default/files/2025-11/big-book-data-engineering.pdf




Background

After three decades on sell-side and buy-side trading desks, from voice brokerage to fully automated cross-asset execution, we moved our alpha pipelines onto Databricks running on AWS. The desk consumes tick data, options Greeks, liquidity pool flows from DeFi AMMs, and factor signals (IC, IR, breadth), all streamed into Delta Lake. Traders rely on AI/BI dashboards for intraday risk: PnL explain, VaR drift, and drawdown monitoring during regime shifts. In production, filters must slice by book, instrument, and execution venue. Without flexible filtering, you cannot isolate slippage, track impermanent loss in liquidity pools, or compare strategy performance across bull, bear, and sideways regimes.
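
As a rough illustration of the drawdown monitoring mentioned above, the following is a minimal sketch, assuming a drawdown is measured as the decline from the running peak of an equity curve, expressed as a fraction of that peak. The function name `max_drawdown` and the sample numbers are hypothetical and are not taken from the desk's actual pipeline.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Return the largest peak-to-trough decline as a fraction of the peak."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)  # running high-water mark
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical intraday equity marks for one book (illustrative numbers only).
curve = [100.0, 104.0, 101.0, 98.8, 103.0, 106.0]
print(round(max_drawdown(curve), 4))  # 0.05 (peak 104.0, trough 98.8)
```

In a dashboard setting, the same calculation would typically run once per book, instrument, or venue, so that a filter selects which curve is shown.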

Question 1: Dashboard filter types

What filter types are available when building a Databricks AI/BI dashboard?
A Global, page level, and widget level filters
B Global and page level filters
C Page and widget level filters
D Global filters only

Answer: A

Rationale:
A is correct. Databricks AI/BI dashboards support global, page-level, and widget-level filters to enable granular slicing of trading metrics such as PnL, risk, and liquidity exposures across portfolios.
B is incorrect. This omits widget-level filters, which are critical for isolating specific trading signals.
C is incorrect. It excludes global filters, which are needed for consistent filtering across all dashboard views.
D is incorrect. Only global filters are insufficient for detailed analysis at desk or instrument level.
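
One way to picture how the three filter scopes combine is the conceptual model below. It is only a sketch in plain Python and does not use any Databricks dashboard API: it assumes each scope contributes equality conditions and that a widget sees the logical AND of its global, page, and widget filters. All names (`effective_conditions`, `apply_conditions`, the sample books and venues) are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Condition = Tuple[str, object]  # (field name, required value)


def effective_conditions(
    global_f: Dict[str, object],
    page_f: Dict[str, object],
    widget_f: Dict[str, object],
) -> List[Condition]:
    """Collect the conditions from all three scopes; all of them must hold."""
    conditions: List[Condition] = []
    for scope in (global_f, page_f, widget_f):
        conditions.extend(scope.items())
    return conditions


def apply_conditions(rows: List[Row], conditions: List[Condition]) -> List[Row]:
    """Keep only the rows that satisfy every condition."""
    return [r for r in rows if all(r.get(f) == v for f, v in conditions)]


# Hypothetical fills; books, instruments, and venues are made-up labels.
fills = [
    {"book": "macro", "instrument": "FUT_A", "venue": "VENUE_1", "slippage_bps": 1.2},
    {"book": "macro", "instrument": "FUT_A", "venue": "VENUE_2", "slippage_bps": 2.5},
    {"book": "vol", "instrument": "OPT_B", "venue": "VENUE_1", "slippage_bps": 3.1},
]

conds = effective_conditions(
    {"book": "macro"},        # global: applies to every page
    {"instrument": "FUT_A"},  # page level: applies to widgets on one page
    {"venue": "VENUE_1"},     # widget level: applies to a single chart
)
selected = apply_conditions(fills, conds)
print(selected)
```

Dropping the widget-level condition widens the result to both venues, which is the kind of comparison slippage attribution relies on.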

Question 2: Genie natural language interaction

How does AI/BI Genie empower trading desk users to explore data?
A By allowing interaction using natural language chats
B By requiring predefined SQL queries only
C By enforcing Python scripting for all interactions
D By integrating only external BI tools

Answer: A

Rationale:
A is correct. Genie enables natural language interaction, allowing traders to quickly query exposures such as “show drawdown by strategy during high-volatility regimes” without writing SQL.
B is incorrect. Genie's value is reducing reliance on predefined queries.
C is incorrect. Python scripting is not required for basic interaction.
D is incorrect. Genie is native to Databricks and does not rely solely on external BI tools.

Question 3: SQL hierarchy

What is the correct order of the Databricks SQL namespace hierarchy?
A Catalog → Schema → Table
B Schema → Table → Catalog
C Table → Schema → Catalog
D Workspace → Schema → Table

Answer: A

Rationale:
A is correct. Unity Catalog organizes data in a three-level hierarchy: Catalog → Schema → Table, enabling governance across trading datasets such as tick data and factor models.
B is incorrect. The order is reversed.
C is incorrect. Tables do not sit above schema or catalog.
D is incorrect. Workspace is not part of the SQL namespace hierarchy.
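
The three-level name can be made concrete with a small helper. This is a minimal sketch, assuming simple identifiers that do not themselves contain dots or backticks. The class `TableName`, the function `parse_table_name`, and the example name `trading.market_data.ticks` are hypothetical.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str

    def qualified(self) -> str:
        """Render as a backtick-quoted catalog.schema.table reference."""
        return ".".join(f"`{part}`" for part in self)


def parse_table_name(name: str) -> TableName:
    """Split 'catalog.schema.table' into its three levels."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return TableName(*parts)


ticks = parse_table_name("trading.market_data.ticks")
print(ticks.catalog, ticks.schema, ticks.table)
print(ticks.qualified())  # `trading`.`market_data`.`ticks`
```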

Background

On a multi-asset hedge fund desk running systematic strategies, we validate signals using IC, ICIR, and regime-conditioned backtests. Data pipelines ingest market data, DeFi liquidity pool metrics, and execution logs into a Lakehouse architecture. Analysts need rapid dashboard refresh and governed access to Unity Catalog (UC) tables for audit and compliance. When publishing dashboards to PMs and risk committees, correctness of permissions, refresh cadence, and dataset lifecycle is critical. From monitoring slippage to detecting alpha decay, dashboards act as real-time control towers. Efficient dataset management ensures traders react before liquidity dries up or volatility regimes shift, avoiding catastrophic drawdowns.

Question 4: Dashboard sharing prerequisite

What must a data analyst do before sharing a dashboard with stakeholders?
A Publish the dashboard
B Clone the draft version
C Configure refresh schedules
D Validate SQL queries

Answer: A

Rationale:
A is correct. Publishing the dashboard makes it available to stakeholders such as portfolio managers and risk teams.
B is incorrect. Cloning does not make the dashboard accessible.
C is incorrect. Refresh schedules are important but not required before sharing.
D is incorrect. Query validation is useful but not a prerequisite for sharing.

Question 5: Dataset creation tab

Which tab allows defining a dataset using SQL in Databricks dashboards?
A Data
B Genie
C Filters
D Visualizations

Answer: A

Rationale:
A is correct. The Data tab is where SQL queries define datasets for dashboards.
B is incorrect. Genie is used for conversational queries, not dataset definition.
C is incorrect. Filters modify views, not datasets.
D is incorrect. Visualizations display data but do not define datasets.

Question 6: Permissions governance

How are permissions handled when using a Unity Catalog table in dashboards?
A Unity Catalog governance controls permissions
B Dashboard-level permissions override all
C Permissions are ignored
D Full access is granted by default

Answer: A

Rationale:
A is correct. Unity Catalog enforces centralized governance, ensuring compliance with trading desk data policies.
B is incorrect. Dashboard permissions do not override UC governance.
C is incorrect. Permissions are strictly enforced.
D is incorrect. Access must be explicitly granted.

Background

Running real-time risk and PnL dashboards on AWS-backed Databricks requires tight integration between compute and SQL warehouses. On the desk, we run intraday factor recalcs, VaR shocks, and liquidity stress scenarios. Genie Spaces must be backed by compute resources capable of handling high-concurrency queries. Traders frequently onboard new datasets, from order book depth to DeFi yield signals, and must explore them before building visualizations. In fast markets, publishing stale dashboards is equivalent to trading blind; data freshness and compute coupling define whether you catch a regime shift or miss it entirely.

Question 7: Genie space compute

Which compute resource must be associated when creating a Genie Space?
A SQL warehouse
B All-purpose cluster
C External location
D Metastore

Answer: A

Rationale:
A is correct. SQL warehouses power query execution for BI and Genie interactions.
B is incorrect. All-purpose clusters are not required for Genie Spaces.
C is incorrect. External locations relate to storage, not compute.
D is incorrect. Metastore handles metadata, not execution.

Question 8: First step for new dataset

What is the first step when enhancing a dashboard with a new dataset?
A Locate and explore the dataset
B Create visualizations
C Publish dashboard
D Create Genie Space

Answer: A

Rationale:
A is correct. Analysts must first understand the dataset—similar to validating signal quality before deploying trading strategies.
B is incorrect. Visualization comes after understanding data.
C is incorrect. Publishing occurs later.
D is incorrect. Genie Space is not required for dataset onboarding.

Question 9: Refresh mechanism

How is dashboard data updated in Databricks?
A Refreshed on demand or on a schedule
B Automatically by Unity Catalog
C Only by manual SQL execution
D Only when cloning dashboard

Answer: A

Rationale:
A is correct. Data refresh can be scheduled or triggered, ensuring traders view up-to-date risk and performance metrics.
B is incorrect. Unity Catalog does not refresh data.
C is incorrect. Manual refresh is not the only method.
D is incorrect. Cloning does not refresh datasets.
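
Whichever refresh mode is used, a desk usually wants to know when the data on screen is too old to trade on. The following is a minimal sketch of such a freshness check, assuming the last refresh time is available as a timezone-aware timestamp. The function `is_stale`, the 60-second tolerance, and the timestamps are hypothetical, and the code does not call any Databricks API.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_refresh: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the data is older than the allowed age."""
    if last_refresh.tzinfo is None or now.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return now - last_refresh > max_age


# Hypothetical timestamps, for illustration only.
last = datetime(2025, 1, 2, 4, 0, 0, tzinfo=timezone.utc)
tolerance = timedelta(seconds=60)

print(is_stale(last, last + timedelta(seconds=30), tolerance))  # False
print(is_stale(last, last + timedelta(seconds=90), tolerance))  # True
```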

Background

In quantitative research, we constantly optimize queries to investigate why execution costs spike or why IC deteriorates. Using the Photon engine, caching, and Z-ordering, we accelerate queries on billions of rows of tick data, options chains, and DeFi transaction logs. On AWS-backed Databricks, performance tuning directly impacts time-to-decision. During volatile markets, delays in query execution can lead to slippage or missed arbitrage. The Lakehouse architecture allows unified processing of batch and streaming data, which is critical when managing hybrid CeFi and DeFi portfolios across multiple liquidity venues.

Question 10: Query performance optimization

What helps optimize query performance in Databricks?
A Photon engine, caching, and clustering
B Increasing number of dashboards
C Using only notebooks
D Disabling autoscaling

Answer: A

Rationale:
A is correct. Photon, caching, and clustering improve query performance, critical for real-time analytics.
B is incorrect. Dashboards do not improve performance.
C is incorrect. Notebooks are not a performance optimization tool.
D is incorrect. Disabling autoscaling does not make queries faster; autoscaling helps compute keep up with load.
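
Why clustering helps can be shown with a toy simulation of data skipping. The sketch below is not Delta Lake code: it assumes each "file" stores only the minimum and maximum of one key column, and that a range query must read every file whose range overlaps the query. The names `file_stats` and `files_to_scan`, the file size, and the key range are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def file_stats(keys: Sequence[int], rows_per_file: int) -> List[Tuple[int, int]]:
    """Split keys into fixed-size 'files' and record (min, max) per file."""
    chunks = (keys[i:i + rows_per_file] for i in range(0, len(keys), rows_per_file))
    return [(min(chunk), max(chunk)) for chunk in chunks]


def files_to_scan(stats: Sequence[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count files whose [min, max] range overlaps the query range [lo, hi]."""
    return sum(1 for fmin, fmax in stats if fmax >= lo and fmin <= hi)


rng = random.Random(7)  # fixed seed so the run is repeatable
shuffled = list(range(10_000))
rng.shuffle(shuffled)

unclustered = file_stats(shuffled, 500)         # keys land in files at random
clustered = file_stats(sorted(shuffled), 500)   # keys colocated by value

scan_unclustered = files_to_scan(unclustered, 1_000, 1_099)
scan_clustered = files_to_scan(clustered, 1_000, 1_099)
print(scan_unclustered, scan_clustered)
```

With the keys sorted, the query range falls inside a single file, while the shuffled layout forces far more files to be read. That gap is the effect that Z-ordering and clustering aim for on real tables.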

Moderators

  • Sandy Feng

    Databricks

    BDR

  • Alana Lam

    Frontier Capital

    Senior Asset Manager

  • D martin

    AWS Builder Center

    Trader Builder Group Leader (TBGL)

  • Kenny Chan

    Frontier Capital

    Macro Trader

  • Dan Chan

    Tech for Trading

    Dev

  • Andy Zhang

    AWS

    User Group Leader, Guangzhou

  • Y.C Law

    Investor Group

    Financial Officer ASZEN

Partners

AWS community builder

AWS FSI Customer Acceleration Group

Databricks

Databricks for Financial Services

Databricks User Groups

Frontier Capital

MongoDB

MongoDB Creators Program

MongoDB for Financial Services

Organizer

  • Dan Chan

    Tech for Trading

    Dev
