Jun 20, 4:00 – 7:00 AM (UTC)
In this hands-on session, we walk through how an institutional multi-asset trading desk operationalizes alpha pipelines using Databricks on AWS. Starting from raw ingestion of tick data, options Greeks, and DeFi liquidity pool flows, we demonstrate how to construct a production-grade stack that feeds directly into execution and risk systems. Attendees will see how Delta Lake, streaming transformations, and feature pipelines enable sub-minute recalculation of the information coefficient (IC) and its information ratio (ICIR), as well as regime-shift detection.
We’ll dive into PnL explain, VaR drift, and intraday drawdown monitoring, showing how traders slice performance by book, instrument, and execution venue using AI/BI dashboards and dynamic filters. Expect real-world details on slippage attribution, impermanent loss tracking, and cross-regime performance comparison.
This meetup is not theoretical. We’ll share how desks survive volatile markets by coupling SQL Warehouses, Photon acceleration, and low-latency streaming pipelines to maintain signal freshness. You’ll leave understanding how to design a scalable, fault-tolerant alpha factory where data engineering decisions directly protect Sharpe ratio, reduce execution leakage, and keep your desk trading ahead of regime shifts—not reacting to them.
[S1] big-book-data-engineering
https://www.databricks.com/sites/default/files/2025-11/big-book-data-engineering.pdf

After three decades on sell-side and buy-side trading desks, from voice brokerage to fully automated cross-asset execution, we moved our alpha pipelines onto Databricks running on AWS. The desk consumes tick data, options Greeks, liquidity pool flows from DeFi AMMs, and factor signals (IC, IR, breadth) streamed into Delta Lake. Traders rely on AI/BI dashboards for intraday risk: PnL explain, VaR drift, and drawdown monitoring during regime shifts. In production, filters must slice by book, instrument, and execution venue. Without flexible filtering, you cannot isolate slippage, track impermanent loss in liquidity pools, or compare strategy performance across bull, bear, and sideways regimes.
What filter types are available when building a Databricks AI/BI dashboard?
A Global, page-level, and widget-level filters
B Global and page-level filters
C Page-level and widget-level filters
D Global filters only
Answer: A
Rationale:
A is correct. Databricks AI/BI dashboards support global, page-level, and widget-level filters to enable granular slicing of trading metrics such as PnL, risk, and liquidity exposures across portfolios.
B is incorrect. This omits widget-level filters, which are critical for isolating specific trading signals.
C is incorrect. It excludes global filters, which are needed for consistent filtering across all dashboard views.
D is incorrect. Global filters alone are insufficient for detailed analysis at desk or instrument level.
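To make the three filter scopes concrete, here is a minimal conceptual sketch in plain Python. It assumes that a widget sees only rows that pass the global, page-level, and widget-level filters together (a logical AND). The fill records, field names, and the `apply_filters` helper are hypothetical illustrations, not part of any Databricks API.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]

# Hypothetical fill records; field names and values are illustrative only.
FILLS: List[Row] = [
    {"book": "macro", "instrument": "ES", "venue": "CME", "slippage_bps": 1.2},
    {"book": "macro", "instrument": "ES", "venue": "ICE", "slippage_bps": 2.0},
    {"book": "macro", "instrument": "NQ", "venue": "CME", "slippage_bps": 0.8},
    {"book": "defi", "instrument": "ETH-USDC", "venue": "AMM", "slippage_bps": 6.5},
]


def apply_filters(rows: Iterable[Row], *scopes: Dict[str, object]) -> List[Row]:
    """Keep rows that match every field=value pair in every scope (logical AND)."""
    return [
        row
        for row in rows
        if all(row.get(field) == value
               for scope in scopes
               for field, value in scope.items())
    ]


global_filter = {"book": "macro"}    # assumed to apply to every page
page_filter = {"instrument": "ES"}   # assumed to apply to one page
widget_filter = {"venue": "CME"}     # assumed to apply to one widget

page_rows = apply_filters(FILLS, global_filter, page_filter)
widget_rows = apply_filters(FILLS, global_filter, page_filter, widget_filter)
```

In this sketch the page shows both ES fills in the macro book, while the single widget narrows further to the CME fill, which is the kind of slice needed to isolate slippage by venue.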
How does AI/BI Genie empower trading desk users to explore data?
A By allowing interaction using natural language chats
B By requiring predefined SQL queries only
C By enforcing Python scripting for all interactions
D By integrating only external BI tools
Answer: A
Rationale:
A is correct. Genie enables natural language interaction, allowing traders to quickly query exposures such as “show drawdown by strategy during high-volatility regimes” without writing SQL.
B is incorrect. Genie's value is reducing reliance on predefined queries.
C is incorrect. Python scripting is not required for basic interaction.
D is incorrect. Genie is native to Databricks and does not rely solely on external BI tools.
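The example prompt in the rationale asks for drawdown by strategy. As a rough sketch of the metric behind such a question, the snippet below computes maximum drawdown per strategy from an equity curve. The strategy names and equity values are made up for illustration, and the function assumes strictly positive equity values.

```python
from typing import Dict, List


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak.

    Assumes a non-empty series of strictly positive equity values.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that peak
    return worst


# Hypothetical equity curves, one per strategy.
equity_by_strategy: Dict[str, List[float]] = {
    "trend": [100.0, 110.0, 99.0, 105.0, 120.0],
    "carry": [100.0, 102.0, 101.0, 104.0, 103.0],
}

drawdowns = {name: max_drawdown(curve) for name, curve in equity_by_strategy.items()}
```

Here the "trend" curve falls from 110 to 99, a 10% drawdown, even though it later reaches a new high.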
What is the correct order of the Databricks SQL namespace hierarchy?
A Catalog → Schema → Table
B Schema → Table → Catalog
C Table → Schema → Catalog
D Workspace → Schema → Table
Answer: A
Rationale:
A is correct. Unity Catalog organizes data in a three-level hierarchy: Catalog → Schema → Table, enabling governance across trading datasets such as tick data and factor models.
B is incorrect. The catalog is the top level of the hierarchy, not the last.
C is incorrect. This is the hierarchy reversed; tables sit at the bottom, not the top.
D is incorrect. Workspace is not part of the SQL namespace hierarchy.
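The three-level hierarchy shows up directly in how tables are referenced: `catalog.schema.table`. The helper below is a small sketch that builds such a name with backtick quoting. The catalog, schema, and table names are hypothetical, and `qualified_name` is an illustrative helper rather than a Databricks function.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a backtick-quoted catalog.schema.table identifier."""
    parts = (catalog, schema, table)
    for part in parts:
        # Reject empty parts and embedded backticks instead of trying to escape them.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


# Hypothetical names for a tick-data table.
name = qualified_name("trading", "market_data", "ticks")
query = f"SELECT * FROM {name} LIMIT 10"
```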
On a multi-asset hedge fund desk running systematic strategies, we validate signals using IC, ICIR, and regime-conditioned backtests. Data pipelines ingest market data, DeFi liquidity pool metrics, and execution logs into a Lakehouse architecture. Analysts need rapid dashboard refresh and governed access to Unity Catalog (UC) tables for audit and compliance. When publishing dashboards to PMs and risk committees, getting permissions, refresh cadence, and dataset lifecycle right is critical. From monitoring slippage to detecting alpha decay, dashboards act as real-time control towers. Efficient dataset management ensures traders react before liquidity dries up or volatility regimes shift, avoiding catastrophic drawdowns.
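For readers less familiar with the signal metrics named above, here is a minimal sketch under a common convention: IC is the Spearman rank correlation between a signal and the subsequent returns in one period, and ICIR is the mean of the per-period ICs divided by their sample standard deviation. The desk's exact definitions are not given in the source, and the data below is invented for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average_rank
        i = j + 1
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; constant inputs (zero variance) are not handled."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def information_coefficient(signal: Sequence[float], fwd_returns: Sequence[float]) -> float:
    """Spearman rank correlation between a signal and forward returns."""
    return pearson(ranks(signal), ranks(fwd_returns))


def icir(ic_series: Sequence[float]) -> float:
    """Mean IC divided by the sample standard deviation of IC."""
    return mean(ic_series) / stdev(ic_series)


# Hypothetical cross-sections: one signal vector, three periods of forward returns.
signal = [0.1, 0.2, 0.3, 0.4]
returns_by_period = [
    [0.01, 0.02, 0.03, 0.04],  # same ordering as the signal
    [0.02, 0.01, 0.04, 0.03],  # partially shuffled
    [0.04, 0.03, 0.02, 0.01],  # reversed
]
ic_series = [information_coefficient(signal, r) for r in returns_by_period]
signal_icir = icir(ic_series)
```

A stable signal shows a positive mean IC with low dispersion, so ICIR rewards consistency and not just the occasional strong period.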
What must a data analyst do before sharing a dashboard with stakeholders?
A Publish the dashboard
B Clone the draft version
C Configure refresh schedules
D Validate SQL queries
Answer: A
Rationale:
A is correct. Publishing the dashboard makes it available to stakeholders such as portfolio managers and risk teams.
B is incorrect. Cloning does not make the dashboard accessible.
C is incorrect. Refresh schedules are important but not required before sharing.
D is incorrect. Query validation is useful but not a prerequisite for sharing.
Which tab allows defining a dataset using SQL in Databricks dashboards?
A Data
B Genie
C Filters
D Visualizations
Answer: A
Rationale:
A is correct. The Data tab is where SQL queries define datasets for dashboards.
B is incorrect. Genie is used for conversational queries, not dataset definition.
C is incorrect. Filters modify views, not datasets.
D is incorrect. Visualizations display data but do not define datasets.
How are permissions handled when using a Unity Catalog table in dashboards?
A Unity Catalog governance controls permissions
B Dashboard-level permissions override all
C Permissions are ignored
D Full access is granted by default
Answer: A
Rationale:
A is correct. Unity Catalog enforces centralized governance, ensuring compliance with trading desk data policies.
B is incorrect. Dashboard permissions do not override UC governance.
C is incorrect. Permissions are strictly enforced.
D is incorrect. Access must be explicitly granted.
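Because access must be granted explicitly, reading a Unity Catalog table typically requires `USE CATALOG` on the catalog, `USE SCHEMA` on the schema, and `SELECT` on the table. The sketch below only builds the corresponding SQL statements as strings. The catalog, schema, table, and group names are hypothetical, and `read_grants` is an illustrative helper, not a Databricks API.

```python
from typing import List


def read_grants(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Return the GRANT statements a principal needs to read one table."""
    return [
        f"GRANT USE CATALOG ON CATALOG `{catalog}` TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA `{catalog}`.`{schema}` TO `{principal}`",
        f"GRANT SELECT ON TABLE `{catalog}`.`{schema}`.`{table}` TO `{principal}`",
    ]


# Hypothetical names: a risk committee group reading an intraday PnL table.
statements = read_grants("trading", "risk", "intraday_pnl", "risk-committee")
for statement in statements:
    print(statement + ";")
```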
Running real-time risk and PnL dashboards on AWS-backed Databricks requires tight integration between compute and SQL warehouses. On the desk, we run intraday factor recalcs, VaR shocks, and liquidity stress scenarios. Genie Spaces must be backed by compute resources capable of handling high-concurrency queries. Traders frequently onboard new datasets, from order book depth to DeFi yield signals, and must explore them before building visualizations. In fast markets, publishing stale dashboards is equivalent to trading blind; data freshness and compute coupling define whether you catch a regime shift or miss it entirely.
Which compute resource must be associated when creating a Genie Space?
A SQL warehouse
B All-purpose cluster
C External location
D Metastore
Answer: A
Rationale:
A is correct. SQL warehouses power query execution for BI and Genie interactions.
B is incorrect. All-purpose clusters are not required for Genie Spaces.
C is incorrect. External locations relate to storage, not compute.
D is incorrect. Metastore handles metadata, not execution.
What is the first step when enhancing a dashboard with a new dataset?
A Locate and explore the dataset
B Create visualizations
C Publish dashboard
D Create Genie Space
Answer: A
Rationale:
A is correct. Analysts must first understand the dataset—similar to validating signal quality before deploying trading strategies.
B is incorrect. Visualization comes after understanding data.
C is incorrect. Publishing occurs later.
D is incorrect. Genie Space is not required for dataset onboarding.
How is dashboard data updated in Databricks?
A Refreshed on demand or on a schedule
B Automatically by Unity Catalog
C Only by manual SQL execution
D Only when cloning dashboard
Answer: A
Rationale:
A is correct. Data refresh can be scheduled or triggered, ensuring traders view up-to-date risk and performance metrics.
B is incorrect. Unity Catalog does not refresh data.
C is incorrect. Manual refresh is not the only method.
D is incorrect. Cloning does not refresh datasets.
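Whichever refresh mode is used, the desk still needs to know when a dashboard has gone stale. The following is a small sketch of a freshness check, assuming the last refresh time is known and a maximum acceptable age has been agreed. The five-minute cadence and the timestamps are illustrative assumptions, not Databricks defaults.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_refresh: datetime, now: datetime, max_age: timedelta) -> bool:
    """True when the data is older than the agreed maximum age."""
    return now - last_refresh > max_age


def next_refresh(last_refresh: datetime, cadence: timedelta) -> datetime:
    """When a fixed-cadence schedule would refresh next."""
    return last_refresh + cadence


# Hypothetical values for an intraday risk dashboard.
cadence = timedelta(minutes=5)
last_refresh = datetime(2025, 6, 20, 4, 0, tzinfo=timezone.utc)

fresh_check = is_stale(last_refresh, last_refresh + timedelta(minutes=3), cadence)
stale_check = is_stale(last_refresh, last_refresh + timedelta(minutes=7), cadence)
due_at = next_refresh(last_refresh, cadence)
```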
In quantitative research, we constantly optimize queries to investigate why execution costs spike or why IC deteriorates. Using the Photon engine, caching, and Z-ordering, we accelerate queries on billions of rows of tick data, options chains, and DeFi transaction logs. On AWS-backed Databricks, performance tuning directly impacts time-to-decision. During volatile markets, delays in query execution can lead to slippage or missed arbitrage. The Lakehouse architecture allows unified processing of batch and streaming data, which is critical when managing hybrid CeFi and DeFi portfolios across multiple liquidity venues.
What helps optimize query performance in Databricks?
A Photon engine, caching, and clustering
B Increasing number of dashboards
C Using only notebooks
D Disabling autoscaling
Answer: A
Rationale:
A is correct. Photon, caching, and clustering improve query performance, critical for real-time analytics.
B is incorrect. Dashboards do not improve performance.
C is incorrect. Notebooks are not a performance optimization tool.
D is incorrect. Disabling autoscaling does not speed up queries; autoscaling helps compute keep pace with changing load.
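To see why clustering helps, the toy model below imitates data skipping: each "file" keeps the minimum and maximum of a key, and a point lookup only reads files whose range could contain that key. This is a deliberately simplified, one-dimensional sketch in plain Python, not how Delta Lake or Z-ordering is implemented; the keys and file size are invented.

```python
from typing import List, Tuple


def write_files(keys: List[int], rows_per_file: int) -> List[Tuple[int, int]]:
    """Split keys into fixed-size 'files' and record each file's (min, max) key."""
    files = []
    for start in range(0, len(keys), rows_per_file):
        chunk = keys[start:start + rows_per_file]
        files.append((min(chunk), max(chunk)))
    return files


def files_to_scan(files: List[Tuple[int, int]], key: int) -> int:
    """Count files whose min/max range could contain the key."""
    return sum(1 for low, high in files if low <= key <= high)


# 100 rows with keys 0..9 arriving interleaved, e.g. ticks for ten instruments.
interleaved = [i % 10 for i in range(100)]
clustered = sorted(interleaved)  # same rows, laid out by key

unclustered_scans = files_to_scan(write_files(interleaved, 10), 3)
clustered_scans = files_to_scan(write_files(clustered, 10), 3)
```

With interleaved data every file spans keys 0 to 9, so a lookup for key 3 touches all ten files; after clustering, the same lookup touches one.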
Databricks
BDR
Frontier Capital
Senior Asset Manager
AWS Builder Center
Trader Builder Group Leader (TBGL)
Frontier Capital
Macro Trader
Tech for Trading
Dev
AWS
User Group Leader, Guangzhou
Investor Group
Financial Officer ASZEN
Tech for Trading
Dev