Question 1: Materialized views and refresh schedules
You are working with a large dataset containing millions of customer transactions. You need to create a view that shows monthly sales summaries by department. This summary involves complex aggregations and joins that take several minutes to compute each time they are executed. The business team needs to access this data frequently throughout the day, but the underlying transaction data is only updated once every 4 hours.
Which type of view would be most appropriate for this scenario, and what additional configuration should you implement?
A Create a materialized view with scheduled refresh every 4 hours to align with the data update frequency
B Create a materialized view with manual refresh only, refreshing it whenever the business team requests updated data
C Create a temporary view because it is faster and does not take up storage space
D Create a standard view because it will always show the most up-to-date data
Answer: A
Rationale:
A is correct. A materialized view stores precomputed results, and a 4-hour schedule matches the data update cycle.
B is incorrect. Manual refresh would create unnecessary dependency on user requests.
C is incorrect. Temporary views do not solve the repeated expensive computation problem for business users.
D is incorrect. A standard view recomputes the query each time, which is inefficient for this scenario.
Question 2: CTAS
Which SQL statement is recommended for creating Delta Lake tables from existing data in Databricks?
A CREATE TABLE _ AS SELECT (CTAS)
B DROP TABLE
C INSERT INTO
D UPDATE TABLE
Answer: A
Rationale:
A is correct. CTAS creates a table from the result of a SELECT query.
B is incorrect. DROP TABLE removes a table.
C is incorrect. INSERT INTO adds rows to an existing table.
D is incorrect. UPDATE TABLE modifies existing rows and is not used to create tables.
Question 3: Privilege grants
Which entity is responsible for granting privileges on data objects?
A Data object owner
B Billing Administrator
C Databricks Account Administrator
D Workspace Administrator
Answer: A
Rationale:
A is correct. The data object owner can grant privileges on that object.
B is incorrect. Billing administrators manage billing, not data object privileges.
C is incorrect. Account administrators manage account-level settings, not necessarily object privileges.
D is incorrect. Workspace administrators do not automatically own all data objects.
Question 4: Streaming tables prerequisite
What must you do before creating streaming tables with Databricks SQL?
A Select a SQL warehouse
B Create a dashboard
C Install a Python library
D Request admin access
Answer: A
Rationale:
A is correct. A SQL warehouse must be selected before creating streaming tables in Databricks SQL.
B is incorrect. A dashboard is not required.
C is incorrect. Installing a Python library is not required for this SQL workflow.
D is incorrect. Admin access is not the stated prerequisite.
Question 5: Default catalog and schema
What is the primary benefit of setting a default catalog and schema in Databricks?
A Avoids using the full three level namespace
B Enables cluster auto-scaling
C Limits data visibility
D Allows querying Hive tables
Answer: A
Rationale:
A is correct. Setting defaults lets users reference objects without the full catalog.schema.object name.
B is incorrect. Auto-scaling is a compute feature.
C is incorrect. Visibility is controlled by permissions, not default namespace settings.
D is incorrect. This is not primarily about querying Hive tables.
Question 6: Data lineage
In the Catalog Explorer, which metadata concept shows end-to-end flow of data?
A Data lineage
B Data access controls
C Query history
D Object owner details
Answer: A
Rationale:
A is correct. Data lineage shows how data flows through objects and processes.
B is incorrect. Access controls show permissions, not flow.
C is incorrect. Query history shows executed queries, not complete lineage.
D is incorrect. Owner details identify responsibility, not data flow.
Question 7: Unity Catalog namespace
What is the hierarchical structure of the Unity Catalog namespace?
A Catalog.Schema.Data object
B Schema.Table.Catalog
C Workspace.Table.Schema
D Metastore.Schema.Catalog
Answer: A
Rationale:
A is correct. Unity Catalog organizes data using catalog, schema, and data object.
B is incorrect. The order is wrong.
C is incorrect. Workspace is not part of the UC namespace hierarchy.
D is incorrect. Metastore is above catalogs and is not part of the three-level namespace.
Question 8: Unity Catalog management scope
Unity Catalog centrally manages security, audit, and management for what?
A Data and AI assets
B Workspace-level notebooks only
C Databricks Account identities
D Cloud provider network settings
Answer: A
Rationale:
A is correct. Unity Catalog centrally governs data and AI assets.
B is incorrect. It is not limited to workspace notebooks.
C is incorrect. Account identities are not the main managed assets described here.
D is incorrect. Cloud network settings are managed outside Unity Catalog.
Question 9: SQL Editor
You are a SQL developer who is used to working in traditional SQL editors for writing and running analytical queries. In Databricks, which tool provides the most similar experience?
A Databricks SQL Editor
B Lakeflow Spark Declarative Pipelines
C Databricks notebooks
D Lakeflow Jobs
Answer: A
Rationale:
A is correct. Databricks SQL Editor provides a traditional SQL query-writing experience.
B is incorrect. Lakeflow pipelines are for declarative data pipelines.
C is incorrect. Notebooks are broader and multi-language, not most similar to a traditional SQL editor.
D is incorrect. Lakeflow Jobs are for job orchestration.
Question 10: Programmatic exploration
A data analyst needs to programmatically explore their Databricks environment. They want to view all schemas in the “production” catalog and then see all tables in the “sales” schema within that catalog. Which SQL statements should they use in the correct order?
A SHOW SCHEMAS IN production; SHOW TABLES IN production.sales;
B USE SCHEMA sales; SHOW SCHEMAS; SHOW TABLES;
C SHOW CATALOGS; USE CATALOG production; SHOW TABLES IN sales;
D SET DEFAULT CATALOG production; SHOW SCHEMAS; SHOW TABLES IN sales;
Answer: A
Rationale:
A is correct. It first lists schemas in the production catalog, then lists tables in production.sales.
B is incorrect. It does not first specify the production catalog.
C is incorrect. It includes extra steps and does not directly show schemas in production first.
D is incorrect. SET DEFAULT CATALOG is not the direct statement pattern expected here.
Question 11: Standard views
You have created a standard view called sales_summary_vw that aggregates daily sales data from a transactions table. After creating the view, 100 new transactions are added to the transactions table. What happens when you query sales_summary_vw?
A The view automatically executes the stored query and includes the new transactions in the results
B The view returns an error because the underlying data has changed
C The view needs to be manually refreshed before it can include the new transactions
D The view returns the original data from when it was created, ignoring the new transactions
Answer: A
Rationale:
A is correct. A standard view stores the query definition and executes it against current data.
B is incorrect. Underlying data changes do not automatically cause an error.
C is incorrect. Standard views do not require manual refresh.
D is incorrect. Standard views do not store static results from creation time.
Question 12: Unity Catalog function
What is the main function of Unity Catalog in Databricks?
A To govern and manage datasets, including non-tabular files like CSVs
B To create dashboards
C To manage user accounts
D To schedule jobs
Answer: A
Rationale:
A is correct. Unity Catalog governs and manages datasets and files.
B is incorrect. Dashboards are created in Databricks SQL/AI/BI tools.
C is incorrect. User account management is not the main function described.
D is incorrect. Job scheduling is handled by workflow/job features.
Question 13: Identifying object owner
How can you programmatically identify the owner of a Databricks object?
A DESCRIBE TABLE EXTENDED statement
B QUERY PERFORMANCE statement
C SELECT * FROM INFORMATION_SCHEMA
D SHOW SCHEMAS command
Answer: A
Rationale:
A is correct. DESCRIBE TABLE EXTENDED can show detailed metadata including ownership.
B is incorrect. QUERY PERFORMANCE is not used to identify object owners.
C is incorrect. This option is too generic and not the expected method here.
D is incorrect. SHOW SCHEMAS lists schemas, not object ownership details.
Question 14: Catalog Explorer usage
What is the primary purpose of using Catalog Explorer in Databricks?
A To discover and inspect datasets for analytics
B To manage compute resources
C To write Python scripts
D To monitor cluster
Answer: A
Rationale:
A is correct. Catalog Explorer helps users discover and inspect data assets.
B is incorrect. Compute resources are managed elsewhere.
C is incorrect. Python scripts are written in notebooks or editors.
D is incorrect. Cluster monitoring is not the primary Catalog Explorer function.
Question 15: Files in Databricks volumes
Which statement best describes the types of files you can upload to a Databricks volume?
A Structured, semi structured, and unstructured files such as CSV, JSON, images, or PDFs
B Only files that will be immediately converted into Delta tables
C Only files smaller than 10 MB and in text format
D Only CSV files used for structured tables
Answer: A
Rationale:
A is correct. Volumes can store structured, semi-structured, and unstructured files.
B is incorrect. Files do not need to be immediately converted into Delta tables.
C is incorrect. The option adds unsupported restrictions.
D is incorrect. Volumes are not limited to CSV files.
Question 16: Creating a table with read_files()
You have a CSV file named customer_data.csv stored in a Unity Catalog volume at /Volumes/company/marketing/files/. The file includes a header row, and you want to create a table named customers from this file using read_files().
Which option correctly reads the CSV file and creates the table?
A
CREATE TABLE customers AS
SELECT * FROM read_files(
'/Volumes/company/marketing/files/customer_data.csv',
header => true
);
B
CREATE TABLE customers
AS SELECT * FROM read_files(/Volumes/company/marketing/files/customer_data.csv, header = True)
C
CREATE TABLE customers AS
SELECT * FROM read_files(
'/Volumes/company/marketing/files/customer_data.csv',
header = True
);
D
COPY INTO customers
FROM read_files('/Volumes/company/marketing/files/customer_data.csv')
FILEFORMAT = CSV;
Answer: A
Rationale:
A is correct. It uses CTAS with read_files(), quotes the file path, and uses the correct named-parameter syntax header => true.
B is incorrect. The file path is not quoted and the parameter syntax is incorrect.
C is incorrect. The path is quoted, but it uses incorrect parameter assignment syntax.
D is incorrect. COPY INTO is not the correct syntax shown for creating the table with read_files().
Question 17: Query performance analysis
You run a complex SQL query with an INNER JOIN and GROUP BY in Databricks SQL. The query completes successfully, but takes longer than expected. You want to understand why the query took that long and where time was spent during execution.
A Use the query performance and query profile view to inspect execution stages, joins, and aggregations
B Re-run the query multiple times and compare execution durations
C Download query execution logs and analyze them outside of Databricks
D Review warehouse monitoring metrics to understand overall system utilization
Answer: A
Rationale:
A is correct. Query performance and profile views show execution stages and bottlenecks.
B is incorrect. Re-running the query may show timing variation but not detailed execution causes.
C is incorrect. External log analysis is unnecessary for this task.
D is incorrect. Warehouse monitoring gives broad utilization, not detailed query-stage analysis.
Question 18: CASE statement
You have a table called sales_data with columns order_id, customer_id, order_amount, and sales_rep. You need to create a new column called order_category that classifies orders as follows:
“Small” for orders ≤ $50,000
“Medium” for orders between $50,001 and $200,000
“Large” for orders > $200,000
A
SELECT *,
CASE
WHEN order_amount <= 50000 THEN 'Small'
WHEN order_amount > 50000 AND order_amount <= 200000 THEN 'Medium'
ELSE 'Large'
END AS order_category
FROM sales_data;
B
SELECT *,
IF(order_amount <= 50000, 'Small',
IF(order_amount <= 200000, 'Medium', 'Large')) AS order_category
FROM sales_data;
C
SELECT *,
CATEGORIZE(order_amount, 50000, 200000, 'Small', 'Medium', 'Large') AS order_category
FROM sales_data;
D
SELECT *,
CASE order_amount
WHEN <= 50000 THEN 'Small'
WHEN <= 200000 THEN 'Medium'
ELSE 'Large'
END AS order_category
FROM sales_data;
Answer: A
Rationale:
A is correct. It uses a searched CASE expression with proper range conditions.
B is incorrect. Although nested IF logic may be possible in some SQL contexts, it is not the expected answer here.
C is incorrect. CATEGORIZE is not a standard SQL function for this task.
D is incorrect. This is invalid CASE syntax because comparison operators cannot be used this way in a simple CASE expression.
Question 19: Unity Catalog hierarchy top level
What object sits at the top level of the data access hierarchy in Unity Catalog?
A Metastore
B Workspace
C Schema
D Table
Answer: A
Rationale:
A is correct. The metastore is at the top of the Unity Catalog data access hierarchy.
B is incorrect. A workspace is not the top-level UC data access object.
C is incorrect. Schemas exist below catalogs.
D is incorrect. Tables are lower-level data objects.
Question 20: Databricks Marketplace sharing
What powers the Databricks Marketplace's ability to share data?
A Delta Sharing
B AWS S3
C REST APIs
D Unity Catalog Metastore
Answer: A
Rationale:
A is correct. Delta Sharing powers data sharing in Databricks Marketplace.
B is incorrect. AWS S3 is cloud storage, not the Marketplace sharing protocol.
C is incorrect. REST APIs may support services but are not the sharing mechanism named here.
D is incorrect. Unity Catalog Metastore supports governance but is not the sharing technology itself.