Data-Analyst-Associate Free Sample Questions

Databricks Certified Data Analyst Associate Practice Test
Q1

A financial services company is analyzing streaming transaction data stored in a bronze Delta table. An analyst needs to create a silver table that includes a new column, `is_flagged`, which is set to true if a transaction amount exceeds $10,000. The process must be idempotent and handle late-arriving data. Which SQL command is most appropriate for this continuous transformation?
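
For reference, a minimal sketch of the pattern this question points toward, assuming a Databricks SQL streaming table; the `bronze_transactions` and `silver_transactions` names are hypothetical:

```sql
-- Sketch only: a streaming table processes new and late-arriving rows
-- incrementally and idempotently. Table names are assumed.
CREATE OR REFRESH STREAMING TABLE silver_transactions AS
SELECT
  *,
  amount > 10000 AS is_flagged  -- true when the transaction exceeds $10,000
FROM STREAM(bronze_transactions);
```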

Q2

An analyst is building a dashboard to monitor daily user engagement. A key visualization needs to show the count of active users. The underlying query for this visualization is computationally expensive. The dashboard is viewed frequently by executives, and fast load times are critical. Which feature should the analyst enable for this specific query to improve dashboard performance for all users?

Q3

True or False: In Databricks SQL, a `VIEW` always stores a physical copy of the data derived from its defining query, similar to a materialized view in other database systems.
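
For reference, a minimal illustration of the distinction at stake; `orders` is a hypothetical table:

```sql
-- A standard view stores only its defining query and is evaluated at read time.
CREATE VIEW recent_orders_v AS
SELECT * FROM orders WHERE order_date >= date_sub(current_date(), 30);

-- A materialized view, by contrast, physically stores precomputed results.
CREATE MATERIALIZED VIEW recent_orders_mv AS
SELECT * FROM orders WHERE order_date >= date_sub(current_date(), 30);
```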

Q4

An analyst at a logistics company needs to create a report on late shipments. The `shipments` table contains `shipment_id`, `estimated_delivery_date`, and `actual_delivery_date`. The analyst needs to add a column `delivery_status` with three possible values: 'On-Time', 'Late', or 'In-Transit'. Which of the following SQL constructs is the most appropriate and readable way to implement this logic?
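
For reference, one way the three-valued logic might look as a single CASE expression; treating a NULL `actual_delivery_date` as 'In-Transit' is an assumption:

```sql
SELECT
  shipment_id,
  CASE
    WHEN actual_delivery_date IS NULL THEN 'In-Transit'        -- assumed: not yet delivered
    WHEN actual_delivery_date > estimated_delivery_date THEN 'Late'
    ELSE 'On-Time'
  END AS delivery_status
FROM shipments;
```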

Q5 (Multiple answers)

A data governance team wants to ensure that analysts can only query a version of the `customers` table from exactly 7 days ago for a weekly compliance report, preventing access to any more recent data. Which Delta Lake features allow for this specific type of historical data access? (Select TWO)
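
For reference, a minimal sketch of Delta time travel by timestamp and by version; the retention window and the version number are assumptions:

```sql
-- Query the table as it existed 7 days ago (requires sufficient history retention).
SELECT * FROM customers TIMESTAMP AS OF date_sub(current_date(), 7);

-- Or pin an exact table version (128 is a hypothetical version number).
SELECT * FROM customers VERSION AS OF 128;
```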

Q6

**Case Study:**

**Company Background:** Global Retail Innovations (GRI) is a large e-commerce company that uses Databricks for all its data analytics. They follow the medallion architecture, with raw event data landing in bronze tables, cleaned and enriched data in silver tables, and aggregated business-level data in gold tables. The data analytics team primarily uses Databricks SQL to build dashboards for various departments.

**Current Situation:** The marketing department has requested a new, complex dashboard to track customer lifetime value (LTV). The primary data source for this is a large silver table named `customer_transactions` with over 5 billion rows. The preliminary query developed by a junior analyst to calculate LTV is taking over 30 minutes to run, which is too slow for an interactive dashboard. The query involves multiple joins with other large dimension tables (customers, products) and uses several window functions.

**Requirements:**

1. The LTV dashboard must load in under 60 seconds.
2. The solution should not require data engineers to build a new ETL pipeline if possible.
3. The solution must be cost-effective and leverage existing Databricks SQL capabilities.
4. The final data presented in the dashboard must be aggregated at the customer level.

**Problem:** How should the data analyst restructure the analytics workflow to meet the performance requirements for the LTV dashboard?
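
One pattern that fits requirements like these, sketched below as an illustration rather than the graded answer, is to precompute the customer-level aggregate with a Databricks SQL materialized view; `customer_id` and `amount` are assumed column names:

```sql
-- Sketch only: precompute LTV at the customer level so the dashboard
-- queries a small aggregate instead of scanning 5B+ rows per load.
-- Column names are assumptions based on the case study.
CREATE MATERIALIZED VIEW customer_ltv AS
SELECT
  customer_id,
  SUM(amount) AS lifetime_value,
  COUNT(*)    AS transaction_count
FROM customer_transactions
GROUP BY customer_id;
```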

Q7

A data analyst has been given a CSV file containing quarterly sales targets. The file needs to be uploaded to Databricks and queried via SQL. The analyst does not have permissions to create external locations or configure cloud storage. What is the simplest method for the analyst to upload this file and make it queryable?

Q8

When configuring a SQL warehouse, what is the primary purpose of the 'Scaling' setting?

Q9

An analyst is examining the query history to troubleshoot a slow dashboard. They notice that a specific query, which joins a large fact table with a small dimension table, is consistently taking a long time. The query profile shows a large amount of data being shuffled across the network during the join operation. Which Databricks SQL optimization technique could most effectively mitigate this issue?
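
For reference, the kind of join hint this question alludes to; the table and column names are hypothetical:

```sql
-- Broadcast the small dimension table to every executor so the join
-- avoids shuffling the large fact table across the network.
SELECT /*+ BROADCAST(d) */
  f.order_id,
  d.region_name
FROM fact_orders AS f
JOIN dim_region AS d
  ON f.region_id = d.region_id;
```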