ASSOCIATE-DATA-PRACTITIONER Free Sample Questions

Google Associate Data Practitioner Practice Test
10 of 307 questions (free sample)
Q1

A financial analytics firm is migrating its data warehouse to BigQuery. A critical requirement is that all new tables created within the `finance_reports` dataset must automatically enforce a 365-day data retention policy based on an ingestion-time partition. Any attempt to create a table without this specific partitioning and expiration setting should fail. Which `bq` command should be used to configure the dataset to meet this requirement?
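For context, the dataset-level retention setting this scenario revolves around can also be expressed in GoogleSQL DDL. The sketch below is purely illustrative: it shows the default partition expiration option in play, not the `bq` command the question asks for, and it does not by itself make non-partitioned table creation fail.

```sql
-- Illustrative only: DDL form of a dataset-level default partition expiration.
-- The question asks for the equivalent bq CLI configuration.
ALTER SCHEMA finance_reports
SET OPTIONS (
  default_partition_expiration_days = 365  -- applies to newly created partitioned tables
);
```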

Q2

A data engineering team uses Dataflow to process streaming IoT data. The pipeline reads from Pub/Sub, performs a complex transformation, and writes to BigQuery. During a load spike, you observe that the data freshness in BigQuery is degrading significantly, and the Pub/Sub subscription shows a growing backlog of unacknowledged messages. The Dataflow monitoring UI shows high System Latency but CPU utilization across workers remains below 50%. What is the most likely bottleneck causing this issue?

Q3

A retail company wants to analyze sales data stored in a BigQuery table named `sales_transactions`. The table contains `product_id`, `store_id`, `sale_date`, and `revenue`. The analytics team needs a report that shows the total revenue for each product, but only for products that have been sold in more than 10 unique stores. Which SQL query in BigQuery will produce the desired report?
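As a point of reference, the aggregation the scenario describes (per-product totals filtered by a distinct store count) generally takes the GROUP BY / HAVING shape sketched below; treat it as an illustration of the pattern rather than the graded answer.

```sql
-- Sketch: total revenue per product, limited to products sold in more than
-- 10 distinct stores (table and column names as given in the question).
SELECT
  product_id,
  SUM(revenue) AS total_revenue
FROM
  sales_transactions
GROUP BY
  product_id
HAVING
  COUNT(DISTINCT store_id) > 10;
```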

Q4

You are tasked with ingesting a large, 10 TB dataset of historical logs from an on-premises SFTP server to a Cloud Storage bucket for archival and future analysis. The on-premises location has a reliable 1 Gbps internet connection. The transfer must be completed within 48 hours, be fully managed, and provide data integrity checks. Which Google Cloud service should you use for this one-time transfer?
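A quick back-of-the-envelope check (assuming the full 1 Gbps link is available and sustained) shows that the 48-hour window is achievable over the network, which narrows the choice to a managed online transfer rather than a physical appliance:

$$
\frac{10\ \text{TB} \times 8\ \text{bits/byte}}{1\ \text{Gbps}} \approx \frac{8 \times 10^{13}\ \text{bits}}{10^{9}\ \text{bits/s}} = 8 \times 10^{4}\ \text{s} \approx 22\ \text{hours} < 48\ \text{hours}
$$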

Q5 (Multiple answers)

A business analyst has created a dashboard in Looker Studio to track daily sales metrics. The dashboard connects directly to a BigQuery table. Users are reporting that the dashboard is becoming very slow and often times out, especially during peak business hours. The underlying BigQuery table is 5 TB and is not partitioned or clustered. You want to improve the dashboard's performance and reduce query costs with minimal changes to the dashboard itself. Which actions should you take? (Select TWO)
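By way of illustration only (this is a multi-select question, and the sketch below is just one of the levers it hints at), a large unpartitioned table can be reshaped into a partitioned, clustered one so that date-filtered dashboard queries scan far less data. Every identifier here is hypothetical, since the question does not name the table or its columns.

```sql
-- Sketch with hypothetical names: rebuild the flat 5 TB table as a
-- date-partitioned, clustered table to reduce bytes scanned per query.
CREATE TABLE sales.daily_sales_partitioned
PARTITION BY DATE(sale_timestamp)    -- hypothetical timestamp column
CLUSTER BY store_id, product_id      -- hypothetical common filter columns
AS
SELECT * FROM sales.daily_sales_raw;  -- hypothetical source table
```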

Q6

True or False: Using a customer-managed encryption key (CMEK) with Cloud Storage means that Google no longer holds any component of the encryption key, and all cryptographic operations happen outside of Google Cloud.

Q7

A healthcare organization is building a data pipeline to process patient records. The raw data arrives as JSON files in a Cloud Storage bucket. The pipeline must de-identify sensitive information like patient names and social security numbers by applying masking transformations before loading the data into BigQuery for analysis. The solution must be a fully managed, graphical, low-code service to accelerate development. The pipeline should be designed according to the following flow:

```mermaid
graph TD
  A[GCS Bucket: Raw JSON] --> B{De-identification Pipeline};
  B --> C[BigQuery Table: Anonymized Data];
```

Which service should be used to build the de-identification pipeline (B)?

Q8

You are building a regression model in BigQuery ML to predict housing prices. After training your model, you use the `ML.EVALUATE` function and get the following output: `{'mean_absolute_error': 25000, 'r2_score': 0.85}`. What do these metrics signify about your model's performance?
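For orientation, metrics like these come back from an evaluation query of the shape sketched below (the model and holdout table names are hypothetical); the question itself is about interpreting the returned `mean_absolute_error` and `r2_score`.

```sql
-- Sketch: evaluating a trained BigQuery ML regression model against
-- held-out data (model and table names are hypothetical).
SELECT
  *
FROM
  ML.EVALUATE(
    MODEL `analytics.housing_price_model`,
    (SELECT * FROM `analytics.housing_holdout`)
  );
```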

Q9

An e-commerce company has a daily batch pipeline that updates product inventory in BigQuery. The pipeline is orchestrated by a Cloud Composer DAG. Recently, the DAG has been failing intermittently. You need to investigate the failures by reviewing the execution history, task logs, and the overall structure of the DAG. Which user interface should you use to perform this troubleshooting?

Q10

A new data analyst has joined your team and needs permissions to run queries on all tables within the `production_analytics` dataset in BigQuery. They also need to be able to create new tables in this dataset. However, they must not be able to delete the dataset or modify its permissions. Following the principle of least privilege, which single predefined IAM role should you grant to the analyst at the dataset level?
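Whichever predefined role you settle on, note that BigQuery lets you grant it at the dataset (schema) level with SQL DCL; the sketch below uses a placeholder role name and a hypothetical user email.

```sql
-- Sketch: granting a predefined role at the dataset level
-- (role name is a placeholder; the email address is hypothetical).
GRANT `roles/bigquery.ROLE_NAME`
ON SCHEMA production_analytics
TO 'user:analyst@example.com';
```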