A financial services company is implementing a near real-time fraud detection pipeline. Transaction data arrives via a Kafka topic. The data engineering team must choose between Snowpipe and Snowpipe Streaming for ingestion. A key requirement is to keep ingestion latency under 5 seconds per batch of records. Which factor is the MOST critical in deciding to use Snowpipe Streaming over the traditional Snowpipe REST API?
Q2
A data science team is developing a sentiment analysis model using a custom Python library packaged as a `.whl` file. This library is not available on Anaconda or PyPI. A data engineer needs to make this library available to a Snowpark Python UDF for batch scoring. The security policy prohibits direct runtime package installation from public repositories. What is the recommended approach to securely deploy and use this custom library?
Q3
A data engineer is analyzing the query profile of a long-running query that joins a large fact table (`TRANSACTIONS`, 5TB) with several small dimension tables. The profile indicates significant remote disk I/O (65% of execution time) and poor partition pruning (90% of partitions scanned). The `TRANSACTIONS` table is clustered by `TRANSACTION_DATE`. The problematic query filters on `CUSTOMER_ID`. Which action would provide the MOST significant and targeted performance improvement for this specific query?
Q4
A data engineer has created a stream object on a `RAW_EVENTS` table to capture changes for an ELT pipeline. The stream is consumed by a task that runs every 5 minutes. The task failed to run for 3 hours due to a permission issue, which has now been resolved. The `RAW_EVENTS` table has a data retention period of 1 day. What will be the state of the stream when the task runs successfully for the first time after the outage?
Q5 (Multiple answers)
A data architect needs to enforce column-level security on a table containing employee data, including PII like `SALARY` and `SSN`. The requirements are:
1. Analysts in the `HR_ANALYST` role should see the full, unmasked data.
2. All other roles, including `ACCOUNTADMIN`, should see masked values (e.g., 'XXX-XX-XXXX' for SSN).
Which combination of objects and privileges is required to correctly implement this? (Select TWO)
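The two requirements amount to a role-conditional rule applied to each sensitive column. A minimal Python sketch of that expected behavior for `SSN` only is shown here; the function name `mask_ssn` and the sample values are hypothetical, and the code models the stated requirement rather than any Snowflake object.

```python
def mask_ssn(ssn: str, current_role: str) -> str:
    """Model of the stated requirement (hypothetical helper, not Snowflake code)."""
    # Requirement 1: HR_ANALYST sees the full, unmasked value.
    if current_role == "HR_ANALYST":
        return ssn
    # Requirement 2: every other role, including ACCOUNTADMIN, sees the mask.
    return "XXX-XX-XXXX"


# Sample value is made up for illustration.
hr_view = mask_ssn("123-45-6789", "HR_ANALYST")
admin_view = mask_ssn("123-45-6789", "ACCOUNTADMIN")
```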
Q6
True or False: When a stored procedure written in Python (using Snowpark) is called, it executes with the rights of the caller (invoker's rights), not the rights of the procedure's owner (owner's rights).
Q7 (Multiple answers)
A data engineer needs to call an external machine learning model hosted on a cloud provider's serverless function endpoint to enrich data within a Snowflake query. The endpoint requires an API key for authentication. What Snowflake objects must be configured to enable this workflow securely? (Select THREE)
Q8
An IoT company ingests billions of small JSON events daily into an external S3 stage. The data needs to be loaded into a `RAW_EVENTS` table. A data engineer implemented a Snowpipe with auto-ingest, but the ingestion credits are significantly higher than expected. Upon investigation, the engineer finds that files are being created in S3 every few seconds, and most are under 1 MB. What is the MOST effective strategy to reduce Snowpipe costs while maintaining the continuous ingestion flow?
Q9
A data engineer is designing a development workflow. The `PROD` database is 10TB. The team needs a full, isolated copy of the `PROD` database for development (`DEV`) and another for QA (`QA`). A key requirement is to minimize storage costs. Additionally, the `DEV` database must not have a Fail-safe period. Which set of commands achieves these requirements MOST efficiently?
Q10
A data engineer needs to flatten a deeply nested JSON structure stored in a VARIANT column named `EVENT_DATA`. The structure contains an array of `transactions`, and each transaction has an array of `items`. The goal is to produce a flat table with `event_id`, `transaction_id`, and `item_id`. Which SQL construct is essential for achieving this transformation efficiently in Snowflake?
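To make the target shape concrete, here is a small sketch of the two-level expansion the question describes. It is plain Python rather than Snowflake SQL and only illustrates the expected output rows. The placement of `event_id` at the top level, `transaction_id` inside each transaction, and `item_id` inside each item is an assumption, as are the function name and sample data.

```python
from typing import Any


def flatten_event(event: dict[str, Any]) -> list[tuple[Any, Any, Any]]:
    """Expand one event into (event_id, transaction_id, item_id) rows.

    Assumed layout (not given in the question): event_id sits at the top
    level, transaction_id on each transaction, item_id on each item.
    """
    rows = []
    # Outer level: one pass per element of the `transactions` array.
    for txn in event.get("transactions", []):
        # Inner level: one output row per element of that transaction's `items`.
        for item in txn.get("items", []):
            rows.append((event["event_id"], txn["transaction_id"], item["item_id"]))
    return rows


# Hypothetical sample document standing in for one EVENT_DATA value.
sample_event = {
    "event_id": "e1",
    "transactions": [
        {"transaction_id": "t1", "items": [{"item_id": "i1"}, {"item_id": "i2"}]},
        {"transaction_id": "t2", "items": [{"item_id": "i3"}]},
    ],
}

flat_rows = flatten_event(sample_event)
```

Each event produces one row per item, so the row count equals the total number of items across all of its transactions.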