Q1
A machine learning team is developing a model to predict customer churn. They are using Databricks Asset Bundles (DABs) to manage their project environments. They need to define separate configurations for development, staging, and production, including different cluster policies and secret scopes. Which section of the `databricks.yml` file is specifically designed to manage these environment-specific overrides?
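For context, per-environment overrides in a bundle live under the `targets` mapping. A minimal sketch, assuming hypothetical workspace URLs, policy IDs, and secret scope names:

```yaml
# Illustrative only: workspace URLs, policy IDs, and scope names are placeholders.
bundle:
  name: churn-model

variables:
  cluster_policy_id:
    description: Cluster policy enforced in this environment
  secret_scope:
    description: Secret scope read by training jobs

targets:
  dev:
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com
    variables:
      cluster_policy_id: "dev-policy-id"
      secret_scope: "churn-dev"
  staging:
    workspace:
      host: https://staging-workspace.cloud.databricks.com
    variables:
      cluster_policy_id: "staging-policy-id"
      secret_scope: "churn-staging"
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.cloud.databricks.com
    variables:
      cluster_policy_id: "prod-policy-id"
      secret_scope: "churn-prod"
```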
Q2
An MLOps engineer is implementing a canary deployment for a new version of a demand forecasting model using Databricks Model Serving. The goal is to route 10% of the inference traffic to the new model version (version 2) while the remaining 90% goes to the stable version (version 1). Which configuration snippet correctly implements this traffic split within a model serving endpoint definition?
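A sketch of what such a split looks like in an endpoint's `config` block (REST API / asset-bundle shape); the endpoint, model, and served-model names are placeholders:

```yaml
# Illustrative endpoint definition; names and workload sizes are assumptions.
name: demand-forecast
config:
  served_models:
    - name: forecaster-v1
      model_name: demand_forecaster
      model_version: "1"
      workload_size: Small
      scale_to_zero_enabled: false
    - name: forecaster-v2
      model_name: demand_forecaster
      model_version: "2"
      workload_size: Small
      scale_to_zero_enabled: false
  traffic_config:
    routes:
      # 90% of requests stay on the stable version, 10% canary to version 2.
      - served_model_name: forecaster-v1
        traffic_percentage: 90
      - served_model_name: forecaster-v2
        traffic_percentage: 10
```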
Q3
A data scientist is building a SparkML pipeline to process text data for sentiment analysis. The pipeline needs to tokenize text, remove stop words, and then convert the tokens into numerical feature vectors using TF-IDF. Which sequence of SparkML transformers is correct for this task?
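A minimal sketch of the pipeline in question, assuming a DataFrame with a hypothetical `review_text` column; `numFeatures` is an illustrative choice:

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, StopWordsRemover, HashingTF, IDF

# Tokenize raw text, drop stop words, then hash term frequencies and rescale with IDF.
tokenizer = Tokenizer(inputCol="review_text", outputCol="tokens")
remover = StopWordsRemover(inputCol="tokens", outputCol="filtered_tokens")
tf = HashingTF(inputCol="filtered_tokens", outputCol="raw_features", numFeatures=2**18)
idf = IDF(inputCol="raw_features", outputCol="features")

pipeline = Pipeline(stages=[tokenizer, remover, tf, idf])
# model = pipeline.fit(train_df)  # train_df: hypothetical DataFrame with 'review_text'
```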
Q4 (Multiple answers)
A team is building an automated retraining pipeline for a credit risk model. The pipeline should trigger a new training job whenever significant drift is detected in the model's key input features. They are using Lakehouse Monitoring to track drift. Which of the following components are essential for implementing this automated retraining workflow? (Select THREE)
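One possible shape for the trigger logic, shown as a scheduled check: read the monitor's drift metrics and launch the training job when a threshold is crossed. The table name, its column names, the job ID, and the threshold are all assumptions here:

```python
from databricks.sdk import WorkspaceClient

# Hypothetical names: the drift-metrics table written by the monitor,
# its columns, and the retraining job's ID.
DRIFT_TABLE = "main.monitoring.credit_features_drift_metrics"
RETRAIN_JOB_ID = 123456
PSI_THRESHOLD = 0.2

# `spark` is the ambient SparkSession in a Databricks notebook/job.
latest = spark.table(DRIFT_TABLE).orderBy("window_end", ascending=False).limit(50)
drifted = [
    row["column_name"]
    for row in latest.collect()
    if row["psi"] is not None and row["psi"] > PSI_THRESHOLD
]

if drifted:
    # Trigger the retraining job via the Jobs API.
    WorkspaceClient().jobs.run_now(job_id=RETRAIN_JOB_ID)
```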
Q5
An ML engineer is tasked with creating a custom PyFunc model in MLflow. This model needs to load a pre-trained tokenizer from Hugging Face and a custom-trained scikit-learn classifier. The entire model, including the tokenizer, must be packaged together for deployment to a sandboxed environment without internet access. Which MLflow feature should be used to package the tokenizer along with the model?
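The mechanism being probed here is the `artifacts` argument to `mlflow.pyfunc.log_model`, which copies local files into the model package and exposes them via `context.artifacts` at load time. A sketch, with hypothetical artifact paths and an assumed classifier input format:

```python
import mlflow
import mlflow.pyfunc

class SentimentModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import joblib
        from transformers import AutoTokenizer

        # context.artifacts maps names to local paths inside the packaged
        # model, so nothing is fetched from the internet at load time.
        self.tokenizer = AutoTokenizer.from_pretrained(context.artifacts["tokenizer"])
        self.clf = joblib.load(context.artifacts["classifier"])

    def predict(self, context, model_input):
        encoded = self.tokenizer(
            list(model_input["text"]), padding=True, truncation=True, return_tensors="np"
        )
        # Assumes the classifier was trained on padded token-id matrices.
        return self.clf.predict(encoded["input_ids"])

mlflow.pyfunc.log_model(
    artifact_path="sentiment_model",
    python_model=SentimentModel(),
    artifacts={
        "tokenizer": "local_tokenizer_dir",  # e.g. tokenizer.save_pretrained(...)
        "classifier": "classifier.joblib",   # e.g. joblib.dump(clf, ...)
    },
)
```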
Q6
True or False: When using Databricks Feature Store, point-in-time correctness is automatically guaranteed for batch inference jobs without any specific configuration required in the `create_training_set` method.
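For reference, point-in-time lookups are opt-in: the feature table must be a time series table, and the lookup must name a `timestamp_lookup_key`. A sketch with hypothetical table, column, and DataFrame names:

```python
from databricks.feature_engineering import FeatureEngineeringClient, FeatureLookup

fe = FeatureEngineeringClient()

# timestamp_lookup_key opts into point-in-time joins against a time series
# feature table; without it, lookups are plain key joins.
features = [
    FeatureLookup(
        table_name="main.churn.user_activity_features",  # hypothetical table
        lookup_key="user_id",
        timestamp_lookup_key="event_ts",
    )
]

training_set = fe.create_training_set(
    df=labels_df,  # hypothetical DataFrame with user_id, event_ts, and label
    feature_lookups=features,
    label="label",
)
```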
Q7 (Multiple answers)
A large-scale image classification model is being trained on Databricks. The team observes that the training process is bottlenecked by the single-node driver's ability to coordinate the workers. They decide to explore distributed hyperparameter tuning to find optimal learning rates. Which of the following technologies are natively integrated with Databricks for distributed hyperparameter tuning and can effectively manage this workload? (Select TWO)
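As a concrete example of one such integration, Hyperopt's `SparkTrials` fans trial evaluation out across the cluster's workers instead of running everything on the driver. A sketch, where `train_and_evaluate` is a hypothetical training helper:

```python
from hyperopt import fmin, tpe, hp, SparkTrials

def objective(params):
    # Train with the sampled learning rate and return the validation
    # loss to minimize; train_and_evaluate is a hypothetical helper.
    return train_and_evaluate(learning_rate=params["lr"])

search_space = {"lr": hp.loguniform("lr", -7, -1)}

# SparkTrials runs trials in parallel on the workers, relieving the driver.
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=64,
    trials=SparkTrials(parallelism=8),
)
```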
Q8
An ML team has configured Lakehouse Monitoring on an inference table. They receive an alert that the Population Stability Index (PSI) for a critical categorical feature, 'customer_segment', has exceeded the defined threshold. However, the drift analysis for the model's prediction and label columns shows no significant change. What is the most likely interpretation of this situation?
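For reference, PSI compares the category proportions of a baseline window against a current window: PSI = sum over categories of (actual - expected) * ln(actual / expected). A small illustration with made-up proportions:

```python
import numpy as np

def psi(expected_props, actual_props, eps=1e-6):
    """Population Stability Index over category proportions:
    PSI = sum((a - e) * ln(a / e)) across categories."""
    e = np.asarray(expected_props) + eps
    a = np.asarray(actual_props) + eps
    return float(np.sum((a - e) * np.log(a / e)))

# Baseline vs. current share of each customer_segment (illustrative numbers)
baseline = [0.50, 0.30, 0.20]
current = [0.25, 0.35, 0.40]
print(psi(baseline, current))  # ~0.32, above the common 0.2 rule of thumb
```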
Q9
A financial institution is building a real-time transaction fraud detection system. Latency is critical, as predictions must be returned in under 50 milliseconds. The features for this model require complex, on-the-fly calculations based on the user's recent activity, which is not available in the batch feature store. Which Databricks solution is best suited for this requirement?
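Whatever serving solution is chosen, clients ultimately score against a low-latency REST endpoint. A sketch of an invocation call with the 50 ms budget enforced client-side; the workspace URL, endpoint name, secret scope, and payload fields are all placeholders:

```python
import requests

# Hypothetical workspace URL, endpoint name, and secret scope/key.
WORKSPACE = "https://my-workspace.cloud.databricks.com"
ENDPOINT = "fraud-detector"
TOKEN = dbutils.secrets.get("fraud", "serving-token")  # dbutils: ambient in notebooks

resp = requests.post(
    f"{WORKSPACE}/serving-endpoints/{ENDPOINT}/invocations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"dataframe_records": [{"transaction_id": "t-123", "user_id": "u-9", "amount": 42.0}]},
    timeout=0.05,  # enforce the latency budget at the client (connect/read timeout)
)
print(resp.json())
```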
Q10
**Case Study:** A global logistics company, ShipFast, wants to build an MLOps platform on Databricks to manage hundreds of models that predict package delivery times. Their key requirements are strict separation of development, staging, and production environments, auditable model transitions, and automated testing before any model is promoted to production. The current process is manual, with data scientists promoting models via the UI, leading to inconsistent testing and accidental deployments.

The MLOps team has been tasked with designing a fully automated, code-driven CI/CD pipeline. The pipeline must enforce that any model version proposed for the 'Staging' stage first passes a suite of integration tests, including performance evaluation on a holdout dataset and a bias check. If the tests pass, the model version should be automatically transitioned to 'Staging' with a comment linking to the CI job results.

The MLOps team decides to use Databricks Jobs and Model Registry webhooks. They create a multi-task job that checks out the code, runs the tests, and, on success, transitions the model. They need to ensure this job is triggered securely and reliably whenever a data scientist registers a new model version. What is the most secure and robust way to architect the trigger mechanism for this validation pipeline?
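A sketch of the webhook side of such a design, using the `databricks-registry-webhooks` client to fire the validation job whenever a new version is registered; the job ID, workspace URL, secret scope, and model name are placeholders, and the token should belong to a service principal rather than a user:

```python
from databricks_registry_webhooks import RegistryWebhooksClient, JobSpec

# Hypothetical IDs/URLs; the access token comes from a secret scope.
job_spec = JobSpec(
    job_id="123456",  # the multi-task validation job
    workspace_url="https://my-workspace.cloud.databricks.com",
    access_token=dbutils.secrets.get("mlops", "sp-token"),
)

webhook = RegistryWebhooksClient().create_webhook(
    model_name="delivery_time_model",
    events=["MODEL_VERSION_CREATED"],  # fire when a new version is registered
    job_spec=job_spec,
    description="Run integration tests before any Staging transition",
    status="ACTIVE",
)
```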