A financial services firm has deployed a credit default prediction model into production using Watson Machine Learning. The model was trained on data from the past five years. After six months in production, the model's performance, monitored via Watson OpenScale, shows a significant drop in accuracy and a drift in the distribution of key features like 'debt-to-income ratio' and 'number of open credit lines'. The MLOps team needs to devise a strategy to address this issue. What is the most appropriate first step to diagnose and mitigate this problem?
Q2
A data science team is building a classifier to detect a rare type of manufacturing defect that occurs in only 0.5% of all products. After training a model, they generate the following confusion matrix on the test set: - True Positives (Defect correctly identified): 45 - False Positives (Good product flagged as defect): 50 - True Negatives (Good product correctly identified): 9,855 - False Negatives (Defect missed): 5 Given the high cost associated with missing a defect (a False Negative), which evaluation metric should the team prioritize to best reflect the model's effectiveness for this specific business problem?
Q3Multiple answers
An MLOps engineer is tasked with deploying a Python-based computer vision model developed in PyTorch. The deployment requirements are: portability across different cloud environments, scalability to handle variable inference loads, and integration into a larger microservices architecture. The model needs to be packaged with all its dependencies and exposed as a REST API endpoint. Which TWO technologies are most suitable for meeting these requirements? (Select TWO)
Q4
A retail company analyzes its sales data from the previous quarter to create reports showing total sales per product category and region. This analysis helps them understand what has already happened in their business. Which type of analytics is being used?
Q5
A data scientist is working with a high-dimensional dataset (200+ features) for a supervised learning task. The goal is to improve model performance and reduce training time by transforming the features into a smaller, uncorrelated set while retaining most of the original data's variance. The original features are not easily interpretable, so preserving their original form is not a priority. Which dimensionality reduction technique is most appropriate for this scenario?
Q6
True or False: The primary goal of the 'Empathize' phase in the Design Thinking process, when applied to an AI project, is to select the most performant machine learning algorithm for the business problem.
Q7
A data scientist is training a deep neural network with many layers for an image classification task. During training, they observe that the gradients for the initial layers are becoming extremely small, effectively halting the learning process for those layers. The model's overall performance has plateaued at a suboptimal level. What is the most likely cause of this issue, and what is a common technique to mitigate it?
Q8
During exploratory data analysis (EDA) in a Watson Studio notebook, a data analyst generates the following visualization for a feature named 'customer_age'. What is the most accurate interpretation of this plot? ```mermaid graph TD subgraph Box Plot for customer_age direction LR A[Min: 18] -- Q1: 28 -- B(Median: 35) -- Q3: 45 -- C[Max: 60] C -- Outlier --- D((75)) C -- Outlier --- E((82)) end ```
Q9
A hospital is developing an AI-powered diagnostic tool to predict the likelihood of patient readmission within 30 days. The model uses sensitive patient data, including demographics, medical history, and treatment details. The hospital's data governance policy requires that all patient data, both at rest and in transit, must be encrypted. The development team uses a combination of Python libraries like Pandas and Scikit-learn for data processing and modeling. What is the most critical data preparation step to ensure compliance with the hospital's data governance policy?
Q10
A consultant is advising a company on building a recommendation engine. The company has explicit user feedback data (e.g., 1-5 star ratings) and implicit feedback data (e.g., clicks, watch time). They want a model that can predict a user's rating for an item they have not yet seen. Which category of machine learning algorithm is most suitable for this task?