A data scientist is developing a model to predict equipment failure in a manufacturing plant. The dataset contains sensor readings and is heavily imbalanced, with failure events representing only 0.5% of the data. The business priority is to identify as many potential failures as possible, even if it means some non-failures are incorrectly flagged. Which evaluation metric should be prioritized for model optimization?
Q2
A research team is conducting a study and wants to determine if there is a statistically significant difference in the mean test scores among three different teaching methods (A, B, and C). Which statistical test is most appropriate for this analysis?
Q3Multiple answers
A machine learning engineer is tasked with deploying a sentiment analysis model as a REST API for a high-traffic mobile application. The deployment must be scalable, easily versioned, and isolated from the underlying infrastructure. Which TWO of the following technologies are BEST suited for this requirement? (Select TWO).
Q4
True or False: In the context of deep learning, transfer learning involves initializing a new model with weights from a pre-trained model and then fine-tuning these weights on a smaller, task-specific dataset.
Q5
**Company Background** A large e-commerce enterprise, 'GlobalMart', wants to implement a personalized product recommendation system to increase customer engagement and sales. The company has a massive dataset containing millions of products, tens of millions of customers, and billions of historical interaction records (clicks, purchases, views). The data is stored in a distributed data lake. **Current Situation** GlobalMart's current recommendation system is a simple, non-personalized 'most popular items' feature, which has low effectiveness. The data science team has been tasked with building a sophisticated machine learning model. The team consists of data scientists with strong Python and ML framework skills but limited experience with large-scale data engineering and MLOps. **Requirements & Constraints** - The recommendation model must be trained daily on new interaction data. - The system must provide real-time recommendations to users browsing the website with low latency (<150ms). - The solution should leverage a managed cloud environment to minimize infrastructure management overhead. - The model must be able to handle the cold-start problem for new users and new products. - The final solution must be cost-effective at scale. Which of the following approaches provides the MOST comprehensive and effective solution for GlobalMart's requirements?
Q6
A data scientist is performing dimensionality reduction on a high-dimensional dataset for visualization purposes. The goal is to preserve the local structure and reveal underlying clusters in two dimensions. Which algorithm is most suitable for this task?
Q7
During an exploratory data analysis (EDA) of a dataset containing customer ages, a data scientist observes that the distribution is right-skewed. What does this indicate about the data?
Q8
A financial institution is using a gradient boosting model to detect fraudulent transactions. After deployment, the MLOps team notices a gradual decrease in the model's F1-score over several months. This phenomenon is commonly referred to as:
Q9
A data scientist needs to build a model to classify news articles into categories like 'Sports', 'Politics', and 'Technology'. The input data consists of the raw text of the articles. Which sequence of NLP techniques is most appropriate for preparing this text data for a machine learning model? ```mermaid flowchart TD A[Start: Raw Text] --> B{Process} B --> C[Vectorization] C --> D[Model Training] ```
Q10
In linear regression, the R-squared value represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s).