D-DS-FN-23 Free Sample Questions

Dell Data Science Foundations Practice Test
Q1

A data science team is developing a predictive model for customer churn. During the Data Preparation phase of the Data Analytics Lifecycle, they encounter a dataset with 15% missing values in the 'Last_Transaction_Date' column. The team decides that this variable is critical for the model. Which of the following is the most robust strategy for handling these missing values without introducing significant bias?

Q2

A retail company is analyzing market basket data to discover purchasing patterns. They run an association rules algorithm and find the rule {Diapers} -> {Beer} has a lift of 3.5. What is the correct interpretation of this lift value?
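For reference, lift is the observed support of the two itemsets together divided by the support expected if they were independent. The sketch below computes it from raw transaction counts. The counts are hypothetical, chosen only so the result matches the 3.5 in the question, and the function name `lift` is illustrative rather than taken from any library.

```python
def lift(n_total, n_a, n_b, n_ab):
    """Lift of the rule A -> B from transaction counts.

    lift = support(A and B) / (support(A) * support(B)),
    rearranged to n_ab * n_total / (n_a * n_b) so the division happens once.
    """
    return (n_ab * n_total) / (n_a * n_b)


# Hypothetical counts (not from the question): 1000 baskets,
# 100 contain diapers, 200 contain beer, 70 contain both.
diapers_beer_lift = lift(n_total=1000, n_a=100, n_b=200, n_ab=70)
print(diapers_beer_lift)
```

A value of 1 would mean the two items co-occur exactly as often as independence predicts; values above or below 1 mean they co-occur more or less often than that.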

Q3

A data scientist is working on a text analytics project to classify news articles. After preprocessing the text, they create a Term-Document Matrix. What is the primary purpose of applying Term Frequency-Inverse Document Frequency (TF-IDF) weighting to this matrix?
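To make the weighting concrete, here is a minimal sketch of one common textbook variant of TF-IDF: term frequency normalized by document length, multiplied by log(N / document frequency). Real libraries often use smoothed or otherwise adjusted formulas, so treat the exact formula, the function name `tf_idf`, and the three toy documents as assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus (hypothetical): "market" appears in every document.
docs = [
    ["market", "rates", "rise"],
    ["market", "team", "wins"],
    ["market", "vote", "count"],
]
weights = tf_idf(docs)
print(weights[0])
```

In this variant, a term that occurs in every document gets an inverse document frequency of log(1) = 0, so its weight drops to zero, while terms confined to a single document keep a positive weight.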

Q4

A data analytics team is tasked with processing a 10TB log file to extract specific error patterns. The processing logic is complex and involves multiple stages of filtering and aggregation. Which combination of Hadoop ecosystem tools is best suited for creating a managed, multi-stage workflow for this task?

Q5

True or False: In the context of Big Data, 'Veracity' refers to the speed at which data is generated and must be processed.

Q6

A data scientist is performing an initial analysis of a dataset in R. They want to quickly get a summary of the central tendency, dispersion, and distribution shape for a continuous numerical variable named `product_cost`. Which R command would be most effective for this purpose?

Q7 (Multiple answers)

During a project presentation to senior executives, a data scientist needs to convey the potential return on investment (ROI) of a new predictive maintenance model. Which data visualization best practices should they employ? (Select TWO)

Q8

A hospital wants to predict patient readmission risk. A data scientist builds a logistic regression model and a decision tree model. To compare their performance, they generate ROC curves for both. The Area Under the Curve (AUC) for the logistic regression is 0.85, and for the decision tree, it is 0.78. What does this comparison indicate?
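As background, AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pair counting, with ties counted as one half. The labels and scores are made up for illustration and are unrelated to the 0.85 and 0.78 in the question.

```python
def auc(labels, scores):
    """AUC by pair counting: the share of (positive, negative) pairs
    in which the positive case has the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical readmission labels (1 = readmitted) and model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auc(labels, scores)
print(example_auc)
```

Pair counting is quadratic in the number of cases, which is fine for a small illustration; larger datasets usually call for a rank-based computation.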

Q9

A data scientist is using K-means clustering to segment customers based on their purchasing behavior. They have run the algorithm with K=3 and K=5. Which method should be used to determine the optimal number of clusters (K) for the dataset?
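One widely used heuristic for comparing values of K is to compute the within-cluster sum of squares (WCSS) for a range of K and look for the point where further increases stop paying off. The sketch below is a bare-bones Lloyd's-algorithm implementation for that purpose, not a production clusterer. The data points, the seed, and the function names are all assumptions made for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, seed=0, max_iter=100):
    """Run a simple K-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical 2-D customer features forming three loose groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

Because K-means depends on its starting centroids, a single run per K can land in a poor local optimum; repeating each K with several seeds and keeping the lowest WCSS gives a steadier curve.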

Q10

A financial institution is processing a massive stream of real-time transaction data. They need a tool within the Hadoop ecosystem that is specifically designed for distributed, real-time computation on large data streams. Which tool best fits this requirement?