Q1

A data center operations team is deploying a new NVIDIA DGX H100 SuperPOD. During the planning phase, they are debating cooling solutions. The primary goal is to maximize performance density while maintaining optimal operating temperatures under sustained, full-load training jobs. Which cooling technology is standard for the DGX H100 system to achieve this goal?

Q2

An MLOps engineer is using the NVIDIA Data Center GPU Manager (DCGM) to monitor a cluster of GPUs running various training workloads. They notice that several GPUs are consistently reporting high `DCGM_FI_DEV_FB_USED` values, approaching 95% of capacity. What is the most direct operational concern indicated by this specific metric?
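
For hands-on context, `DCGM_FI_DEV_FB_USED` tracks framebuffer (on-GPU) memory in use. A minimal sketch reading the same per-GPU quantity through NVML's Python bindings (`pynvml`); the 95% alert threshold is illustrative, not a DCGM default:

```python
# Minimal sketch: read framebuffer memory usage via NVML's Python
# bindings (pip install nvidia-ml-py). DCGM_FI_DEV_FB_USED reports
# the same underlying per-GPU quantity; the 95% threshold below is
# an illustrative assumption.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)  # values in bytes
        pct = 100.0 * mem.used / mem.total
        flag = "  <-- near OOM risk" if pct >= 95.0 else ""
        print(f"GPU {i}: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB "
              f"({pct:.1f}% FB used){flag}")
finally:
    pynvml.nvmlShutdown()
```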

Q3

A research institution wants to provide isolated GPU resources to multiple research teams from a single NVIDIA A100 server. Each team has small to medium-sized workloads and does not require a full GPU. The primary requirement is hardware-level partitioning to ensure performance isolation and security between the teams' environments. Which NVIDIA technology should the administrator configure?
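
For reference, this kind of hardware partitioning is typically driven through `nvidia-smi`. A minimal sketch, wrapped in Python for consistency, assuming an A100, root privileges, and an idle GPU; the `1g.5gb` profile split is illustrative:

```python
# Minimal sketch: enable MIG mode on GPU 0 and carve it into
# hardware-isolated instances by shelling out to nvidia-smi.
# Profile names assume an A100; adjust the split as needed.
# Run as root; a GPU reset or reboot may be required after -mig 1.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Enable MIG mode on GPU 0.
run(["nvidia-smi", "-i", "0", "-mig", "1"])

# Create two 1g.5gb GPU instances plus their compute instances (-C).
run(["nvidia-smi", "mig", "-i", "0", "-cgi", "1g.5gb,1g.5gb", "-C"])

# List the resulting GPU instances.
run(["nvidia-smi", "mig", "-lgi"])
```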

Q4

True or False: NVIDIA's GPUDirect Storage technology allows data to be transferred directly between local or remote NVMe storage and GPU memory, bypassing the CPU and system memory entirely.
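
As a sketch of what this data path looks like in practice, assuming the RAPIDS `kvikio` Python bindings for cuFile and a GDS-capable NVMe filesystem (the file path below is hypothetical):

```python
# Minimal sketch of a GPUDirect Storage round trip, assuming the
# RAPIDS kvikio bindings for cuFile. On a GDS-capable filesystem the
# transfer goes straight between NVMe and GPU memory; without GDS
# support, kvikio falls back to a staged copy through host memory.
import cupy
import kvikio

path = "/mnt/nvme/checkpoint.bin"  # hypothetical GDS-mounted file

# Write a buffer straight from GPU memory to storage...
src = cupy.arange(1 << 20, dtype=cupy.uint8)
f = kvikio.CuFile(path, "w")
f.write(src)
f.close()

# ...and read it back directly into GPU memory.
dst = cupy.empty_like(src)
f = kvikio.CuFile(path, "r")
f.read(dst)
f.close()

assert (src == dst).all()
```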

Q5

A DevOps team is containerizing a legacy machine learning application that uses an older version of the CUDA toolkit. They are deploying this to a modern Kubernetes cluster managed by the NVIDIA GPU Operator. The new nodes have the latest NVIDIA drivers installed. Which component is responsible for ensuring that the application inside the container can communicate correctly with the host's driver, despite the potential version mismatch?
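
As a quick illustration of the runtime injection at work (shown here with plain Docker rather than Kubernetes, and with an illustrative image tag):

```python
# Minimal sketch: launch a container built against an older CUDA
# toolkit on a host with a newer driver. The runtime hook behind the
# --gpus flag mounts the host's driver libraries into the container
# at startup, so the older CUDA userspace inside the image runs
# against the host's (newer) driver. Image tag is illustrative.
import subprocess

subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "nvidia/cuda:11.0.3-base-ubuntu20.04",  # older CUDA userspace
        "nvidia-smi",  # reports the *host's* driver version
    ],
    check=True,
)
```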

Q6 (Multiple answers)

A financial services company is performing a Total Cost of Ownership (TCO) analysis for a new AI platform. They are comparing a large on-premises NVIDIA DGX SuperPOD deployment with a cloud-based solution using GPU instances from a major provider. Which of the following factors are typically associated with the on-premises deployment? (Select THREE)

Q7

During the training of a large language model on a multi-node cluster, a systems administrator notices that the overall job performance is much lower than benchmarked expectations. The `nvidia-smi` command shows high GPU utilization on all nodes, but network monitoring tools reveal that the InfiniBand fabric is not saturated. Which NVIDIA technology should be investigated first to find a bottleneck related to data movement between GPUs across different nodes?
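
One common first check in this situation is whether NCCL is actually taking the GPUDirect RDMA path between nodes. A minimal sketch, assuming a PyTorch environment launched with `torchrun` across the nodes:

```python
# Minimal sketch: a small NCCL all-reduce run under NCCL_DEBUG=INFO
# to see which transport is used between nodes. On a healthy
# GPUDirect RDMA path the NCCL log lines mention NET/IB with GDRDMA;
# a fallback through host memory is a strong bottleneck candidate.
# Launch with torchrun; rank/world-size env vars come from the launcher.
import os
import torch
import torch.distributed as dist

os.environ.setdefault("NCCL_DEBUG", "INFO")  # must be set before init

dist.init_process_group(backend="nccl")
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

# 256 MiB payload: large enough to exercise the inter-node network path.
x = torch.ones(64 * 1024 * 1024, device="cuda")
dist.all_reduce(x)
torch.cuda.synchronize()

if dist.get_rank() == 0:
    print("all-reduce ok, per-element sum:", x[0].item())
dist.destroy_process_group()
```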

Q8

What is the primary architectural difference between a CPU and a GPU that makes GPUs exceptionally well-suited for deep learning workloads?
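
The contrast (a few latency-optimized cores versus thousands of throughput-oriented ones) is easy to see on a dense matrix multiply, the core operation of deep learning. A minimal sketch with NumPy and CuPy; the timings are illustrative and depend entirely on the hardware:

```python
# Minimal sketch: the same dense matmul on CPU (NumPy) and GPU (CuPy).
# The GPU's thousands of throughput-oriented cores are built for
# exactly this kind of massively parallel arithmetic, which is why
# deep learning (mostly matmuls and convolutions) maps to it so well.
import time
import numpy as np
import cupy as cp

n = 4096
a_cpu = np.random.rand(n, n).astype(np.float32)
b_cpu = np.random.rand(n, n).astype(np.float32)

t0 = time.perf_counter()
np.matmul(a_cpu, b_cpu)
cpu_s = time.perf_counter() - t0

a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)
cp.matmul(a_gpu, b_gpu)  # warm-up (kernel compilation, memory pool)
cp.cuda.Device().synchronize()

t0 = time.perf_counter()
cp.matmul(a_gpu, b_gpu)
cp.cuda.Device().synchronize()  # wait for the async kernel to finish
gpu_s = time.perf_counter() - t0

print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s")
```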

Q9

A hospital is deploying an AI application for real-time medical image analysis. The application uses NVIDIA Clara and will be deployed on-premises to comply with data privacy regulations. The IT team needs to serve multiple concurrent inference requests with the lowest possible latency. Which NVIDIA software is specifically designed to maximize inference throughput and serve models from various frameworks like TensorFlow, PyTorch, and TensorRT?
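
For context, such an inference server is typically exercised through a client library. A minimal sketch using `tritonclient` over HTTP; the model name, tensor names, shape, and dtype are hypothetical and must match the model's `config.pbtxt`:

```python
# Minimal sketch: an HTTP inference request against a locally running
# inference server using the tritonclient package. Model name, input/
# output tensor names, and shape are hypothetical placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy scan
inp = httpclient.InferInput("INPUT__0", list(image.shape), "FP32")
inp.set_data_from_numpy(image)

result = client.infer(model_name="ct_classifier", inputs=[inp])
print(result.as_numpy("OUTPUT__0"))
```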

Q10 (Multiple answers)

An MLOps team is setting up a CI/CD pipeline for their machine learning models using MLflow and Kubernetes. They need to ensure that every time a new model is trained and registered, its performance is tracked, and it can be easily packaged for deployment. Which components of the AI Operations lifecycle are they primarily addressing? (Select TWO)
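
For reference, those lifecycle touchpoints map directly onto MLflow's tracking and model-registry APIs. A minimal sketch with illustrative experiment and model names:

```python
# Minimal sketch: track a training run and register the resulting
# model with MLflow, the two lifecycle touchpoints the scenario
# describes. Experiment and model names are illustrative; model
# registration requires a registry-capable (database-backed)
# tracking backend.
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

mlflow.set_experiment("fraud-detector")
with mlflow.start_run():
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_param("max_iter", 200)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registering the model makes it versioned and packageable for
    # deployment (e.g., picked up by a CI/CD stage).
    mlflow.sklearn.log_model(
        model, "model", registered_model_name="fraud-detector"
    )
```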