NCA-GENL Free Sample Questions

Generative AI LLMs Practice Test
10 of 203 questions (free sample)
Q1

A data science team is preparing a large text dataset for fine-tuning a Llama 3 model. The dataset consists of 500GB of raw text files. The team needs to perform tokenization and data cleaning as quickly as possible. Which NVIDIA library is specifically designed for GPU-accelerated data manipulation and would be most suitable for this task?
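
For context on the kind of workload this question describes, here is a minimal sketch of GPU-accelerated text cleaning using RAPIDS cuDF, NVIDIA's pandas-like GPU DataFrame library. The file names and the "text" column are hypothetical, and a production pipeline would shard and parallelize the work across many files.

```python
# Minimal sketch: GPU-accelerated text cleaning with RAPIDS cuDF.
# The file names and the "text" column are hypothetical illustration values.
import cudf

# Read one shard of the raw corpus directly into GPU memory.
df = cudf.read_json("corpus_shard_000.jsonl", lines=True)

# Vectorized string cleanup runs on the GPU and mirrors the pandas API.
df["text"] = (
    df["text"]
    .str.lower()
    .str.replace(r"<[^>]+>", "", regex=True)   # strip HTML tags
    .str.replace(r"\s+", " ", regex=True)      # collapse whitespace
    .str.strip()
)

# Drop empty or near-empty documents before tokenization.
df = df[df["text"].str.len() > 32]
df.to_parquet("corpus_shard_000_clean.parquet")
```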

Q2

A developer is implementing a Retrieval-Augmented Generation (RAG) system to answer questions about internal company documents. They have already generated embeddings and stored them in a vector database. Which step in the RAG pipeline immediately follows the retrieval of relevant document chunks from the vector database?
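
As an illustration of the pipeline stage this question points at, the sketch below shows what commonly happens once chunks come back from the vector database: they are combined with the user's question into an augmented prompt that is then passed to the LLM. The helper names, example chunks, and prompt template are all hypothetical placeholders.

```python
# Minimal sketch of the step that follows retrieval in a RAG pipeline:
# the retrieved chunks are combined with the user's question into an
# augmented prompt, which is then passed to the LLM for generation.
# The example chunks and the call_llm stub are hypothetical placeholders.

def build_augmented_prompt(question: str, chunks: list[str]) -> str:
    """Augmentation step: ground the question in the retrieved context."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call (e.g., an inference endpoint)."""
    return f"<model response to a {len(prompt)}-character prompt>"

retrieved = [
    "Vacation policy: employees accrue 1.5 days per month.",
    "Unused vacation days roll over for one calendar year.",
]
prompt = build_augmented_prompt("How many vacation days do I accrue?", retrieved)
print(call_llm(prompt))
```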

Q3

An MLOps engineer is deploying a large language model using NVIDIA Triton Inference Server. They observe that under high load, requests with long sequences are causing head-of-line blocking, increasing latency for all subsequent requests. Which Triton feature is specifically designed to mitigate this issue by processing requests out of order?

Q4 (multiple answers)

A hospital is developing an internal chatbot to help doctors quickly summarize patient histories. To ensure patient privacy and prevent the model from discussing off-topic subjects like celebrity gossip or financial advice, which TWO NVIDIA technologies or techniques should be implemented? (Select TWO)

Q5

True or False: Using LoRA (Low-Rank Adaptation) for fine-tuning a large language model involves updating all of the original model's weights.
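
For reference, the sketch below shows the general shape of a LoRA adapter in plain PyTorch: the pretrained weight matrix stays frozen while two small low-rank matrices are trained. This is an illustration, not NeMo's or Hugging Face PEFT's actual implementation; the layer size, rank, and scaling are arbitrary.

```python
# Minimal PyTorch sketch of a LoRA-adapted linear layer.
# The pretrained weight W is frozen; only the low-rank matrices A and B train.
# Dimensions, rank, and alpha are arbitrary illustration values.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # original weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank update (B @ A).
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")           # only A and B train
```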

Q6

A research team is fine-tuning a 70-billion parameter model on a single DGX node with 8 GPUs. The full model requires more VRAM than is available on a single GPU. To overcome this, they decide to split the model's layers across the 8 GPUs. What is this distributed training technique called?
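
The toy PyTorch sketch below illustrates the basic mechanic of splitting a model's layers across devices: some layers live on one GPU, the rest on another, and activations are handed across the boundary in forward(). Layer sizes and the two-GPU split are arbitrary, and a real 70-billion-parameter run would use a distributed training framework rather than manual placement.

```python
# Toy PyTorch sketch of splitting a model's layers across GPUs:
# the first stage lives on cuda:0, the second on cuda:1, and activations
# are moved between devices in forward(). Sizes are arbitrary.
import torch
import torch.nn as nn

class SplitAcrossGPUs(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage0 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:0")
        self.stage1 = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()).to("cuda:1")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stage0(x.to("cuda:0"))
        x = self.stage1(x.to("cuda:1"))   # activations cross the GPU boundary here
        return x

model = SplitAcrossGPUs()
out = model(torch.randn(8, 1024))
print(out.device)                          # cuda:1
```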

Q7

When evaluating a text summarization model, a team calculates a score based on the overlap of n-grams between the machine-generated summary and a human-written reference summary. This metric is known as:
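
As a worked illustration of an n-gram overlap score, the snippet below computes a simple unigram-overlap recall between a candidate summary and a reference, which is the core idea behind recall-oriented overlap metrics such as ROUGE-1. The two example sentences are hypothetical, and real implementations add higher-order n-grams, stemming, multiple references, and precision/F1 variants.

```python
# Minimal sketch of a unigram-overlap recall score between a machine summary
# and a human reference. The example sentences are hypothetical.
from collections import Counter

def unigram_recall(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)   # clipped unigram matches
    return overlap / max(sum(ref.values()), 1)

reference = "the patient was discharged after three days of observation"
candidate = "patient discharged after three days"
print(f"unigram recall: {unigram_recall(candidate, reference):.2f}")  # 5 of 9 words
```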

Q8

A developer is using the NVIDIA NeMo Framework to create a custom conversational AI application. They need to define rules for how the AI should respond to inappropriate user queries and ensure the conversation stays on a specific topic. Which NeMo component is specifically designed for this purpose?
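
For orientation, the snippet below shows the general shape of the Python entry point for NVIDIA's guardrailing toolkit, NeMo Guardrails, which loads dialogue rails (topic restrictions, moderation rules, and so on) from a configuration directory. The "./rails_config" path and the example message are hypothetical, and the rail definitions themselves live in separate config files that are omitted here.

```python
# Minimal sketch of loading a rails configuration and querying it through
# the NeMo Guardrails Python API. The "./rails_config" directory (containing
# config.yml plus rail definitions) and the example message are hypothetical.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./rails_config")
rails = LLMRails(config)

response = rails.generate(
    messages=[{"role": "user", "content": "Can you give me some stock tips?"}]
)
print(response["content"])   # the rails decide whether to answer or deflect
```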

Q9

What is the primary function of the self-attention mechanism in the Transformer architecture?
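
For reference, scaled dot-product attention, the computation at the core of self-attention in the Transformer, can be sketched in a few lines of NumPy. The shapes below are arbitrary, and multi-head projections, masking, and dropout are omitted.

```python
# Minimal NumPy sketch of scaled dot-product self-attention:
# softmax(Q @ K^T / sqrt(d_k)) @ V, where Q, K, and V are projections of the
# same input sequence. Shapes are arbitrary illustration values.
import numpy as np

def self_attention(x: np.ndarray, w_q, w_k, w_v) -> np.ndarray:
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                  # pairwise token-to-token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                               # each token mixes in the others

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (5, 8)
```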

Q10

A financial firm is using a generative AI model to create market analysis reports. They are concerned that the model, trained on public data, might inadvertently generate text that is too similar to copyrighted articles, creating a legal risk. Which AI safety problem does this scenario describe?