Principal High-Performance LLM Training Engineer

Santa Clara, CA, United States • Posted June 20, 2026

Job Type: Full-time
Category: other-general
Application Deadline: June 25, 2026

Role Description

NVIDIA is seeking a Principal Engineer to drive the performance of large-scale AI training and post-training workloads across NVIDIA’s full hardware and software stack. This role sits at the intersection of distributed training, GPU architecture, systems software, deep learning frameworks, and performance engineering. You will analyze and optimize frontier-scale LLM workloads running on thousands of GPUs, drive improvements across frameworks such as PyTorch, JAX, NeMo, and NeMo RL, and use insights from real workloads to help shape future NVIDIA GPU, system, and software roadmaps.


We are looking for a deeply technical leader who can operate across abstraction layers: from application-level training behavior to framework/runtime internals, CUDA libraries, communication collectives, memory systems, networking, and GPU architecture. At this level, success means both directly improving performance and setting technical direction, raising the bar for the org...
