Principal Software Engineer, Profiling Services
Santa Clara, CA, United States • Posted June 10, 2026
Job Type:
Full-time
Location:
Santa Clara, CA
Posted:
June 10, 2026
Category:
other-general
Application Deadline:
June 15, 2026
Role Description
Help design and ship an Always-On, low-overhead GPU profiling service that runs in production, scales across cluster environments, and delivers actionable insights for ML workloads. You will lead the architecture and hands-on delivery across system software, drivers, and CUDA to make profiling continuously available and reliable.
What you’ll be doing:
+ Design the architecture for an Always-On profiling service, defining interfaces, data flows, and scalability guarantees for multi-process, multi-GPU, and multi-node systems.
+ Drive low-overhead, high-reliability implementations in C/C++, including IPC/shared memory, within bounded CPU/memory budgets.
+ Lead end-to-end feature delivery spanning user-mode components, driver/platform layers, and performance counter/trace providers.
+ Establish profiling models that integrate with existing ML/AI workflows (e.g., PyTorch/XLA) to turn low-level signals into actionable insights.
+ Set technical direction for an engineering team; me...
Interested in this role?
Apply for Principal Software Engineer, Profiling Services at NVIDIA.