Senior Software Engineer, Cloud-Native Stack – CSP Engagements
Santa Clara, CA, United States • Posted June 01, 2026
Job Type:
Full-time
Application Deadline:
June 07, 2026
Role Description
We are developing advanced multi-rack, multi-tenant AI/ML datacenters with NVIDIA GB200 and upcoming GB300 GPUs. NVIDIA seeks a Senior Software Engineer for our CSP (Cloud Service Provider) Engagements team to focus on the cloud-native stack for datacenter products like GB200. In this role, you will define customer workflows, prototype stack enhancements, and debug the toughest Kubernetes + Slurm issues in these datacenters. You will also tackle complex scheduling challenges across racks, tenants, and clouds.
What you’ll be doing:
+ Perform deep-dive debugging of multi-rack, multi-tenant clusters: scheduler behavior, container runtime issues, device-plugin crashes, RDMA/IB fabric anomalies, etc.
+ Gather customer requirements and prototype feature extensions for Kubernetes operators, Slurm plugins, and custom micro-services that expose new GPU capabilities.
+ Drive joint architecture reviews and “whiteboard” se...