Solutions Architect, Inference Deployments
Santa Clara, CA, United States • Posted June 17, 2026
Job Type: Full-time
Location: Santa Clara, CA
Posted: June 17, 2026
Category: other-general
Application Deadline: June 23, 2026
Role Description
We’re forming a team of innovators to roll out and enhance AI inference solutions at scale, built on NVIDIA’s GPU technology and Kubernetes. As a Solutions Architect focused on inference, you’ll collaborate closely with our engineering and DevOps teams and with customers to develop enterprise AI solutions. Together, we’ll bring generative AI to production!
What you'll be doing:
+ Build inference pipelines with tools like NVIDIA Dynamo, distributing tasks among GPU workers to improve efficiency.
+ Collaborate with DevOps teams to orchestrate disaggregated inference using Kubernetes for complex workloads.
+ Accelerate inference pipelines using TensorRT-LLM, vLLM, SGLang, and other backends, ensuring each integrates seamlessly with disaggregated inference.
+ Provide mentorship and technical leadership to customers and internal teams, guiding them through the deployment of disaggregated inference systems and resolving complex issues.
What we need to see:...
Interested in this role? Apply for Solutions Architect, Inference Deployments at NVIDIA.