HPC Network Engineering Manager - AI Infrastructure
Remote, Brazil • Posted June 12, 2026
Job Type: Full-time
Location: Remote
Posted: June 12, 2026
Category: Networks and systems
Application Deadline: July 22, 2026
Role Description
We are seeking an HPC Network Engineering Manager - AI Infrastructure to guide architecture and technical direction for AI research and Kubernetes-based GPU infrastructure. You will set standards for InfiniBand/RDMA, Ethernet, Kubernetes networking, SmartNIC/DPU, and observability across large programs while mentoring senior engineers. Join us to shape reliable, scalable network platforms for massive distributed AI workloads.
Responsibilities
- Define and own a multi-year architectural vision and roadmap for InfiniBand/RDMA and high-speed Ethernet fabrics supporting massive GPU clusters and distributed AI/LLM workloads across the client portfolio
- Govern the evaluation and standardization of cluster network topologies such as fat-tree, Clos, rail-optimized, and Dragonfly, and set decision frameworks aligned to scale, performance, and cost constraints
- Establish and enforce engineering standards for host-sid...
This position is with EPAM Systems.