We build and operate high-performance GPU clusters so the most ambitious teams can move fast, stay focused, and scale without friction. Our clusters power top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more.
Our team is highly motivated and focused on providing a world-class supercomputing experience. We put our customers first in everything we do, working hard not just to win the sale, but to win repeat business and customer referrals.
We hold ourselves and each other to high standards. We expect you to care deeply about the work you do, the products you build, and the experience our customers have in every interaction with us.
You must work hard, take ownership from inception to delivery, and approach every problem with an open mind and a positive attitude. We value effectiveness, competence, and a growth mindset.
About the Role
As Head of Networking, you will lead the architecture, design, and operations of the network services that power our AI infrastructure platform. In this role, you will architect networks that move packets for frontier AI models while ensuring maximum reliability and performance through extensive automation.
You will build and lead a world-class networking team, ranging from junior network engineers eager to learn high-performance computing, to senior architects who have scaled networks at hyperscalers, to specialized engineers with deep expertise in RDMA/InfiniBand for AI workloads. Your team will span network operations, architecture, automation engineering, and performance optimization roles. You'll be responsible for hiring, mentoring, and developing this team while establishing a culture of technical excellence and continuous learning.
Focus
Build networks that scale beyond hundreds of thousands of GPUs.
Collaborate with compute, storage, security, and data center teams to deliver integrated infrastructure solutions
Build and lead a team of network engineers and architects focused on performance, reliability, and automation.
Automate everything. Manual processes kill velocity. Build systems that configure themselves, heal themselves, and optimize themselves. Drive automation initiatives across service deployment, provisioning, and lifecycle management.
Design scalable network architectures supporting clusters from 2,000 to 200,000 GPUs
Optimize traffic patterns for AI/ML training workloads and high-performance computing
Lead the design and implementation of scalable, high-performance network architectures supporting GPU clusters and AI workloads
Establish comprehensive monitoring, alerting, and incident response procedures. Create remediation systems that detect and resolve issues before they affect customers.
Lead root cause analysis and implement preventive measures for network incidents
Ensure network reliability, security, and performance meet the demanding requirements of AI supercomputing workloads
Ensure compliance with data sovereignty and regulatory requirements
10+ years of experience designing and operating large-scale network infrastructure
5+ years in leadership roles at cloud providers, hyperscalers, or technology companies
Deep expertise in software-defined networking, routing protocols, and distributed network design
Proven track record scaling networks for high-throughput, low-latency workloads
Experience with AI/ML infrastructure and GPU cluster networking (RoCE / InfiniBand)
Deep understanding of internet routing, switching, peering, and distributed network design.
Expert knowledge of routing protocols (BGP, EVPN), TCP/IP, and network services (DHCP, DNS)
Proven track record of designing and operating large-scale, high-performance networks in cloud or datacenter environments
Strong knowledge of automation frameworks (e.g., Ansible, Terraform) and infrastructure-as-code principles
Experience offloading services into smart NICs and working with hardware acceleration technologies
Excellent communication skills and the ability to influence technical strategy across organizations
Monitoring stacks (Prometheus, Grafana) and observability best practices
Contributions to open-source networking projects
Experience with network source-of-truth platforms (e.g., NetBox, Nautobot) and integrating them with automation workflows
Familiarity with Kubernetes networking, overlay networks, and container networking solutions
Competitive total compensation package (cash + equity).
Retirement or pension plan, in line with local norms.
Health, dental, and vision insurance.
Generous PTO policy, in line with local norms.
Fluidstack is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Fluidstack will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.