NVIDIA

System Software Engineer, Platform Compute

Posted 2 Days Ago
In-Office or Remote
3 Locations
168K-322K
Senior level
The role involves building and maintaining multi-cloud compute platforms, optimizing costs, ensuring capacity, and implementing orchestration for global operations.
The summary above was generated by AI

For over 25 years, NVIDIA has pioneered visual and accelerated computing. Today, we're defining the future of AI, equipping millions with groundbreaking tools and essential training to lead a new era of innovation. Every month, NVIDIA’s training platform enables thousands of developers around the world to advance their AI skills and excel in their life’s work. We're seeking a foundational System Software Engineer to ensure the 24/7 operation, maintenance, and scaling of a multi-cloud and multi-architecture training delivery platform across 3-4 CSPs and ~50 regions. Your work will be instrumental in managing substantial operational expenditure, optimizing cost per learner, and preventing compute capacity shortages amid a rapidly expanding user base and a potential 10x increase in training demand.

Join a close-knit team where your contributions truly matter. As a core member of our learning systems platform team, you'll partner with experts and creative educators, setting the standard for scalable, reliable learning experiences. You'll play a crucial role in making our purpose-built Learning Management System (LMS) platform a delightful and efficient tool that empowers both learners and instructors. Are you driven to build platforms that open the door to new careers? Do you thrive on creating systems that enable people to confidently apply advanced tools in their work? Want to be at the heart of making pivotal technologies like generative AI accessible and profoundly learnable for everyone? If so, let’s talk!
 

What you’ll be doing:

  • Building systems to support the maintenance, scaling, and operation of diverse, global compute platforms spanning multiple cloud providers.

  • Driving continuous cost optimization for compute resources, focusing on efficiency and expenditure management.

  • Designing and implementing flexible solutions to ensure adequate compute capacity and resource availability, support diverse workload requirements and new compute initiatives, and meet fluctuating demands.

  • Building, maintaining, and optimizing orchestration functions by mapping workload requirements to cloud provider capabilities, implementing workers, and refining job queue and scaling systems.

  • Managing and maintaining artifacts to establish a consistent baseline compute capability across all supported cloud providers and regions.

What we need to see:

  • Bachelor’s degree in Computer Science, a related technical field, or equivalent experience.

  • 8+ years of DevOps experience optimizing, deploying, and running heterogeneous containerized applications (Docker, Kubernetes) across trust boundaries, on AWS, Azure, and GCP, including hands-on work with EKS, AKS, and GKE.

  • Practical experience in building scalable, reliable services and distributed system integration topologies.

  • Hands-on experience maintaining AWS security groups, roles, IAM, and role delegation.

  • Proficiency in Python and Linux shell scripting for automation, application development, system administration, and problem resolution.

  • Validated experience architecting, implementing, and managing cloud infrastructure using Terraform.

  • Demonstrated ability as a meticulous problem-solver with strong analytical skills, capable of rapidly diagnosing and resolving complex technical challenges.

  • Excellent communication, teamwork, and collaboration skills, with an ability to articulate technical concepts clearly to diverse audiences and lead technical responses during incidents.

Ways to stand out from the crowd:

  • Proven experience with event-driven architectures using pub/sub patterns (e.g., AWS SNS/SQS, Google Pub/Sub, Azure Service Bus).

  • Knowledge of generative AI architectures (LLMs, diffusion models) and concepts such as RAG and vector databases.

  • Hands-on experience with the NVIDIA AI stack (NeMo, Triton Inference Server, TensorRT); production experience with NVIDIA NIM is a strong plus.

  • Experience building and running CI/CD pipelines (Jenkins, GitLab CI) and applying SRE principles to increase automation, enhance reliability, and improve performance.

  • Familiarity with Python-based Learning Management Systems (LMS) such as Open edX as well as practical experience with highly heterogeneous compute deployments.

With competitive salaries and a generous benefits package, NVIDIA is widely considered to be one of the technology world’s most desirable employers; we have some of the most forward-thinking and hardworking people in the world working for us and, due to unparalleled growth, our best-in-class teams are rapidly growing.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 264,500 USD for Level 4, and 200,000 USD - 322,000 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until September 12, 2025.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Top Skills

AWS
Azure
Docker
GCP
Kubernetes
Linux
Python
Terraform
