Sully.ai

Senior AI Systems Engineer (LLM Inference & Infra Optimization)

Posted Yesterday
Remote
Hiring Remotely in US
Senior level
Lead efforts in deploying and optimizing large language models on GPU hardware, optimizing inference pipelines and managing multi-cloud infrastructures.
The summary above was generated by AI
About Us

At Sully.ai, we’re building cutting-edge AI-native infrastructure to power real-time, intelligent healthcare applications. Our team operates at the intersection of high-performance computing, ML systems, and cloud infrastructure — optimizing inference pipelines to support next-generation multimodal AI agents. We're looking for a deeply technical engineer who thrives at the systems level and loves building performant, scalable infrastructure.

The Role

We’re looking for a senior-level engineer to lead efforts in deploying and optimizing large language models on high-end GPU hardware and building the infrastructure that supports them. You'll work across the stack — from C++ and CUDA kernels to Python APIs — while also shaping our DevOps practices for scalable, multi-cloud deployments. This role blends systems performance, ML inference, and infrastructure-as-code to deliver low-latency, production-grade AI services.

What You’ll Do
  • LLM Inference Optimization: Develop and optimize inference pipelines using quantization, attention caching, speculative decoding, and memory-efficient serving.

  • Systems Programming: Build and maintain low-level modules in C++/CUDA/NCCL to squeeze the most out of GPUs and high-throughput architectures.

  • DevOps & Infrastructure Engineering: Stand up and manage multi-cloud environments using modern IaC frameworks such as Pulumi or Terraform. Automate infrastructure provisioning, deployment pipelines, and GPU fleet orchestration.

  • Real-Time Architectures: Design low-latency streaming and decision-support systems leveraging embedding models, VRAM token caches, and fast interconnects.

  • Developer Enablement: Build robust tooling, interfaces, and sandbox environments so that other engineers can contribute safely to the ML systems layer.

What We’re Looking For
  • Proficiency in C++, CUDA, and Python with experience in systems or ML infrastructure engineering.

  • Deep understanding of GPU architectures, inference optimization, and large model serving techniques.

  • Hands-on experience with multi-cloud environments (GCP, AWS, etc.) and infrastructure-as-code tools such as Pulumi, Terraform, or similar.

  • Familiarity with ML deployment frameworks (TensorRT, vLLM, DeepSpeed, Hugging Face Transformers, etc.).

  • Comfort with DevOps workflows, containerization (Docker), CI/CD, and distributed-systems debugging.

  • (Bonus) Experience with streaming embeddings, semantic search, or hybrid retrieval architectures.

  • (Bonus) Interest in building tools that democratize high-performance systems for broader engineering teams.

Why Join Us
  • Collaborate with a highly technical team solving hard problems at the edge of AI and healthcare.

  • Work with bleeding-edge GPU infrastructure and build systems that push what's possible.

  • Be a foundational part of shaping AI-native infrastructure for real-time, mission-critical applications.

  • Help accelerate a meaningful product that improves how clinicians work and patients are cared for.

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment. 

Top Skills

C++
CUDA
DeepSpeed
Docker
Hugging Face Transformers
Pulumi
Python
TensorRT
Terraform
vLLM
