
Featherless AI

Machine Learning Engineer — Inference Optimization

Reposted 21 Days Ago
In-Office or Remote
Hiring Remotely in World Golf Village, FL
Mid level
Optimize inference latency and throughput for large-scale ML models, collaborate on performance tuning, and build inference-serving systems.
About the Role

We’re looking for a Machine Learning Engineer to own and push the limits of model inference performance at scale. You’ll work at the intersection of research and production, turning cutting-edge models into fast, reliable, and cost-efficient systems that serve real users.

This role is ideal for someone who enjoys deep technical work, profiling systems down to the kernel/GPU level, and translating research ideas into production-grade performance gains.

What You’ll Do
  • Optimize inference latency, throughput, and cost for large-scale ML models in production

  • Profile GPU/CPU inference pipelines and eliminate bottlenecks (memory, kernels, batching, IO)

  • Implement and tune techniques such as:

    • Quantization (fp16, bf16, int8, fp8)

    • KV-cache optimization & reuse

    • Speculative decoding, batching, and streaming

    • Model pruning or architectural simplifications for inference

  • Collaborate with research engineers to productionize new model architectures

  • Build and maintain inference-serving systems (e.g. Triton, custom runtimes, or bespoke stacks)

  • Benchmark performance across hardware (NVIDIA / AMD GPUs, CPUs) and cloud setups

  • Improve system reliability, observability, and cost efficiency under real workloads
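The quantization work listed above can be illustrated with a minimal sketch. This is a toy example in plain Python, not any framework's API (the names `quantize_int8` and `dequantize_int8` are illustrative): it shows the symmetric scale/round/clamp arithmetic behind int8 post-training quantization and the resulting round-trip error.

```python
# Toy sketch of symmetric per-tensor int8 quantization (plain Python,
# no framework): scale floats into [-128, 127], then dequantize and
# measure the round-trip error.

def quantize_int8(values):
    """Quantize a list of floats with a single symmetric scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize_int8(quantized, scale):
    """Map int8 codes back to floats."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 1.27]
codes, scale = quantize_int8(weights)
recovered = dequantize_int8(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, recovered))
print(f"codes={codes} scale={scale:.4f} max_error={max_error:.4f}")
```

The round-trip error is bounded by half a quantization step (scale / 2); real per-channel or calibrated schemes refine exactly this tradeoff.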

What We’re Looking For
  • Strong experience in ML inference optimization or high-performance ML systems

  • Solid understanding of deep learning internals (attention, memory layout, compute graphs)

  • Hands-on experience with PyTorch (or similar) and model deployment

  • Familiarity with GPU performance tuning (CUDA, ROCm, Triton, or kernel-level optimizations)

  • Experience scaling inference for real users (not just research benchmarks)

  • Comfortable working in fast-moving startup environments with ownership and ambiguity
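The internals named above (attention, KV caching) can be toy-sketched in a few lines. This is a deliberately simplified scalar version in plain Python, not real model code, and the 0.5/2.0 "projections" are made up for illustration: the point is that autoregressive decoding with a KV cache appends one key/value per step instead of re-projecting the whole prefix.

```python
import math

# Toy scalar attention with a KV cache (plain Python, illustrative only):
# each decoding step appends one cached key/value rather than recomputing
# projections for the entire prefix.

def attend(query, keys, values):
    """Softmax-weighted sum of values, scored by query * key."""
    scores = [query * k for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

def decode(tokens):
    """Autoregressive pass: the KV cache grows by one entry per token."""
    key_cache, value_cache, outputs = [], [], []
    for t in tokens:
        key_cache.append(0.5 * t)    # stand-in for the key projection
        value_cache.append(2.0 * t)  # stand-in for the value projection
        outputs.append(attend(t, key_cache, value_cache))
    return outputs

outs = decode([1.0, 2.0, 3.0])
print(outs)
```

In a real serving stack the cache holds per-layer key/value tensors, and its memory layout and reuse policy dominate long-context throughput.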

Nice to Have
  • Experience with LLM or long-context model inference

  • Knowledge of inference frameworks (TensorRT, ONNX Runtime, vLLM, Triton)

  • Experience optimizing across different hardware vendors

  • Open-source contributions in ML systems or inference tooling

  • Background in distributed systems or low-latency services

Why Join Us
  • Real ownership over performance-critical systems

  • Direct impact on product reliability and unit economics

  • Close collaboration with research, infra, and product

  • Competitive compensation + meaningful equity at Series A

  • A team that cares about engineering quality, not hype

Top Skills

CUDA
ML Inference Optimization
ONNX Runtime
PyTorch
TensorRT
Triton

