TetraScience

Senior AI Infrastructure Engineer

Posted 19 Days Ago
Remote
Hiring Remotely in United States
Senior level
The Senior AI Infrastructure Engineer will design and maintain cloud-native infrastructure for AI/ML workflows, develop data pipelines, and collaborate across teams for performance enhancement.
The summary above was generated by AI
Description
Who We Are 

TetraScience is the Scientific Data and AI Cloud company. We are catalyzing the Scientific AI revolution by designing and industrializing AI-native scientific data sets, which we bring to life in a growing suite of next-gen lab data management solutions, scientific use cases, and AI-enabled outcomes.

TetraScience is the category leader in this vital new market, generating more revenue than all other companies in the market combined. In the last year alone, the world’s dominant players in compute, cloud, data, and AI infrastructure have converged on TetraScience as the de facto standard, entering into co-innovation and go-to-market partnerships.

In connection with your candidacy, you will be asked to carefully review the Tetra Way letter, authored directly by Patrick Grady, our co-founder and CEO. This letter is designed to assist you in better understanding whether TetraScience is the right fit for you from a values and ethos perspective. 

It is impossible to overstate the importance of this document and you are encouraged to take it literally and reflect on whether you are aligned with our unique approach to company and team building. If you join us, you will be expected to embody its contents each day. 

What You Will Do

We’re looking for a Senior AI Infrastructure Engineer to help design, build, and scale our AI and data infrastructure. In this role, you’ll focus on architecting and maintaining cloud-based MLOps pipelines to enable scalable, reliable, and production-grade AI/ML workflows, working closely with AI engineers, data engineers, and platform teams. Your expertise in building and operating modern cloud-native infrastructure will help enable world-class AI capabilities across the organization.

If you are passionate about building robust AI infrastructure, enabling rapid experimentation, and supporting production-scale AI workloads, we’d love to talk to you.

  • Design, implement, and maintain cloud-native infrastructure to support AI and data workloads, with a focus on AI and data platforms such as Databricks and AWS Bedrock.
  • Build and manage scalable data pipelines to ingest, transform, and serve data for ML and analytics.
  • Develop infrastructure-as-code using tools like CloudFormation and AWS CDK to ensure repeatable and secure deployments.
  • Collaborate with AI engineers, data engineers, and platform teams to improve the performance, reliability, and cost-efficiency of AI models in production.
  • Drive best practices for observability, including monitoring, alerting, and logging for AI platforms.
  • Contribute to the design and evolution of our AI platform to support new ML frameworks, workflows, and data types.
  • Stay current with new tools and technologies to recommend improvements to architecture and operations.
  • Integrate AI models and large language models (LLMs) into production systems to enable use cases using architectures like retrieval-augmented generation (RAG).
Requirements
  • 7+ years of professional experience in software engineering and infrastructure engineering.
  • Extensive experience building and maintaining AI/ML infrastructure in production, including model deployment and lifecycle management.
  • Strong knowledge of AWS and infrastructure-as-code frameworks, ideally with CDK.
  • Expert-level coding skills in TypeScript and Python building robust APIs and backend services.
  • Production-level experience with Databricks MLflow, including model registration, versioning, asset bundles, and model serving workflows.
  • Expert-level understanding of containerization (Docker) and hands-on experience with CI/CD pipelines; experience with orchestration tools (e.g., ECS) is a plus.
  • Proven ability to design reliable, secure, and scalable infrastructure for both real-time and batch ML workloads.
  • Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members. 
  • Strong collaboration skills and the ability to partner effectively with cross-functional teams.
Nice to Have
  • Familiarity with emerging LLM frameworks such as DSPy for advanced prompt orchestration and programmatic LLM pipelines.
  • Understanding of LLM cost monitoring, latency optimization, and usage analytics in production environments.
  • Knowledge of vector databases / embedding stores (e.g., OpenSearch) to support semantic search and RAG.
Benefits
  • 100% employer-paid benefits for all eligible employees and immediate family members
  • Unlimited paid time off (PTO)
  • 401(k)
  • Flexible working arrangements - Remote work
  • Company-paid life insurance, LTD/STD
  • A culture of continuous improvement where you can grow your career and get coaching

We are not currently providing visa sponsorship for this position.

Top Skills

AWS
CDK
CI/CD
Databricks
Docker
ECS
MLflow
OpenSearch
Python
TypeScript
Vector Databases