Virta Health

Senior Site Reliability Engineer

Posted 17 Days Ago
Remote
Hiring Remotely in USA
167K-216K
Senior level
As a Senior Site Reliability Engineer at Virta Health, you'll build automation and tooling for reliability, enhance observability, and mentor engineering teams in best practices.
The summary above was generated by AI

Virta Health is on a mission to transform diabetes care and reverse the type 2 diabetes epidemic. Current treatment approaches aren’t working: over half of US adults have either type 2 diabetes or prediabetes. Virta is changing this by helping people reverse type 2 diabetes through innovations in technology, personalized nutrition, and virtual care delivery reinvented from the ground up. We have raised over $350 million from top-tier investors and partner with the largest health plans, employers, and government organizations to help their employees and members restore their health and live diabetes-free. Join us on our mission to reverse diabetes in 100 million people.

As an SRE on the Infrastructure team at Virta, you will be building the foundation that will help our company move as fast as possible while meeting security and compliance requirements. Key projects for the team over the next two quarters include:

  • Implementing an AI-driven observability and metrics platform that automatically detects anomalies and highlights SLO risks, enabling product teams to make data-driven decisions.

  • Enhancing system observability, reliability, and efficiency by combining off-the-shelf technology with internal tools developed in Python and Go, increasing visibility into our systems and centralizing data.

  • Building out more products for our Product Development teams, such as observability modules (SLOs, alerting, dashboards) that let them spin up an MVP out of the box.

  • Improving incident readiness with better tooling and sound operational hygiene practices, such as game days.

  • Engaging with feature development teams on toil reduction, capacity planning, load testing, the SLO process, and other best practices, including partnering with product teams to replace manual capacity planning with predictive/AI-driven scaling models and to codify self-healing runbooks that minimize toil.

  • Improving the velocity and quality of our developer platform and tooling.

  • General AI fluency desired: comfort with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.

We are in the midst of redefining our incident response tooling and strategy, improving our test tooling, and developing a strategy to ensure all applications are performant and available. Joining Virta would make you one of the key people defining and driving our vision for reliability and observability.

Responsibilities
  • Ship automation and tooling that reduces toil, with high-quality, well-structured code.

  • Design and codify self-healing workflows and guardrails to minimize toil and improve reliability.

  • Steward SLO dashboards enhanced with AI/ML-assisted insights, using AIOps-style observability to surface anomalies, predict error-budget burn, and improve signal quality across the golden signals.

  • Integrate load-testing into reliability engineering efforts, ensuring outcomes directly inform SLOs, scaling strategies, and capacity planning.

  • Partner with product teams to replace manual capacity planning with predictive/AI-driven scaling models and implement burn-rate-based alerting.

  • Coach and mentor engineers; champion best practices and pragmatic reliability trade-offs.

90-Day Plan

Within your first 90 days at Virta, we expect you will do the following:

  • Teach and inspire other engineering team members through knowledge sharing, pair programming, and feedback during code reviews.

  • Propose and implement one or more process improvements related to reliability and observability to make our engineering team even better.

  • Deliver a proof of concept for an AIOps initiative, demonstrating how a manual reliability or observability process can be automated to reduce toil and improve insight.

Must-Haves
  • Highly proficient in shipping backend code in high-quality production environments, with strong hands-on coding and automation expertise and a deep understanding of reliability and production-readiness practices.

  • Hands-on expertise with automation and infrastructure-as-code (Terraform modules preferred), ideally with experience in observability.

  • Experience designing and implementing highly observable, scalable systems that support large numbers of users while reducing operational burden, including a proven track record configuring AIOps/ML-based monitoring platforms.

  • Applied and general AI fluency: ability to use AI/ML-assisted observability (e.g., anomaly detection, error-budget burn prediction), plus comfort with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.

  • Growth mindset and craftsmanship: ability to coach, mentor, and evangelize AI-first insights while following best practices and continually improving how we build software.

Values-driven culture

Virta’s company values drive our culture, so you’ll do well if:

  • You put people first and take care of yourself, your peers, and our patients equally

  • You have a strong sense of ownership and take initiative while empowering others to do the same

  • You prioritize positive impact over busy work

  • You have no ego and understand that everyone has something to bring to the table regardless of experience

  • You appreciate transparency and promote trust and empowerment through open access to information

  • You are evidence-based and prioritize data and science over seniority or dogma

  • You take risks and rapidly iterate

Is this role not quite what you're looking for? Join our Talent Community and follow us on LinkedIn to stay connected!

As part of your duties at Virta, you may come in contact with sensitive patient information that is governed by HIPAA. Throughout your career at Virta, you will be expected to follow Virta's security and privacy procedures to ensure our patients' information remains strictly confidential. Security and privacy training will be provided.

Virta has a location-based compensation structure. Starting pay will be based on a number of factors and commensurate with qualifications and experience. For this role, the compensation range is $167,249 to $216,000. Information about Virta’s benefits is on our Careers page at: https://www.virtahealth.com/careers.

Virta is a remote-first company; our team is spread across various locations, with office hubs in Denver and San Francisco.
Clinical roles: We currently do not hire in the following states: AK, HI, RI
Corporate roles: We currently do not hire in the following states: AK, AR, DE, HI, ME, MS, NM, OK, SD, VT, WI.

Top Skills

AI
AIOps
Go
ML
Python
Terraform

Similar Jobs

7 Days Ago
Remote
USA
134K-214K Annually
Mid level
Cloud • Fintech • Food • Information Technology • Software • Hospitality
The Sr. Site Reliability Engineer will automate incident and change management processes, optimize efficiency, and collaborate with stakeholders to maintain reliability at Toast.
Top Skills: AWS, Azure, FireHydrant, GCP, Go, JIRA, Python, Terraform

15 Days Ago
Remote
DC, USA
Senior level
Healthtech • Software
As a Senior Site Reliability Engineer, you'll design, implement, and maintain infrastructure for software applications, ensuring system performance and collaborating with engineering and operations teams.
Top Skills: Ansible, AWS, AWS CDK, Bash, Chef, CloudWatch, Datadog, Docker, EC2, JavaScript, Node.js, Puppet, Python, RDS, S3, TypeScript, VPC

20 Days Ago
Remote
United States
140K-165K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
The Senior Site Reliability Engineer will enhance system reliability and observability, support cloud deployment optimizations, provide mentorship, and improve incident management while ensuring software quality and operational integrity.
Top Skills: AWS, Azure, Datadog, Docker, EC2, GCP, Go, Kibana, Kubernetes, Ruby, Terraform

What you need to know about the Charlotte Tech Scene

Ranked among the hottest tech cities in 2024 by CompTIA, Charlotte is quickly cementing its place as a major U.S. tech hub. Home to more than 90,000 tech workers, the city’s ecosystem is primed for continued growth, fueled by billions in annual funding from heavyweights like Microsoft and RevTech Labs, which has created thousands of fintech jobs and made the city a go-to for tech pros looking for their next big opportunity.

Key Facts About Charlotte Tech

  • Number of Tech Workers: 90,859; 6.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Lowe’s, Bank of America, TIAA, Microsoft, Honeywell
  • Key Industries: Fintech, artificial intelligence, cybersecurity, cloud computing, e-commerce
  • Funding Landscape: $3.1 billion in venture capital funding in 2024 (CED)
  • Notable Investors: Microsoft, Google, Falfurrias Management Partners, RevTech Labs Foundation
  • Research Centers and Universities: University of North Carolina at Charlotte, Northeastern University, North Carolina Research Campus
