Virta Health

Senior Site Reliability Engineer

Reposted 8 Days Ago

Remote

Hiring Remotely in USA

167K-216K Annually

Senior level

As a Senior Site Reliability Engineer at Virta Health, you'll build automation and tooling for reliability, enhance observability, and mentor engineering teams in best practices.

Virta Health is on a mission to transform diabetes care and reverse the type 2 diabetes epidemic. Current treatment approaches aren’t working: over half of US adults have either type 2 diabetes or prediabetes. Virta is changing this by helping people reverse type 2 diabetes through innovations in technology, personalized nutrition, and virtual care delivery reinvented from the ground up. We have raised over $350 million from top-tier investors, and we partner with the largest health plans, employers, and government organizations to help their employees and members restore their health and live diabetes-free. Join us on our mission to reverse diabetes in 100M people.

As an SRE on the Infrastructure team at Virta, you will be building the foundation that will help our company move as fast as possible while meeting security and compliance requirements. Key projects for the team over the next two quarters include:

Implementing an AI-driven observability and metrics platform that automatically detects anomalies and highlights SLO risks, enabling product teams to make data-driven decisions.
Enhancing system observability, reliability, and efficiency using off-the-shelf technology combined with internal tools developed in Python and Go, to increase transparency and visibility into our systems and to centralize data.
Building out more products for our Product Development teams, such as observability modules (SLOs, alerting, dashboards), so they can spin up an MVP out of the box.
Improving incident readiness with better tooling and the right hygiene practices, such as game days.
Engaging with feature development teams in toil reduction exercises, capacity planning, load testing, the SLO process, and other best practices, and partnering with product teams to replace manual capacity planning with predictive/AI-driven scaling models and to codify self-healing runbooks that minimize toil.
Improving the velocity and quality of our developer platform and tooling.
General AI fluency is desired: comfort with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements.

We are in the midst of redefining our incident response tooling and strategy, improving test tooling, and developing a strategy to ensure all applications are performant and available. Joining Virta would make you one of the key people defining and driving the future vision of what reliability and observability should look like.

Responsibilities

Ship automation and tooling that reduces toil, with high-quality, well-structured code.
Design and codify self-healing workflows and guardrails to minimize toil and improve reliability.
Steward SLO dashboards enhanced with AI/ML-assisted insights, leveraging AIOps-style observability to surface anomalies, predict error-budget burn, and improve signal quality across golden signals.
Integrate load-testing into reliability engineering efforts, ensuring outcomes directly inform SLOs, scaling strategies, and capacity planning.
Partner with product teams to replace manual capacity planning with predictive/AI-driven scaling models and implement burn-rate based alerting.
Coach and mentor engineers; champion best practices and pragmatic reliability trade-offs.

90 Day Plan

Within your first 90 days at Virta, we expect you will do the following:

Teach and inspire other engineering team members through knowledge sharing, pair programming, and giving feedback during code reviews
Propose and implement one or more process improvements related to reliability and observability to make our engineering team even better
Deliver a proof-of-concept for an AIOps initiative, demonstrating how a manual reliability or observability process can be transformed into automation to reduce toil and improve insight

Must-Haves

Highly proficient in shipping backend code in high-quality production environments, with strong hands-on coding and automation expertise, and a deep understanding of reliability and production readiness practices
Hands-on expertise with automation and infrastructure-as-code (Terraform modules preferred), ideally with experience in observability
Experience designing and implementing highly observable, scalable systems — with a proven track record configuring AIOps / ML-based monitoring platforms — that support large numbers of users while reducing operational burden
Applied and general AI fluency: ability to leverage AI/ML-assisted observability (e.g., anomaly detection, error-budget burn prediction) while also being comfortable with concepts like prompt engineering, operational chatbots, and AI-assisted workflows to accelerate incident response and reliability improvements
Growth mindset and craftsmanship: ability to coach, mentor, and evangelize AI-first insights while continually improving engineering practices and following best practices

Values-driven culture

Virta’s company values drive our culture, so you’ll do well if:

You put people first and take care of yourself, your peers, and our patients equally
You have a strong sense of ownership and take initiative while empowering others to do the same
You prioritize positive impact over busy work
You have no ego and understand that everyone has something to bring to the table regardless of experience
You appreciate transparency and promote trust and empowerment through open access of information
You are evidence-based and prioritize data and science over seniority or dogma
You take risks and rapidly iterate

Is this role not quite what you're looking for? Join our Talent Community and follow us on LinkedIn to stay connected!

As part of your duties at Virta, you may come in contact with sensitive patient information that is governed by HIPAA. Throughout your career at Virta, you will be expected to follow Virta's security and privacy procedures to ensure our patients' information remains strictly confidential. Security and privacy training will be provided.

Virta has a location-based compensation structure. Starting pay will be based on a number of factors and commensurate with qualifications and experience. For this role, the compensation range is $167,249 - $216,000. Information about Virta’s benefits is on our Careers page at: https://www.virtahealth.com/careers.

As a remote-first company, our team is spread across various locations with office hubs in Denver and San Francisco.
Clinical roles: We currently do not hire in the following states: AK, HI, RI.
Corporate roles: We currently do not hire in the following states: AK, AR, DE, HI, ME, MS, NM, OK, SD, VT, WI.

Top Skills

AIOps

Python

Terraform
