Leidos

Principal Site Reliability Engineer

Posted 16 Days Ago

Remote

2 Locations

126K-228K Annually

Expert/Leader

Lead the design and implementation of scalable, reliable systems. Manage production operations, develop CI/CD pipelines, and mentor teams on SRE best practices. Collaborate with engineering teams to improve system performance and ensure operational excellence.

The summary above was generated by AI

Leidos has an opportunity within the newly created Digital Modernization Practice Area, leading Site Reliability Engineering for the Repeatable Offerings (RO) organization. The RO organization is the delivery arm of the Digital Modernization sector’s Repeatable Offerings, delivering differentiated capabilities and managed services across the sector and the larger Leidos corporation. We are seeking a Principal Site Reliability Engineer (SRE) to lead the design, implementation, and operation of scalable, highly available systems. As a subject matter expert, you will establish best practices for reliability, security, and efficiency while driving innovation in our deployment and operations strategies. You will collaborate with development teams to improve system performance, automate processes, and ensure smooth recovery in high-pressure situations.

The team is primarily located in Blacksburg, VA. The selected candidate must either work on-site in Blacksburg or travel frequently to that location, and travel to other locations is also required.

Primary Responsibilities:

Lead the development and execution of SRE strategies to enhance system reliability, scalability, and efficiency.
Manage production systems and operations, ensuring robust development and implementation processes.
Oversee recovery efforts for unstable or at-risk projects, applying expertise in remediation strategies.
Design and implement microservice architectures, including orchestrators, for high-performance distributed systems.
Develop, maintain, and optimize CI/CD pipelines, infrastructure as code (IaC), and automation frameworks.
Drive adoption of best practices for horizontal and vertical scaling of microservices.
Define and implement packaging and deployment strategies to support rapid and reliable software delivery.
Collaborate with engineering teams to improve observability, monitoring, and operational excellence.
Provide technical leadership in managing containerized applications and orchestration platforms.
Mentor and guide teams on modern reliability engineering methodologies and best practices.

Basic Qualifications:

Requires a BS degree and 12–15 years of prior relevant experience, or a Master's degree and 10–13 years of prior relevant experience. Additional years of experience are accepted in lieu of a degree.
Proven experience as a Principal SRE or equivalent role in establishing robust and reliable systems.
Expertise in managing production systems and operations, including monitoring, incident response, and performance optimization.
Strong experience with Kubernetes and container orchestration.
Deep understanding of CI/CD pipelines, infrastructure as code (IaC), Helm Charts, and Operators.
Hands-on experience in designing and implementing microservice architecture and distributed systems.
Experience leading development teams in packaging and deployment strategies.
Strong knowledge of management strategies and techniques to support SRE principles.
Must have U.S. Citizenship.
Must be able to obtain and maintain a Public Trust clearance specific to the customer.

Preferred Qualifications:

Strong experience with OpenShift in enterprise environments.
Experience with auto-scaling, self-healing architectures, and advanced resiliency strategies.
Demonstrated success in improving and recovering red/unhealthy projects.
Familiarity with service mesh technologies and distributed tracing for monitoring and observability.
Expertise in designing and implementing highly available, fault-tolerant systems at scale.
Experience working on Federal Government contracts.

Original Posting: April 8, 2025

For U.S. Positions: While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.

Pay Range: $126,100.00 - $227,950.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

Top Skills

CI/CD

Helm Charts

Infrastructure As Code

Kubernetes

Microservice Architecture

OpenShift

Similar Jobs

DFIN

Principal Site Reliability Engineer - Cloud (Remote)

9 Days Ago

Remote

Hybrid

United States

Senior level

Artificial Intelligence • Fintech • Information Technology • Software • Data Privacy

The Principal Site Reliability Engineer ensures SaaS products are fast and stable, focuses on automation and system monitoring, and collaborates with teams to improve product performance.

Top Skills: C#, .NET, Java, Harness, Azure DevOps, Ansible, Jenkins, New Relic, Dynatrace, Datadog, AppDynamics, PowerShell, Python, Bash, Terraform, SQL, Cosmos, SolarWinds Database Performance Analyzer, Idera SQL Diagnostic Manager, Redgate SQL Monitor, Kubernetes, AKS, EKS

Atlassian

Principal Site Reliability Engineer

10 Days Ago

Remote

San Francisco, CA, USA

171K-274K Annually

Senior level

Cloud • Information Technology • Productivity • Security • Software • App development • Automation

As a Principal Site Reliability Engineer, you will enhance cloud service reliability, improve scalability, and foster cross-team collaboration to implement reliability practices.

Top Skills: AWS, Azure, GCP, Java, NoSQL, RDBMS

Kunai

Principal Architect - SRE Focus

3 Days Ago

Remote

United States

Mid level

Agency • Fintech • Information Technology • Software • Consulting

The role involves modernizing systems architecture, improving stability through automation and observability, and collaborating across teams to implement benchmarks and solutions.

Top Skills: AWS, BDD, Cloud Architecture, Cucumber, Gherkin, Go, Java, JMeter, JUnit, New Relic, OpenTelemetry, PagerDuty, Postman, Python, Selenium, Splunk, TestNG

What you need to know about the Charlotte Tech Scene

Ranked among the hottest tech cities in 2024 by CompTIA, Charlotte is quickly cementing its place as a major U.S. tech hub. Home to more than 90,000 tech workers, the city’s ecosystem is primed for continued growth, fueled by billions in annual funding from heavyweights like Microsoft and RevTech Labs, which has created thousands of fintech jobs and made the city a go-to for tech pros looking for their next big opportunity.

Key Facts About Charlotte Tech

Number of Tech Workers: 90,859; 6.5% of overall workforce (2024 CompTIA survey)
Major Tech Employers: Lowe’s, Bank of America, TIAA, Microsoft, Honeywell
Key Industries: Fintech, artificial intelligence, cybersecurity, cloud computing, e-commerce
Funding Landscape: $3.1 billion in venture capital funding in 2024 (CED)
Notable Investors: Microsoft, Google, Falfurrias Management Partners, RevTech Labs Foundation
Research Centers and Universities: University of North Carolina at Charlotte, Northeastern University, North Carolina Research Campus