CyrusOne

Senior Reliability Engineer

Sorry, this job was removed at 12:01 a.m. (EST) on Wednesday, Jan 21, 2026

Remote

Hiring Remotely in USA

140K-170K Annually

The Senior Reliability Engineer serves as a subject-matter expert and strategic technical authority for infrastructure reliability across a portfolio of mission-critical data center sites. This role leads the design, governance, and continuous improvement of reliability strategies for power, cooling, and control systems, applying advanced engineering judgment, analytics, and risk-based decision-making.
The Senior Reliability Engineer independently evaluates complex reliability risks, prioritizes initiatives under uncertainty, and influences operational, maintenance, and capital decisions that materially impact uptime, safety, and lifecycle cost. This role operates with minimal oversight and is expected to shape standards, mentor others, and elevate reliability capability across the organization.

Responsibilities:

Enterprise Reliability Strategy & Asset Care

Architect and govern portfolio-level, risk-based asset strategies for mission-critical power and cooling infrastructure.
Apply advanced RCM principles to define maintenance and inspection strategies aligned to failure risk, system criticality, and redundancy posture.
Evaluate and balance tradeoffs between maintenance investment, operational risk, spares coverage, redundancy, and capital replacement.
Establish and maintain enterprise PM quality standards, including audits, task effectiveness reviews, and elimination of low-value maintenance.

Operational Governance & Change Risk Management

Serve as a final technical authority for high-risk SOPs, MOPs, EOPs, and operational change packages.
Perform system-level risk assessments for planned work, incidents, and abnormal operating conditions.
Guide site teams in CMMS data integrity, work management maturity, and adherence to approved operating procedures.
Lead or oversee complex reliability investigations involving multiple systems, teams, or contributing factors.

Advanced Analytics & Condition Monitoring

Design and mature predictive condition-monitoring programs across the portfolio (oil analysis, thermography, vibration, battery monitoring, controls analytics).
Develop and interpret leading reliability indicators and degradation trends to anticipate failures before impact.
Apply statistical analysis, reliability modeling, and engineering judgment to evaluate failure likelihood and consequence.
Translate analytical insights into strategic maintenance, operational mitigations, or capital recommendations.

Critical Spares & Lifecycle Strategy

Define and govern enterprise critical spares strategies, accounting for supplier risk, lead times, and system exposure.
Identify systemic spares gaps and drive remediation plans in partnership with Supply Chain and Operations.
Lead lifecycle asset assessments to guide long-range capital planning and replacement prioritization.
Provide data-driven input to business cases supporting capital investments and infrastructure upgrades.

Incident Leadership, RCA & Continuous Improvement

Lead high-impact post-incident RCAs and FMEAs, ensuring depth of analysis beyond proximate causes.
Identify and address latent design, procedural, and organizational contributors to reliability events.
Ensure lessons learned result in durable changes to standards, procedures, maintenance strategies, or training.
Champion continuous improvement initiatives that measurably reduce risk and failure recurrence across sites.

Technical Leadership & Capability Development

Act as a mentor and technical escalation point for Reliability Engineers, site engineers, and CE leaders.
Coach teams on reliability methods, risk-based decision-making, and interpretation of condition-monitoring data.
Influence and evolve enterprise reliability standards, playbooks, and operating philosophies.
Partner with leadership to strengthen operator certification, training rigor, and operational discipline.

Qualifications:

10+ years of experience in reliability engineering, maintenance engineering, or facilities engineering within mission-critical environments.
Demonstrated leadership of complex, multi-system reliability programs with measurable business impact.
Expert-level knowledge of RCM, FMEA, RCA, and maintenance optimization methodologies.
Deep technical understanding of mission-critical infrastructure, including UPS, generators, switchgear, chillers, cooling towers, CRAH/CRAC, and BMS/EPMS.
Proven experience governing SOP/MOP/EOP programs and assessing operational change risk in live environments.
Advanced ability to analyze condition-monitoring, CMMS, and operational datasets and convert insights into strategic actions.
Proficiency in data analysis and visualization tools (Excel, Power BI, or similar).
Ability to apply statistical techniques or reliability modeling to support risk-informed decision-making under uncertainty.
Strong executive-level communication skills; able to influence senior leaders and defend technical positions.

Preferred Experience:

Experience designing and scaling enterprise critical spares and lifecycle asset management programs.
Hands-on experience with predictive analytics, failure modeling, or reliability simulations.
Proficiency with Python, R, or similar tools for advanced reliability analytics.
Working knowledge of SQL or other data query languages.
Strong familiarity with NFPA, IEEE, ASHRAE, and other relevant codes and standards.
Experience presenting reliability risk, capital tradeoffs, and investment recommendations to executive audiences.

Education & Certifications:

Bachelor’s degree in Mechanical, Electrical, or Industrial Engineering (or equivalent experience).
Preferred: CMRP, CRE, or similar advanced reliability or maintenance certification.

Work Conditions:

Supports 24×7 mission-critical operations; participates in on-call rotation and may support after-hours events.
Ability to work safely in energized environments in compliance with LOTO and NFPA 70E.
Travel to supported sites approximately 25%.

Salary range: $140,000-$170,000

CyrusOne is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, sex, sexual orientation, gender identity, religion, national origin, disability, veteran status, or other legally protected status.

CyrusOne provides reasonable accommodation for qualified individuals with disabilities in accordance with the Americans with Disabilities Act (ADA) and any other state or local laws. We will respond to requests for reasonable accommodation to assist you in applying for positions at CyrusOne or in submitting a resume.
