
The Hartford Financial Services Group, Inc.

Principal Reliability Engineer - EDS

Posted 2 Days Ago
In-Office or Remote
Hiring Remotely in Charlotte, NC, USA
153K-229K Annually
Expert/Leader
Lead enterprise reliability strategy for data platforms and cloud infrastructure. Define RE roadmaps, SLO/SLI frameworks, AIOps/AI-driven automation, observability, and resilience patterns. Influence architecture, mentor engineers, enforce IaC/CI-CD standards, and ensure data pipeline and product reliability across AWS/GCP and platforms like Snowflake, EMR, and Hadoop/Spark.
The summary above was generated by AI
Principal Reliability Engineering - IE06JE

We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.   


The Enterprise Data Services (EDS) organization is seeking a Principal Reliability Engineer (Principal RE) to serve as the senior technical authority responsible for the reliability, resilience, availability, and performance of all data platforms, cloud infrastructure, data products, and data pipelines across the enterprise data organization. This role sets the strategic vision for Reliability Engineering within EDS and leads the definition, implementation, and continuous evolution of RE practices, tooling, automation, observability frameworks, and AIOps/AI‑driven operations.

As the Principal RE, you will influence architectural direction, lead large‑scale, cross‑organizational technical initiatives, and drive a culture of engineering excellence, automation‑first operations, and proactive reliability improvement. You will partner closely with platform engineering, data engineering, security, architecture, and product teams to embed RE principles into every stage of the data product lifecycle.

This role will have a hybrid work schedule, with the expectation of working in an office (Columbus, OH; Chicago, IL; Hartford, CT; or Charlotte, NC) 3 days a week (Tuesday through Thursday).

Key Responsibilities

Enterprise Reliability Strategy & Leadership

  • Work closely with the AVP, RE & Production Support, EDS, to define the Reliability Engineering strategy for data platforms, data cloud environments, and data products.
  • Establish long‑term RE roadmaps, target operating models, and architectural patterns that scale with organizational growth.
  • Serve as the highest‑level technical escalation point for systemic reliability issues, influencing executive stakeholders and engineering leaders.

Platform & Cloud Reliability (AWS, GCP, Snowflake, EMR, Hadoop, ETL/ELT)

  • Leverage enterprise‑provided standards and building blocks to architect and evolve highly reliable, performant, and cost‑efficient cloud‑based platforms across AWS and GCP for all EDS services.
  • Influence and work directly with Platform Solution Architecture on new product enablement and hyperautomation (end‑to‑end blueprint automation).
  • Oversee reliability controls and fail‑safe patterns for Snowflake, EMR, Hadoop/Spark clusters, container platforms (e.g., Kubernetes), and mission‑critical data systems.
  • Lead the creation and enforcement of SLO/SLI frameworks that span the entire data lifecycle.

AI‑Enabled Operations, AIOps & Intelligent Automation

  • Develop and implement AI‑driven automation for anomaly detection, alert correlation, autonomous remediation, and predictive capacity management.
  • Leverage LLMs, prompt engineering, and cloud‑native AI services (AWS Bedrock, SageMaker, Vertex AI) to build intelligent runbooks, advanced troubleshooting agents, and generative‑AI‑enabled operational tooling.
  • Champion the adoption of machine learning–based observability and reliability analytics.

End‑to‑End Observability & Operational Excellence

  • Adopt and architect enterprise‑wide data observability frameworks—including logging, metrics, tracing, distributed profiling, and event pipelines—for all data platforms and pipelines.
  • Establish gold‑standard incident response patterns, post‑incident reviews, and continuous improvement processes.
  • Drive elimination of toil across EDS, focusing on self‑healing systems, proactive detection, and autonomous operations.

Data Pipeline & Data Product Reliability

  • Define RE best practices for modern data products, governed data pipelines, real‑time/streaming systems, and operational analytics platforms.
  • Ensure data quality, data timeliness, and SLAs for data products through automated checks, lineage-informed alerting, and pipeline reliability tooling.
  • Partner with Data Engineering to embed resilience patterns (idempotency, checkpointing, replayability, disaster recovery) into pipeline architectures.

Engineering Standards, Governance & Cross‑Org Influence

  • Set and enforce standards for IaC, CI/CD, platform automation, reliability frameworks, operational readiness, and runbook quality across EDS.
  • Provide technical leadership and mentorship to Staff/Senior Engineers in the RE team and Production Support teams, influencing engineering culture and helping grow RE capabilities across the organization.
  • Represent Reliability Engineering in architectural reviews, enterprise governance forums, and executive‑level discussions.

Technical Experience

  • 10+ years in one or more of the following areas: data, cloud, platform engineering, site reliability engineering, or large‑scale distributed systems, including experience in leadership or technical leadership roles.
  • Proficiency with data or cloud platforms, including architectural patterns for resilience, networking, security, and distributed data infrastructure.
  • Deep experience supporting or engineering platforms such as Snowflake, EMR, Hadoop/Spark, Data Integration, and cloud‑native data ecosystems.
  • Scripting and programming (preferably Python) for large‑scale automation, platform tooling, and reliability frameworks.
  • Experience with Infrastructure‑as‑Code (Terraform, CloudFormation) and enterprise CI/CD.

Preferred Qualifications

  • Experience in regulated or highly complex enterprise environments (financial services, insurance, healthcare).
  • Prior experience as a Senior Staff Engineer, a hands‑on engineering or architecture leader, or in a similar senior technical role.
  • Knowledge of data governance, metadata, lineage systems, and data quality engineering practices.
  • Certifications in AWS, GCP, Kubernetes, or SRE/DevOps frameworks.

AI & AIOps

  • Background applying machine learning to operations—anomaly detection, event correlation, predictive modeling, and automated remediation.
  • Understanding of AI‑enabled developer/operations tools that use LLMs, prompt engineering, or cloud AI services to improve reliability.

Observability & Platform Operations

  • Expertise with enterprise observability stacks (Prometheus, Grafana, Datadog, Splunk, Dynatrace, OpenTelemetry).
  • Ability to design and enforce advanced SLI/SLO frameworks across complex data ecosystems.

Leadership & Cross‑Functional Influence

  • Demonstrated ability to lead technical strategy at scale, influence senior engineering leaders, and set enterprise‑wide standards.
  • Strong capability in mentoring engineers, providing architectural guidance, and fostering engineering excellence.
  • Exceptional communication skills for interacting with executives, senior architects, product leaders, and engineering teams.

Candidate must be authorized to work in the US without company sponsorship. The company will not support the STEM OPT I-983 Training Plan endorsement for this position.

Compensation

The listed annualized base pay range is primarily based on analysis of similar positions in the external market. Actual base pay could vary and may be above or below the listed range based on factors including but not limited to performance, proficiency and demonstration of competencies required for the role. The base pay is just one component of The Hartford’s total compensation package for employees. Other rewards may include short-term or annual bonuses, long-term incentives, and on-the-spot recognition. The annualized base pay range for this role is:

$152,800 - $229,200

Equal Opportunity Employer/Sex/Race/Color/Veterans/Disability/Sexual Orientation/Gender Identity or Expression/Religion/Age


Similar Jobs

An Hour Ago
Remote
United States
75K-126K Annually
Senior level
Big Data • Transportation • Analytics • Big Data Analytics
Lead monitoring and optimization of programmatic ad campaigns using real-time and historical data. Diagnose performance issues, design A/B and multivariate experiments, partner with AdOps and Data Science to operationalize ML-driven solutions, and drive automation of campaign decisioning to scale revenue and efficiency.
Top Skills: A/B Testing, Programmatic Advertising, Python, SQL
An Hour Ago
Remote
USA
186K-219K Annually
Senior level
Artificial Intelligence • Blockchain • Fintech • Financial Services • Cryptocurrency • NFT • Web3
Design and build scalable, fault-tolerant backend payment services in Go. Own end-to-end delivery from design to production, improve reliability, observability, and performance, and collaborate with product and cross-functional teams to expand payment capabilities and integrations.
Top Skills: APIs, Distributed Systems, Generative AI, Go, Microservices, Observability, Payments, Service-Oriented Architecture
An Hour Ago
Remote or Hybrid
United States
103K-129K Annually
Mid level
HR Tech • Information Technology • Professional Services • Sales • Software
Provide technical support and escalation ownership for payroll workflows: investigate and resolve payroll run issues, taxes, benefits, and integrations; advise customers on payroll configuration and compliance; collaborate with Product and Engineering to translate issues into product improvements; proactively identify misconfigurations and recommend process improvements and documentation.
Top Skills: APIs, CSS, Databases, HRIS, HTML, Payroll Platforms, Scripting, SQL, Webhooks

What you need to know about the Charlotte Tech Scene

Ranked among the hottest tech cities in 2024 by CompTIA, Charlotte is quickly cementing its place as a major U.S. tech hub. Home to more than 90,000 tech workers, the city’s ecosystem is primed for continued growth, fueled by billions in annual funding from heavyweights like Microsoft and RevTech Labs, which has created thousands of fintech jobs and made the city a go-to for tech pros looking for their next big opportunity.

Key Facts About Charlotte Tech

  • Number of Tech Workers: 90,859; 6.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Lowe’s, Bank of America, TIAA, Microsoft, Honeywell
  • Key Industries: Fintech, artificial intelligence, cybersecurity, cloud computing, e-commerce
  • Funding Landscape: $3.1 billion in venture capital funding in 2024 (CED)
  • Notable Investors: Microsoft, Google, Falfurrias Management Partners, RevTech Labs Foundation
  • Research Centers and Universities: University of North Carolina at Charlotte, Northeastern University, North Carolina Research Campus
