
The Home Depot

Staff Software Engineer, Reliability Engineer - Observability (Remote)

Posted 3 Days Ago
Remote
Hiring Remotely in Georgia
120K-190K Annually
Mid level
The Staff Reliability Engineer will lead observability solutions, ensuring system reliability and performance while mentoring junior engineers and collaborating on technical initiatives.
The summary above was generated by AI

With a career at The Home Depot, you can be yourself and also be part of something bigger.

Position Purpose:

The Staff Reliability Engineer – Observability is responsible for leading the design, implementation, and evolution of observability solutions that ensure the reliability, performance, and efficiency of our systems. As a Staff Reliability Engineer, you will be part of a dynamic team with engineers of all experience levels who help each other build and grow technical and leadership skills while creating, deploying, and supporting production applications.
You are also expected to build and grow the skill sets of the more junior engineers.

Key Responsibilities:

  • 50% Delivery and Execution - Develops, tests, deploys, and maintains software, with a clear understanding of the value the software is to provide; Takes a broad view when approaching issues, using a global lens; Develops test suites (functional, destructive, etc.) to enable successful, rapid deployment of code to production; Takes on new opportunities and tough challenges with a sense of urgency, high energy and enthusiasm; Consistently achieves results, even under tough circumstances
  • 10% Learns and Grows - Actively seeks ways to grow and be challenged using both formal and informal development channels; Learns through successful and failed experiments when tackling new problems
  • 20% Plans and Aligns - Creates new and better ways for the organization to be successful; Delivers multi-mode communications that convey a clear understanding of the unique needs of different audiences; Works with the Product Team to ensure user stories are developer ready, easy to understand and testable; Collaborates with other team members in agile processes; Relates openly and comfortably with diverse groups of people; Adapts approach and demeanor in real time to match the shifting demands of different situations
  • 20% Supports and Enables - Fields questions from product and engineering teams; Helps grow junior engineers by providing guidance on modern software development frameworks and leading technical discussions; Notes gaps on the team and provides suggestions for changes to make the team more productive

Direct Manager/Direct Reports:

  • This position typically reports to a Software Engineer Manager or Sr. Manager
  • This position typically has 0 Direct Reports

Travel Requirements:

  • No travel required.

Physical Requirements:

  • Most of the time is spent sitting in a comfortable position and there is frequent opportunity to move about. On rare occasions there may be a need to move or lift light articles.

Working Conditions:

  • Located in a comfortable indoor area. Any unpleasant conditions would be infrequent and not objectionable.

Minimum Qualifications:

  • Must be eighteen years of age or older.
  • Must be legally permitted to work in the United States.

Preferred Qualifications:

  • 3-5 years of relevant work experience in site reliability engineering or related field
  • Experience in monitoring and observability, including designing and implementing observability solutions using OpenTelemetry, Prometheus, and distributed tracing
  • Proficiency in cloud platforms (GCP preferred) and infrastructure as code (Terraform, Ansible)
  • Experience in programming languages such as Go, Python, and Java
  • Experience with creating and executing unit, functional, destructive, and performance tests
  • Experience with modern debugging and root cause analysis techniques
  • Experience in designing systems for High Availability, Disaster Recovery, Performance, Efficiency, and Security
  • Experience in leading observability initiatives, including defining instrumentation standards and building monitoring dashboards
  • Hands-on experience implementing alerting thresholds and automated responses based on service level objectives (SLOs)
  • Strong experience with Kubernetes cluster management, optimization, and scaling
  • Expertise in container orchestration, including best practices for containerized application deployments and resource optimization
  • Experience designing, building, and maintaining scalable cloud infrastructure on GCP
  • Proficiency in automating routine operational tasks to reduce toil and improve efficiency
  • Familiarity with integrating observability-driven alerts with incident management systems and leading incident response efforts
  • Experience optimizing system performance, identifying and resolving bottlenecks, and conducting capacity planning
  • Knowledge of database performance tuning, query optimization, and designing application stress testing methodologies
  • Familiarity with service mesh technologies (Istio, Linkerd)

Minimum Education:

  • The knowledge, skills and abilities typically acquired through the completion of a bachelor's degree program or equivalent degree in a field of study related to the job.

Preferred Education:

  • No additional education

Minimum Years of Work Experience:

  • 3

Preferred Years of Work Experience:

  • No additional years of experience

Minimum Leadership Experience:

  • None

Preferred Leadership Experience:

  • None

Certifications:

  • None

Competencies:

  • Global Perspective
  • Manages Ambiguity
  • Nimble Learning
  • Self-Development
  • Collaborates
  • Cultivates Innovation
  • Situational Adaptability
  • Communicates Effectively
  • Drives Results
  • Interpersonal Savvy

For California, Colorado, Connecticut, Rhode Island, Nevada, New York City, Ithaca (NY), Westchester County (NY), and Washington residents:

The pay range for this position is between $120,000 and $190,000.

Top Skills

Ansible
GCP
Go
Istio
Java
Kubernetes
Linkerd
OpenTelemetry
Prometheus
Python
Terraform

Similar Jobs

An Hour Ago
Remote
Hybrid
67 Locations
167K-410K Annually
Expert/Leader
Artificial Intelligence • Professional Services • Business Intelligence • Consulting • Cybersecurity • Generative AI
The AI Solutions Engineering Delivery Lead oversees AI solution delivery teams, managing AI model lifecycles, implementing emerging technologies, and ensuring high-quality outcomes while aligning solutions with business goals.
Top Skills: AWS, Azure, Google Cloud Platform, Langchain, Pandas, Python, PyTorch, Scikit-Learn, Semantic Kernel, SQL

An Hour Ago
Remote
USA
90K-115K
Junior
Artificial Intelligence • eCommerce • Food
The Fulfillment Systems Engineer optimizes the proprietary Canopy platform, implements data-driven improvements, collaborates across departments, and maintains system performance.
Top Skills: CSS, Django, Docker, HTML, JavaScript, Looker, Power BI, Python, SQL, Tableau

4 Hours Ago
Remote
Hybrid
2 Locations
176K-201K Annually
Senior level
Fintech • Machine Learning • Payments • Software • Financial Services
The Lead Machine Learning Engineer will design and implement ML applications and systems, mentor junior developers, and optimize data pipelines while collaborating with cross-functional teams.
Top Skills: Dask, Data Engineering, Docker, Java, Machine Learning, Nomad, Python, PyTorch, Scala, Scikit-Learn, Spark, SQL, TensorFlow, Transformers

What you need to know about the Charlotte Tech Scene

Ranked among the hottest tech cities in 2024 by CompTIA, Charlotte is quickly cementing its place as a major U.S. tech hub. Home to more than 90,000 tech workers, the city’s ecosystem is primed for continued growth, fueled by billions in annual funding from heavyweights like Microsoft and RevTech Labs, which has created thousands of fintech jobs and made the city a go-to for tech pros looking for their next big opportunity.

Key Facts About Charlotte Tech

  • Number of Tech Workers: 90,859; 6.5% of overall workforce (2024 CompTIA survey)
  • Major Tech Employers: Lowe’s, Bank of America, TIAA, Microsoft, Honeywell
  • Key Industries: Fintech, artificial intelligence, cybersecurity, cloud computing, e-commerce
  • Funding Landscape: $3.1 billion in venture capital funding in 2024 (CED)
  • Notable Investors: Microsoft, Google, Falfurrias Management Partners, RevTech Labs Foundation
  • Research Centers and Universities: University of North Carolina at Charlotte, Northeastern University, North Carolina Research Campus
