Echo360

Senior Site Reliability Engineer

Posted 11 Days Ago

Remote

Hiring Remotely in Youngstown, OH

120K-150K Annually

Senior level

The Senior Site Reliability Engineer is responsible for ensuring cloud infrastructure reliability, scalability, and security, implementing automated monitoring systems, and optimizing performance while mentoring junior engineers.

Description

Position Summary:

As a Site Reliability Engineer at Echo360, you will play a critical role in ensuring the reliability, scalability, cost, and security of our cloud infrastructure while proactively preventing incidents and maintaining adherence to SLOs and SLAs. You will design and implement automated monitoring and alerting systems to detect potential issues early, collaborate with development teams to ensure seamless deployments and rollbacks, and conduct failure testing to enhance system resilience. Leveraging your expertise in AWS services—including RDS, DynamoDB, MySQL, S3, OpenSearch, Kafka, and EKS—you will optimize performance, automate infrastructure provisioning using Terraform and CI/CD pipelines, and enforce security best practices, IAM policies, and secrets management. Your contributions will directly impact system stability, efficiency, and overall service quality.

Beyond technical excellence, you will engage in incident response, post-mortem analysis, and continuous improvement initiatives to prevent recurring issues. You will actively participate in a well-structured on-call rotation, mentor junior team members, and stay up to date on emerging technologies and best practices in site reliability engineering. With experience in monitoring tools like CloudWatch, DataDog, Prometheus, and Grafana, as well as CI/CD strategies using GitHub Actions and Jenkins, you will help drive a culture of automation and efficiency. If you thrive in a fast-paced, agile environment and are passionate about cloud cost optimization, security, and performance tuning, this role offers an exciting opportunity to make a meaningful impact on our infrastructure and engineering practices.

Requirements

The Primary Responsibilities for this role include:

Ensure service reliability and SLO/SLA adherence in production, preventing incidents by proactively conducting failure testing.
Implement automated monitoring and alerting systems for early detection of potential problems.
Collaborate with development teams to perform deployments and rollbacks with minimal disruption.
Optimize the performance and scalability of our AWS infrastructure, including RDS, DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, SES, EKS, ECS, and EC2.
Automate infrastructure provisioning and deployment processes using Terraform, CI/CD pipelines, and configuration management tools.
Proactively identify and address potential security vulnerabilities to maintain compliance, IAM best practices, and secrets management.
Participate in incident response and post-mortem analysis activities to identify root causes and prevent future occurrences.
Help onboard and mentor junior team members, sharing your knowledge and expertise.
Stay up to date on the latest cloud technologies and best practices for SRE.
Participate in a well-structured on-call rotation with other Site Reliability Engineers.
Explore new technologies and innovative solutions to improve service quality and speed to market.
Participate in technical discussions and deep dives with the other engineering and product teams.

The ideal candidate for this role will have:

5+ years of experience as a Site Reliability Engineer or similar role.
Strong understanding of AWS cloud services, including DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, EKS, ECS and EC2.
Experience with infrastructure automation tools like Ansible, Terraform, or CloudFormation.
Experience with monitoring and alerting tools like CloudWatch, DataDog, Prometheus, Grafana, Kibana, and PagerDuty.
Experience with GitHub Actions, CI/CD pipelines, and deployment strategies.
Strong problem-solving and analytical skills.
Excellent communication and collaboration skills.
Ability to work independently and take ownership of complex tasks.
Passion for technology and a desire to learn and grow.
Experience with Jenkins, PostgreSQL, and MongoDB.
Experience with cloud cost optimization, security best practices and tools.
Experience working in a fast-paced, agile environment.
Experience with Rancher, Cattleprod, and TeamCity is a plus.

Additional Job Details

This position is FULLY REMOTE; we will consider candidates who are located in many, but not all, states within the United States. For US-based positions, candidates must be eligible to work in the United States for any employer.

The base salary range for this position is $120,000 - $150,000 annually.

Compensation may vary outside of this range depending on a number of factors, including a candidate’s qualifications, skills and experience. Base pay is one part of the Total Package that is provided to compensate and recognize employees for their work.

To ensure applications are reviewed by the appropriate team, we ask that all candidates apply directly through our job postings only. This helps us manage applications effectively and maintain a fair hiring process. We look forward to receiving your application!

About Echo360:

Echo360 is a leading provider of advanced video content management and engagement solutions to the global higher education, K-12, corporate, and government industries. Our cloud-based platform empowers instructors, students, and administrators to create, edit, share, and manage all types of video content, as well as live stream video in real time. We support a diverse range of teaching and learning modalities, promoting active learning and providing real-time assessments to ensure student Echo360 at .

We’re looking for individuals who can support our DNA:

Maniacally Mission Driven - We embrace our roles as agents of transformation: enabling the kind of inspired learning that changes people’s lives.

Massively Collaborative - We support each other and work together for the greater good. By joining forces, our collective potential is mighty.

Relentlessly Inventive - We see the potential to deliver breakthrough solutions and are empowered to deliver them.

Moving at the speed of bright - Velocity is something we put at the core of everything we do. Not only because technology is moving fast, but because our learners are moving even faster.

Benefits

Echo360 offers comprehensive benefits including medical, dental, vision, life & disability insurance, a 401(k) plan with company match and an unlimited PTO policy.

Echo360 Inc does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity or any other reason prohibited by law in provision of employment opportunities and benefits.

Top Skills

AWS

CI/CD

CloudWatch

Datadog

DynamoDB

EKS

GitHub Actions

Grafana

Jenkins

Kafka

MySQL

OpenSearch

Prometheus

RDS

Terraform
