Spreedly

Senior Site Reliability Engineer

Reposted 8 Days Ago

Remote

2 Locations

50K-80K

Senior level

The Senior Site Reliability Engineer ensures the reliability and scalability of our payments platform, leading incident management, application support, and automation efforts while mentoring team members.

About Us:

Spreedly is the world's leading Open Payments Platform, sitting at the center of a network processing more than $50b of GMV annually. Spreedly's Payments Orchestration platform enables and optimizes digital transactions with the world’s most complete payment services marketplace. Built on Spreedly’s PCI-compliant architecture, our Advanced Vault solution combines a modern feature-set with rule-based configurations to optimize the vaulting experience for all stored payment methods. Global enterprises and hyper-growth companies grow their digital business faster by relying on our payments platform. Hundreds of customers worldwide secure card data in our PCI-compliant vault and use tokenized card data to enable and optimize over $45 billion of annual transaction volumes with any payment service.

Our vision is that the world is better with a diversified, inclusive payment ecosystem. Our mission is to accelerate commerce with an open, secure, and flexible payment platform that welcomes all payment participants. Our employees help us execute our vision by building a culture focused on autonomy, transparency, and collaboration in a dynamic, high-growth organization.

Product Offering:

Spreedly provides an open payments platform. The platform’s connectivity provides payments performance. Key products and services include:

Payment Gateway Integration: Connects merchants, platforms, and marketplaces to multiple payment gateways and payment services.

Tokenization: Securely stores and manages payment data with a universal tokenization service.

Transaction Routing: Enables intelligent routing of transactions to optimize success rates and costs.

Payment Vault: A secure storage solution for sensitive payment information.

Fraud Tools Integration: Integrates with various fraud prevention tools to enhance transaction security.

About the Role:

As a Senior Site Reliability Engineer (SRE) at Spreedly, you will focus on ensuring the reliability, observability, and scalability of our globally distributed payments platform. You will lead efforts to stabilize and optimize our infrastructure, build platform services, and champion best practices that enhance system performance and resilience. A strong candidate for this role brings deep experience in designing and operating highly available, scalable cloud architectures while fostering a culture of reliability across the organization.

In this role, you will leverage your expertise in software development, infrastructure, and operations to ensure our applications and systems are reliable, scalable, and efficient. You will work across the entire application stack, using a diverse range of tools and technologies to support our mission-critical system.

Responsibilities:

System Reliability & Performance: Ensure the reliability, availability, and performance of the production systems behind Spreedly’s globally distributed payments platform, which processes $4B monthly, through monitoring, automation, and continuous improvement.
Application Development Support: Collaborate with development teams to improve the reliability and performance of Ruby on Rails and Elixir applications.
Observability & Monitoring: Implement and maintain robust observability solutions using Datadog and OpenTelemetry, enabling proactive identification, alerting, and resolution of issues.
Incident Management: Lead incident response efforts by participating in a shared on-call rotation to maintain 24/7 system reliability, including root cause analysis, resolution, and implementing measures to prevent recurrence.
Automation & Tooling: Develop and maintain automation tools to reduce manual intervention, streamline operations, and enhance developer productivity.
Database Performance Tuning: Monitor, analyze, and optimize the performance of relational databases, identifying and resolving bottlenecks to maintain data integrity and efficiency.
Thought Leadership: Lead by example, infusing modern SRE best practices and fostering a culture of reliability and performance within the engineering organization.
Mentorship: Provide technical guidance and mentorship to team members, fostering a culture of learning and collaboration.

Requirements:

Observability Tools: Hands-on experience with Datadog, OpenTelemetry, Sentry, and Sumo Logic or similar monitoring and observability platforms, with a focus on actionable metrics and alerts.
Programming Expertise: Proficiency in a modern programming language, with a proven ability to write clean, maintainable, and efficient code. Ruby, Rails, and Elixir experience are preferred.
Cloud Infrastructure: Experience with AWS services, including EC2 (Ubuntu Linux), S3, and RDS.
Database Management: In-depth knowledge of databases such as CockroachDB and PostgreSQL (relational) and Riak (key-value), with experience in performance optimization and query tuning. Experience with Kafka is a plus.
Architectural Mindset: Experience applying design patterns to enhance reliability, scalability, and performance in application development.
System Troubleshooting: Excellent problem-solving skills with experience diagnosing complex system issues in production environments.
Collaboration: Proven ability to work cross-functionally with product, application, infrastructure, and security engineering teams.
Communication: Strong written and verbal communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.

We Offer US-based Employees:

Competitive salary + Equity
Outstanding Medical and Dental benefits, including 100% employer-paid options
Company-paid Life and Disability insurance
Optional vision and supplemental insurance options, and various Flexible Spending Accounts (FSA)
Open Paid Time Off policy + 12 weeks of paid leave for new parents
Matching 401(k) plan (5% up to $5,000 yearly)
$1,000 annual professional development stipend
Monthly home working/digital lifestyle stipend, new MacBook, and one-time accessory reimbursement
LinkedIn Learning subscription
Access to company-paid professional coaching service
Visits to HQ in Durham, North Carolina for remote employees

Spreedly is an equal opportunity employer. We are committed to fostering, cultivating, and preserving a culture of diversity, equity, inclusion, and belonging. We actively work to drive out even unintentional discrimination in our hiring processes via practices like blindly graded work samples, structured interviews, and diversity awareness training.

Due to the sensitive nature of what Spreedly does - handling payment data - finalist candidates must complete a successful background and reference check.

At this time Spreedly is unable to provide sponsorship for employment, and we are not set up to support remote employees who reside in California or New York. In order to be considered for employment, applicants must be currently legally authorized to work in the job location country and not require future sponsorship in order to continue working in that country.

We appreciate your interest in our company. Because of the high volume of resume flow, we may only respond to those candidates that we think will be a potential fit.

Top Skills

AWS

Datadog

Docker

Elixir

Kafka

OpenTelemetry

Postgres

Ruby

SQL
