Zeta Global

Senior Site Reliability Engineer

Reposted 9 Hours Ago

Remote or Hybrid

Hiring Remotely in United States

140K-170K Annually

Senior level

The Senior Site Reliability Engineer will enhance system reliability, develop production-grade code, implement observability tools, conduct root cause analyses, and collaborate on system design for scalability.

WHO WE ARE

Zeta Global (NYSE: ZETA) is the AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to make it easier for marketers to acquire, grow, and retain customers more efficiently. Through the Zeta Marketing Platform (ZMP), our vision is to make sophisticated marketing simple by unifying identity, intelligence, and omnichannel activation into a single platform – powered by one of the industry’s largest proprietary databases and AI. Our enterprise customers across multiple verticals are empowered to personalize experiences with consumers at an individual level across every channel, delivering better results for marketing programs. Zeta was founded in 2007 by David A. Steinberg and John Sculley and is headquartered in New York City with offices around the world. To learn more, go to www.zetaglobal.com.

The Role

We’re looking for an experienced Senior Site Reliability Engineer (SRE) who can write production-grade code, has mastery of SLIs, SLOs, and error budgets, and is passionate about building scalable observability systems.

If you:

Can code confidently in Python or Golang and solve real-world problems through automation (not only scripting).
Have hands-on experience implementing SLIs, SLOs, and distributed tracing in production.
Understand Kubernetes, Terraform, and Infrastructure as Code tools.
Have hands-on experience with Chaos Engineering and anomaly detection.
Are excited about working with high-throughput, distributed systems processing millions of transactions daily…

Then this role might be for you!

Key Responsibilities:

Design, implement, and manage SLOs, SLIs, and error budgets, ensuring reliability aligns with user expectations and business objectives.
Develop production-grade software to enhance system reliability and reduce manual toil through automation.
Implement and optimize observability solutions using tools like OpenTelemetry, with a focus on high-cardinality metrics, distributed tracing, and actionable insights.
Drive postmortem processes and lead in-depth root cause analyses for incidents, ensuring lessons learned are effectively applied to prevent recurrence.
Define and monitor MTTx metrics (MTTA, MTTR, MTTF), using them to guide system improvements and measure reliability progress.
Design and participate in Chaos Engineering exercises.
Collaborate with engineering teams to design systems with reliability and scalability in mind, incorporating capacity planning, resiliency patterns, and modern deployment strategies (e.g., Canary, Blue-Green).
Lead design reviews for alerting strategies, ensuring effective signal-to-noise ratios in monitoring and incident management.
Advocate for and implement best practices in incident response and system design to achieve optimal uptime and performance.

Your experience:

Strong Coding Background:

4+ years of experience as an SRE or in a similar role with hands-on coding.
3+ years of software development experience in Python or Golang, with a focus on building maintainable, production-quality code.

SRE Expertise:

Deep understanding of SRE principles, particularly SLIs, SLOs, error budgets, and their real-world application.
Hands-on experience conducting postmortems and implementing observability at scale.
Hands-on experience conducting chaos engineering exercises.

Observability Skills:

Expertise in designing and implementing end-to-end observability solutions using tools like OpenTelemetry, Prometheus, Grafana, or Honeycomb.
Experience with distributed tracing and handling high-cardinality metrics in production environments.

Infrastructure Knowledge:

3+ years of experience with AWS and proficiency in Kubernetes, Terraform, and Infrastructure as Code (IaC) tools.
Strong understanding of distributed systems, microservices architectures, and containerization (Docker, Kubernetes).

Monitoring and Automation:

Hands-on experience with CI/CD tools and practices (GitOps, Jenkins, ArgoCD) and building automated pipelines.
Familiarity with tools and frameworks for incident management and operational automation.

Additional Skills:

Knowledge of modern deployment strategies (e.g., Canary, Blue-Green) and resiliency patterns (e.g., circuit breakers, retries).
Strong analytical skills for statistical analysis of metrics to identify and resolve performance bottlenecks.

BENEFITS & PERKS

Unlimited PTO

Excellent medical, dental, and vision coverage

Employee Equity and Stock Purchase Plan

Employee Discounts, Virtual Wellness Classes, Pet Insurance, and more!

COMPENSATION RANGE 

The compensation range for this role is $140,000.00 - $170,000.00, depending on location and experience.

PEOPLE & CULTURE AT ZETA

Zeta considers applicants for employment without regard to, and does not discriminate on the basis of, an individual’s sex, race, color, religion, age, disability, status as a veteran, or national or ethnic origin; nor does Zeta discriminate on the basis of sexual orientation, gender identity, or gender expression.

We’re committed to building a workplace culture of trust and belonging, so everyone feels invited to bring their whole selves to work. We provide a forum for employees to celebrate, support and advocate for one another. Learn more about our commitment to diversity, equity and inclusion here: https://zetaglobal.com/blog/a-look-into-zetas-ergs/

ZETA IN THE NEWS!

https://zetaglobal.com/press/?cat=press-release

Top Skills

ArgoCD, CI/CD, Docker, GitOps, Grafana, Honeycomb, Jenkins, Kubernetes, OpenTelemetry, Prometheus, Python, Terraform
