Xsolla

Site Reliability Engineer

Posted 2 Days Ago

In-Office or Remote

Hiring Remotely in Montréal, QC

120K-150K Annually

Senior level

As a Site Reliability Engineer, you will ensure system reliability, monitor for issues, resolve incidents, and collaborate with development teams to enhance operational stability.

ABOUT US

At Xsolla, we believe that great games begin as ideas, driven by the curiosity, dedication, and grit of creators around the world. Our mission is to empower these visionaries by providing the support and resources they need to bring their games to life. We are committed to leveling the playing field, ensuring that every creator has the opportunity to share their passion with the world.

Headquartered in Los Angeles, with offices in Berlin, Seoul, and beyond, we partner with industry leaders like Valve, Twitch, and Ubisoft to clear the paths for innovation in gaming. Our global reach spans over 200 geographies, offering more than 700 payment methods in 130+ currencies.

Longevity. Opportunity. Vision. Enjoy the game!

Requirements

Proven experience as a Site Reliability Engineer or in a similar software engineering role in a large-scale production environment (5 to 10 years overall in IT, as Ops or Developer).
Proficiency in scripting languages such as Python and Bash. A strong understanding of Go and PHP is a plus.
Deep knowledge of monitoring systems such as Datadog, Prometheus, and Grafana.
Good understanding of continuous integration/continuous delivery processes and platforms (GitLab preferred). Experience with Helm.
Experience with Docker, Kubernetes, or other container orchestration systems.
Familiarity with infrastructure automation tools like Terraform.
Experience with automation, system administration, and system hardening.
Experience with Linux-based infrastructures, Linux/Unix administration.
Demonstrated problem-solving skills, particularly debugging and troubleshooting complex software systems. Ability to work under pressure.
Excellent communication skills with a capacity to articulate and solve complex technical problems.
Xsolla Technology Stack: Ubuntu, Kubernetes, GitLab, Terraform, Terragrunt, Puppet, Nginx, Google Cloud Platform, Datadog, Prometheus, Grafana, ELK, Zabbix, and Harbor.

Responsibilities

Ensure high reliability and availability, meeting SLAs and SLOs as measured by SLIs.
Monitor the system for issues and respond to incidents, ensuring quick resolution to maintain high system availability.
Drive incident resolution and process improvements to minimize downtime and increase operational transparency.
Ensure all key services are measured and monitored, and that they raise alerts when needed.
Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services like Kubernetes, Datadog, Prometheus, Grafana, and others.
Support services before they go live through activities such as capacity planning, monitoring setup, logging, and production readiness reviews.
Engage in service capacity planning and demand forecasting, performance analysis, and system tuning.
Collaborate with the development teams to enhance the product's operational stability.
Build and drive the automation systems that maintain system health.

Education

IT professional certifications are not required but are a plus:
Certified Kubernetes Administrator or Developer
HashiCorp Certifications
GCP Certifications

Benefits:

We are passionate about fostering a supportive environment for our team, so we prioritize the physical, mental, and emotional well-being of our employees and their families through a comprehensive Benefits Program. This includes 100% company-paid medical, dental, and vision plans, unlimited Flexible Time Off, and a personalized career roadmap for each employee. By investing in professional development through training and educational opportunities, we ensure that our team thrives both personally and professionally. Together, we’re not just building a business; we’re cultivating a community that values creativity, collaboration, and the transformative power of play.

By submitting the following job application form, you consent to Xsolla processing your data for career-related inquiries and potential employment opportunities. We process your data in accordance with this Xsolla Privacy Notice for Job Applicants. Please direct any inquiries regarding your data privacy to [email protected].

Top Skills: Bash, Datadog, Docker, GitLab, Grafana, Helm, Kubernetes, Linux, PHP, Prometheus, Python, Terraform
