Akamai Technologies

Senior II Site Reliability Engineer

Reposted 2 Hours Ago
In-Office or Remote
Hiring Remotely in Poland
Senior level
The Senior II Site Reliability Engineer will lead reliability initiatives for AI inference platforms, mentoring SREs, defining observability strategies, and improving incident management while building automation tools.
The summary above was generated by AI

Do you want to shape reliability practices for a new AI inference platform?

Are you a senior technical leader who drives solutions across teams?

Join the Akamai Inference Cloud Team

The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design, implement, deploy and operate AI platforms that enable customers to run inference models and developers to create AI applications.

Partner with the best

In this role, you'll lead reliability workstreams for Akamai's serverless inference platform, design SRE tooling and automation, and drive technical decisions. Opportunities exist to mentor other SREs, influence architecture decisions with product engineering teams, and shape SRE practices for AI inference workloads and GPU infrastructure at scale.

As a Senior II SRE, you will be responsible for:

  • Taking responsibility for observability strategy: designing telemetry, dashboards, and alerts, defining SLO/SLI frameworks, and implementing improvements when targets are missed
  • Building production-grade automation and tooling that reduces operational toil, improves incident response, and sets patterns that other SREs adopt
  • Owning incident management integration for inference workloads, designing frameworks, leading incident response during on-call rotations, and driving systemic improvements from post-mortems
  • Defining and implementing deployment safety practices including progressive rollouts, canary analysis, and rollback automation, establishing standards for the team
  • Partnering with product engineering teams to influence architecture decisions, ensure operational readiness, and represent the SRE perspective in design reviews
  • Mentoring Senior and mid-level SREs through code reviews, design discussions, and hands-on problem-solving

Do what you love

To be successful in this role you will:

  • Have extensive experience in SRE, platform engineering, or infrastructure engineering, working with large-scale distributed systems
  • Have a track record of defining SLO/SLI frameworks, building observability platforms, and running incident management processes at scale
  • Demonstrate expertise in Kubernetes and containerization, including autoscaling, resource scheduling, and orchestration for compute-intensive workloads at scale
  • Build automation and tooling using Python or Go, while leveraging CI/CD pipelines, deployment safety practices, and infrastructure-as-code expertise
  • Lead technical initiatives across teams, guide engineers through mentorship, and resolve complex reliability challenges independently with expertise and precision
  • Gain experience in AI/ML infrastructure, model deployment, or handling GPU workloads effectively within relevant environments
  • Demonstrate ownership of intricate reliability issues, deliver solutions collaboratively, and enhance the technical expertise of surrounding SRE team members

Work in a way that works for you

FlexBase, Akamai's Global Flexible Working Program, is based on the principles that are helping us create the best workplace in the world. When our colleagues said that flexible working was important to them, we listened. We also know flexible working is important to many of the incredible people considering joining Akamai. FlexBase gives 95% of employees the choice to work from their home, their office, or both (in the country advertised). This permanent workplace flexibility program is consistent and fair globally, to help us find incredible talent, virtually anywhere. We are happy to discuss working options for this role and encourage you to speak with your recruiter in more detail when you apply.

We power and protect life online, by solving the toughest challenges, together.

At Akamai, we're curious, innovative, collaborative and tenacious. We celebrate diversity of thought and we hold an unwavering belief that we can make a meaningful difference. Our teams use their global perspectives to put customers at the forefront of everything they do, so if you are people-centric, you'll thrive here.

Working for you

At Akamai, we will provide you with opportunities to grow, flourish, and achieve great things. Our benefit options are designed to meet your individual needs for today and in the future. We provide benefits surrounding all aspects of your life:

  • Your health
  • Your finances
  • Your family
  • Your time at work
  • Your time pursuing other endeavors

Our benefit plan options are designed to meet your individual needs and budget, both today and in the future.

About us

Akamai powers and protects life online. Leading companies worldwide choose Akamai to build, deliver, and secure their digital experiences, helping billions of people live, work, and play every day. With the world's most distributed compute platform, from cloud to edge, we make it easy for customers to develop and run applications, while we keep experiences closer to users and threats farther away.

Join us

Are you seeking an opportunity to make a real difference in a company with a global reach and exciting services and clients? Come join us and grow with a team of people who will energize and inspire you!
