Clarifai

Senior Site Reliability Engineer

Posted 15 Days Ago

Remote

Hiring Remotely in Canada

Senior level

The Senior Site Reliability Engineer will ensure high availability of core services, optimize system performance, manage cloud infrastructure, and collaborate with teams to solve engineering challenges.

The summary above was generated by AI

About the Company

Clarifai is a leading compute orchestration AI platform specializing in computer vision and generative AI. We empower organizations to transform unstructured image, video, text, and audio data into actionable insights, significantly faster and more accurately than manual processes. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been at the forefront of AI innovation since achieving the top five placements in the 2013 ImageNet Challenge. Our diverse, globally distributed team operates across the United States, Canada, Estonia, Argentina, and India.

We have secured $100M in funding, including a $60M Series C round, backed by industry leaders such as Menlo Ventures, Union Square Ventures, Lux Capital, NEA, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm, and Osage.

Clarifai is proud to be an equal-opportunity workplace committed to building and maintaining a diverse and inclusive team.

Your Impact

Clarifai’s platform is a Kubernetes-native distributed system that requires the orchestration of many components. Efficiently serving and training large neural networks presents unique design and infrastructure challenges.

You will be critical to solving these challenges both in the cloud and in on-premises environments. Additionally, you will be responsible for our broader cloud infrastructure, development tools, and environments.

The Opportunity

Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Develop Kubernetes resources and custom tooling for seamless cloud and on-premises deployments
Design and implement scalable, secure, and cost-effective infrastructure solutions
Partner with teams across the organization to identify and solve engineering challenges

Requirements

BS/BA in Computer Science or a related field
Good knowledge of cloud providers (AWS, GCP, or similar)
Expertise with Kubernetes (EKS, GKE, self-hosted) and Infrastructure as Code using Terraform and Helm
Solid understanding of web and networking fundamentals (HTTP, TLS, DNS, certificates, etc.)
Experience with CI/CD pipelines using tools such as GitHub Actions, ArgoCD, and Atlantis
Strong interpersonal skills for working with teams across different time zones and regions

Great to Have

Knowledge of basic microservice architecture principles
Familiarity with security best practices for cloud-based systems
Experience with relational databases, message queues, and key-value stores
Experience writing Python, Go, or any other popular programming language
Familiarity with any RPC framework
Experience developing and building custom Kubernetes operators

Top Skills

ArgoCD

Atlantis

AWS

GCP

GitHub Actions

Helm

Kubernetes

Python

Terraform
