Site Reliability Engineer

San Diego, California, United States
Machine Learning and AI


Role Number: 200173203
The Video Computer Vision org is a centralized applied research and engineering organization responsible for developing real-time on-device Computer Vision and Machine Perception technologies across Apple products. We balance research and product to deliver Apple quality, state-of-the-art experiences, innovating through the full stack, and partnering with HW, SW and ML teams to influence the sensor and silicon roadmap that brings our vision to life. Examples include FaceID, Animoji/Memoji, Scene Understanding, People Understanding and Positional Tracking (VIO/SLAM).

Key Qualifications

  • 3+ years managing Site Reliability Engineering teams and supporting mission-critical applications
  • 5+ years managing a large fleet of *nix systems
  • 3+ years of Hybrid Cloud (data center, AWS, GCP, Azure)
  • 5+ years with configuration management tools such as Ansible or Terraform
  • 2+ years of programming experience (preferably Python)
  • 2+ years of managing relational and NoSQL databases
  • 2+ years of building fully automated CI-CD pipelines
  • You should also be self-directed, analytical, and work well in a team environment.


Your core responsibility is to provide operational support for multiple cloud-based applications running on AWS and Apple infrastructure, with an emphasis on deployment, security, scalability and reliability. Operations tech stack: Ansible, Terraform, Go, Python, Prometheus, with some bash scripting. Common technologies include: Django, Docker, Kubernetes, Postgres, Redis, and Cassandra. We have a hybrid infrastructure and make extensive use of Amazon Web Services along with home-grown compute clouds.

What qualities will make you successful? We are looking for a driven and dedicated Site Reliability Engineer with the following:

  • Core operations experience with Linux, Ansible (or similar), Docker, Kubernetes, and Postgres
  • Ability to engage various software development teams to collaborate and build services from the ground up
  • Expertise in networking with an emphasis on security
  • Experience building systems both on-premises (data center) and on public cloud (AWS, GCP or Azure welcome)
  • Working knowledge of deploying microservices (Django, Go, JVM)
  • Experience with schedulers such as Kubernetes, AWS ECS or EKS
  • Ability to write code in a high-level language (Python preferred)
  • Extensive experience using Linux, with knowledge of kernel/system tuning

Last but not least, you are battle-tested and have a few interesting production tales.

Education & Experience

BS/MS in Computer Science/Computer Engineering (or equivalent experience)

Additional Requirements