Site Reliability Engineer

Seattle, Washington, United States
Software and Services

Summary

Role Number: 200518697
The Apple Services Engineering (ASE) team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. And they do it on a massive scale, meeting Apple’s high expectations with high performance to deliver a huge variety of entertainment in over 35 languages to more than 150 countries. These engineers build secure, end-to-end solutions. They develop the custom software used to process all the creative work, the tools that providers use to deliver that media, all the server-side systems, and the APIs for many Apple services. Thanks to Apple’s unique integration of hardware, software, and services, engineers here partner to get behind a single unified vision. That vision always includes a deep commitment to strengthening Apple’s privacy policy, one of Apple’s core values. Although services are a bigger part of Apple’s business than ever before, these teams remain small, nimble, and cross-functional, offering greater exposure to the array of opportunities here.

Key Qualifications

  • At least 5 years in a Site Reliability Engineering, DevOps, or infrastructure-focused role
  • Experience supporting internet-facing production services and distributed systems
  • Experience providing on-call support to first-level production support teams
  • Proficiency in Python, Java, Bash, or similar languages
  • Passion for designing and building reliable systems
  • Strong sense of ownership and integrity demonstrated through clear communication and collaboration
  • Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
  • Automation advocate: you truly believe in reducing operational load with software
  • Understanding of the Linux operating system and of standard networking protocols and components
  • Hands-on experience managing large numbers of diverse systems with configuration management, infrastructure provisioning, or software delivery tools (such as Puppet, Ansible, Terraform, and Spinnaker)
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
  • Excellent troubleshooting and problem-solving skills
  • Experience with disaster recovery and capacity planning
  • Experience with containers (such as Docker) and container orchestration platforms (such as Kubernetes)
  • Experience with monitoring tools such as Splunk, Grafana, and Prometheus
  • Administrative experience with tools such as Kafka is a plus
  • Experience with PCI-, ISO 27001-, or SOX-compliant systems is a plus
  • Demonstrated ability to deliver results on time with high quality
  • Excellent communication skills to work and collaborate with development teams

Description

Support and maintain services by measuring and monitoring availability, latency, and overall system health. Develop, manage, and support SRE tools and applications. Engage in improving the whole lifecycle of services, from inception through deployment, operations, and refinement. Analyze logs and telemetry data by writing monitoring and automation code. Provide on-call support to first-level production support teams. Provide hands-on technical expertise during service-impacting events. Collaborate with other engineers on code reviews, internal infrastructure improvements, and process enhancements. Operating at our scale, across multiple geographically dispersed data centers and serving hundreds of millions of users, presents outstanding challenges! As an SRE at Apple, you'll need to solve these problems using data, teamwork, and your own expertise. Will you join us in crafting solutions that do not yet exist?

Education & Experience

BS degree in computer science or an equivalent field with 5+ years of experience, or an MS degree with 3+ years of experience, or equivalent.