Site Reliability Engineer (SRE) - Site Reliability Engineering

Vancouver, British Columbia, Canada
Software and Services


Role Number: 200326123
The Apple Media Products Engineering team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. And they do it on a massive scale, meeting Apple’s high expectations with high performance to deliver a huge variety of entertainment in over 35 languages to more than 150 countries. These engineers build secure, end-to-end solutions. They develop the custom software used to process all the creative work, the tools that providers use to deliver that media, all the server-side systems, and the APIs for many Apple services. Thanks to Apple’s unique integration of hardware, software, and services, engineers here partner to get behind a single unified vision. That vision always includes a deep commitment to strengthening Apple’s privacy policy, one of Apple’s core values. Although services are a bigger part of Apple’s business than ever before, these teams remain small, nimble, and cross-functional, offering greater exposure to the array of opportunities here.

Key Qualifications

  • 3+ years of running services in a large-scale *nix environment
  • Understanding of SRE principles and goals, along with solid on-call experience
  • Experience with and understanding of scaling, capacity planning, and disaster recovery
  • Fast learner with excellent analytical, problem-solving, and communication skills
  • The ability to design, author, and release code in any language (Go, Python, Ruby, or Java would be a plus)
  • Deep understanding of and experience with one or more of the following: Docker, Mesos, Kubernetes, AWS, Ansible, Hadoop, Spark, Cassandra
  • Experience supporting Java applications
  • Experience using monitoring and logging solutions such as Prometheus, Grafana, and Splunk
  • Familiarity with DNS, HTTP, message queues, queueing theory, RPC frameworks, and datastores


The Site Reliability Engineer (SRE) role in Apple Media Products (AMP) requires a mix of strategic engineering and design along with hands-on technical work. This SRE will configure, tune, and fix multi-tiered systems to achieve optimal application performance, stability, and availability. We work closely with the systems engineers, network engineers, database administrators, monitoring team, and information security team.

Responsibilities will include:

  • Be the primary point of contact for the data pipeline, which involves Kafka, Hadoop, Cassandra, and other infrastructure components
  • Ensure service-level agreements (SLAs) are met
  • Write, review, and develop code and documentation that solve the hardest problems on some of the largest and most sophisticated systems in the world
  • Engage in and improve the life cycle of services, from inception and design to deployment, operation, migration, and sunset
  • Bring a passion for quality and automation, an ability to understand complex systems, and a desire to constantly make things better
  • Set priorities and work efficiently in a fast-paced environment
  • Measure and optimize system performance
  • Work with geographically distributed teams and execute high-level projects and migrations
  • Communicate clearly and effectively
  • Demonstrate the ability to deliver high-quality results on time

If you love designing and running systems and infrastructure that will affect millions of users, then this is the place for you!

Education & Experience

BS in engineering, computer science, or another technical discipline, plus 3 years of related experience.

Additional Requirements