Service Reliability Engineer - ASE Data Infra SRE
The people here at Apple don’t just build products; they craft the kind of wonder that has revolutionised entire industries. It’s the diversity of those people and their ideas that encourages the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Imagine what you could do here. Join Apple, and help us leave the world better than we found it. A job at Apple is unlike any other you’ve had. You will be challenged. You will be inspired. And you’ll be proud! At Apple, phenomenal ideas have a way of becoming phenomenal products, services, and customer experiences very quickly. Bring passion and dedication to your job, and there’s no telling what you could accomplish!
The Apple Services Engineering team (ASE) is one of the most exciting examples of Apple’s long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. And they do it at enormous scale, delivering a huge variety of entertainment in over 35 languages to more than 150 countries while meeting our high expectations.
These engineers build secure, end-to-end solutions. They develop the custom software used to process all the creative work, the tools that providers use to deliver that media, all the server-side systems, and the APIs for many Apple services.
Our team, ASE Data Infra SRE, supports analytics, real-time and batch jobs, and services for Data Engineering teams across Apple. We are committed to reliability, automation, data-driven decisions, learning from failure, and user satisfaction. Join a team that values innovation, continuous learning, and working on challenging problems at scale, and that offers the opportunity to make a significant impact.
Thanks to Apple’s unique integration of hardware, software, and services, engineers here partner to get behind a single unified vision. That vision always includes a deep commitment to strengthening Apple’s privacy policy, one of our core values. Although services are a bigger part of Apple’s business than ever before, these teams remain small and multi-functional, offering greater exposure to the array of opportunities here.
The Service Reliability Engineer (SRE) role in Apple Services Engineering requires a mix of strategic engineering and design along with hands-on technical work. This SRE will configure, tune, and fix multi-tiered systems to achieve optimal application performance, stability, and availability.
We manage jobs as well as applications on bare-metal and cloud computing platforms to deliver data processing for many of Apple’s global products. Our teams work with exabytes of data, petabytes of memory, and tens of thousands of jobs, delivering predictable, performant data analytics that power features in Apple Music, Apple TV+, the App Store, and other world-class products.
If you love designing and running systems that impact millions of users, this is the place for you!
The main responsibilities for this position include:
- Support Java-based applications and Spark/Flink jobs on bare metal, AWS, and Kubernetes
- Understand application requirements (performance, security, scalability, etc.) and assess the right services and topology on AWS, bare metal, and Kubernetes
- Build automation to enable self-healing systems
- Build tools to monitor and alert on high-performance, low-latency applications
- Troubleshoot application-specific, core network, system, and performance issues
- Take part in challenging, fast-paced projects that support Apple's business by delivering innovative solutions
- Monitor production, staging, test, and development environments for a wide range of applications in an agile, dynamic organisation
The qualifications for this position include:
- BS degree in Computer Science or a related field with 5+ years of experience, or an MS degree with 3+ years of experience, or equivalent
- At least 5 years in a Site Reliability Engineering (SRE) or DevOps role
- 5+ years running services in a large-scale *nix environment
- Understanding of SRE principles and goals along with prior on-call experience
- Extensive experience managing applications on AWS and Kubernetes
- Deep understanding of, and experience with, one or more of the following: Hadoop, Spark, Flink, Kubernetes, AWS
- Fast learner with excellent analytical, problem-solving, and interpersonal skills
- Experience supporting Java applications
- Experience with big data technologies
- Experience working with geographically distributed teams and implementing high-level projects and migrations
- Strong communication skills and ability to deliver results on time with high quality