SRE Engineer, ASE, London
People at Apple don't just build products; they craft the kind of experience that has revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.
The Apple Service Engineering (ASE) team builds and provides systems and infrastructure that power Apple's services (such as iCloud, iTunes, Siri, and Maps).
We are the foundation on which Apple's software developers build the products that our customers love. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!
This is a hands-on role maintaining and enhancing SRE practices for a private cloud service, accelerating our ability to deliver thousands of applications reliably and consistently.
The Compute team within Apple Service Engineering (ASE) is seeking an experienced SRE software engineer to build and enhance Kubernetes internals, ensuring that services scale to meet the demands of Apple's Services offerings.
You will work with world-class engineers on core Kubernetes components, with an emphasis on controllers and the infrastructure for managing namespaces to fit Apple's diverse needs, while engaging with the upstream community to advance Apple's requirements. Ultimately, you will help build the platform that delivers our applications to our end users at scale.
The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with the team's developers and architects on design and implementation to improve stability, security, and scalability.
As an SRE at Apple, you will:
- Operate, monitor, and triage all aspects of our production and non-production environments.
- Design, build, and implement innovative solutions around Kubernetes in a highly distributed environment.
- Prepare alert-handling procedures and runbooks, and collaborate with other SRE teams.
- Participate in on-call rotations to troubleshoot and resolve production issues, minimizing downtime.
- Automate the deployment and orchestration of services in the cloud environment, as well as other routine processes.
Key qualifications:
- In-depth experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role
- Expert-level professional experience with cloud operations, with a focus on infrastructure as a service (IaaS): compute, storage, and network virtualization
- Proficiency in Go and Python
- Experience operating large-scale, multi-tenant infrastructure as a managed service
- Familiarity with cloud infrastructure concepts (zones, regions, VPCs, etc.)
- Automation advocate: you truly believe in removing operational load through software
- A strong sense of ownership, combined with being a great teammate who communicates clearly and transparently
- Self-motivated, inquisitive, and always looking to learn more
- Experience managing, scaling, and troubleshooting Java and Go applications
- Ability to collaborate and coordinate with multiple distinct engineering teams, and to mentor others