Site Reliability Engineer
London, Greater London, United Kingdom
Software and Services
Come help us build the next-generation cloud platform to support internet services across Apple. Our software ensures that Apple's services are reliable, scalable, and secure, and we leverage both open source and home-grown technologies to provide managed infrastructure services. We are looking for a world-class Infrastructure Engineer with experience in developing processes, tools, and automation for managing distributed systems in production environments. We balance our time across automating operations for our growing footprint of deployments, building self-service products to empower internal customers, and increasing the reliability and scalability of our services with application and systems-level improvements.
- Deep understanding of the Linux operating system, including the kernel, memory, processes, threads, cgroups, static / shared libraries, IPC and signals, as well as standard UNIX utilities, programs and packaging.
- Extensive experience in configuration management and fleet orchestration via Puppet, Chef, Ansible, or others.
- Understanding of basic Internet infrastructure services including DNS, DHCP, LDAP, server virtualization, server monitoring, cloud services (AWS S3/EC2/CloudFront/Steps... or equivalent).
- Some exposure to structured or unstructured storage and caching.
- Demonstrated history of automating operations processes via services and tools.
- Knowledge of continuous integration, testing methodologies, TDD and agile development methodologies.
- Understanding of distributed system concepts including the CAP Theorem, microservices, and the Twelve-Factor App.
- Fluency in one or more high-level programming languages like Java, Python, Go, Ruby or equivalent.
The Platform Infrastructure Engineering group designs, builds and operates the cloud infrastructure that hosts Apple’s consumer-facing applications. We are looking for talented engineers to join a team of highly experienced and effective individuals who are passionate and creative about production. Challenges of scale are solved through automation, attention to detail, and the strength of a fully-integrated data center, network, compute and application stack. Your experience in understanding how applications operate across distributed resources in diverse geographies, and in creating and tuning the tools and monitoring that support them, will make you successful.
Education & Experience
Technical BS/MS degree or equivalent work experience.
- Architect, author and deliver software to improve the availability, scalability and security of Apple's internal cloud infrastructure.
- Build and manage systems, infrastructure and applications through automation.
- Deploy, support and monitor new and existing services, platforms, and application stacks.
- Use scale testing to measure, tune and optimize system performance.
- Participate in periodic 24x7 on-call duties.
- This role may require occasional international/transatlantic travel.