Site Reliability Engineer (SRE) - Object Storage

Posted: 11 Mar 2024

Weekly Hours: 35

Role Number: 200541645

People at Apple don’t just build products; they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.
Apple Cloud infrastructure is BIG. The storage SRE teams of Apple Cloud are building and running the next-generation distributed storage systems that support Apple’s most critical services. Operating at our scale, across multiple geographically dispersed data centers, and serving users with vast data needs presents unique challenges. As a Storage SRE at Apple, you'll need to solve these problems using your deep understanding of storage, data analysis, programming, teamwork, and expertise in Linux system internals. Storage SREs at Apple involve themselves across the full infrastructure stack, from tuning the block storage layer to content delivery network traffic management.

Key Qualifications

Experience in building, operating, and scaling distributed storage systems in a private, public, or hybrid cloud environment.
The ability to design, author, understand, and release code in languages like Go (preferred), Java, Python, or Rust.
Good understanding of block, object, and file storage solutions in Linux (such as LVM, XFS, ext4, S3, Ceph, Gluster, NFS).
Understanding of Linux internals, standard networking protocols, and distributed systems.
Experience with provisioning, data migration, backup & recovery, at-scale testing, disaster recovery, and capacity planning.

Description

We are looking for seasoned software and systems engineers to join the Object Storage SRE team at Apple. The role involves a tremendous amount of individual responsibility and influence over the direction of the platform, shaping its use by many critical Apple Cloud services for years to come. You are solution-oriented and have a passion for software delivered as a service to improve reuse, efficiency, and simplicity. Your work will affect hundreds of millions of users and be essential to the success of some of the most visible current and future Apple features.
The role involves understanding the team's priorities; taking ownership of projects or deliverables; designing solutions and building buy-in for those designs; and successfully delivering those designs to meet the project goal. It also involves giving technical feedback to colleagues to assist them in the delivery of their designs, features, and projects, as well as driving technical standards across the two-site team in collaboration with other senior members of the team.
The team has an on-call rota that includes weekends, and the successful candidate should expect to handle alerts and other escalations in order to maintain a high level of availability and functionality for the services we provide. The team is divided into two shards, in the UK and the US, and cross-timezone meetings are a core feature of how our team collaborates, reaches agreements, and executes to deliver projects.
At Apple Cloud, we run a mix of open-source, vendor-licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software development & deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.
The candidate may be expected to travel to other Apple locations from time to time, e.g. the USA.

Education & Experience

Bachelor's degree in Computer Science or related field, or equivalent employment

Additional Requirements

Acute drive to automate manual operations and to improve them through repeated iteration.
Awareness of best practices for deployment of storage systems: implications of physical and virtual deployment models for change management, failure domains, hardware lifecycle management, etc.
Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, and Spinnaker).
Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks.
Familiarity with microservices architecture and container orchestration with Kubernetes.
Familiarity with relational & non-relational databases (such as Cassandra, Postgres, & RocksDB).