Site Reliability Engineer
Santa Clara Valley (Cupertino), California, United States
Software and Services
People at Apple don't just build products; they craft the kind of experience that has revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.
The Apple Cloud Infrastructure (ACI) team builds and provides systems and infrastructure that fuel Apple’s services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple’s software developers build the products that our customers love. We are looking for passionate and hardworking Site Reliability Engineers to continue our focus on providing our customers the highest quality Apple Services experience. Our services have to scale globally, stay highly available, and "just work." If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!
All of our systems are Linux, and we work with the systems directly; there is no cloud abstraction layer between you and most of our systems. We run a mix of open-source, vendor-licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them.
Our team is collaborative; we work closely with the development teams we support to deliver the best results for Apple. We think critically and strive to balance the best solution with the need to get things done for each engineering challenge we face. Good ideas are heard and results are rewarded.
Exactly what you’ll end up doing in this role is very flexible, but here are some things you should feel confident you could accomplish in your first year:
- Take charge of one of our major systems, becoming the domain expert.
- Become known to our customers, and to your peers, for your expertise in your chosen area, including your helpfulness, your ability and willingness to teach and mentor others, and your friendly demeanor.
- Debug and fix operational issues, such as problems with system configuration, system provisioning, and user access.
- Keep a keen eye on security issues in every project you work on, and contribute to improving security in the systems that are already in place.
- Participate in our on-call schedule, and contribute to making it better and reducing the toil associated with it.
- Actively participate in many, many discussions, inside our team and with other teams, aimed at identifying and pursuing the best solutions to our automation and systems management problems. Bring to these discussions your strong opinions and your respectful, collaborative attitude.
- Establish a particularly strong relationship with a single other team, such as the monitoring team, the security design team, or one of the property SRE teams. This will allow you both to influence them more effectively in their pursuit of automation and toil reduction, and to keep the rest of our team apprised of upcoming initiatives that we need to know about.
- You might also take on the challenge of writing an entirely new piece of automation, including customer-facing documentation, operational documentation, extensive automated testing, operational design, release, and deployment.
ABOUT THE TEAM
The team I'm hiring for specifically is the Security Services SRE team. This team will be responsible for maintaining systems that provide internal security infrastructure for the ACI systems, system authentication (LDAP), secrets management, bootstrapping secure host communications, and so on. I'm looking for people with a wide range of skills and backgrounds for this new team, from fresh out of college all the way through to highly seasoned senior SREs.
ABOUT YOU
- Strong sense of ownership and integrity, demonstrated through clear communication and collaboration
- Experience in managing and scaling distributed systems in a public, private, or hybrid cloud environment
- The ability to design, author, and release code in languages like Go, Python, Ruby, or Java
- Acute drive to automate manual operations and to improve them through repeated iteration
- Understanding of the Linux operating system, standard networking protocols, and components
- Hands-on experience managing large numbers of diverse systems with configuration management or software delivery platforms (such as Puppet, Chef, Ansible, and Spinnaker)
- Experience with deploying, supporting, and monitoring new and existing services, platforms, and application stacks
- Excellent troubleshooting and problem-solving skills
- Experience with scale testing, disaster recovery, and capacity planning
- Familiarity with microservices architecture and container orchestration with Kubernetes
Education & Experience
Accredited degree or equivalent industry experience
- Apple is an Equal Opportunity Employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or other legally protected characteristics. Apple will not discriminate or retaliate against applicants who inquire about, disclose, or discuss their compensation or that of other applicants.
- Apple will consider for employment all qualified applicants with criminal histories in a manner consistent with applicable law.
- Apple participates in the E-Verify program in certain locations as required by law.
- Apple is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.