Santa Clara Valley (Cupertino), California, United States
Software and Services
The Apple Services Engineering team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. Join the Apple Services Engineering Cloud Service Infrastructure team as a Site Reliability Engineering Manager to help support and scale cloud services for millions of Apple users. In this hands-on role, you will establish SRE practices for a private cloud service and accelerate our ability to reliably and consistently deliver thousands of applications. You will lead a team of Site Reliability Engineers who thrive in a fast-paced workplace, where drive and collaboration are the keys to success!
- 8+ years of experience with critical, large-scale distributed systems, combining hardware, operating systems, and software
- 3+ years of experience building and leading engineering teams, ideally in SRE or Production Engineering
- Strong emphasis on SRE as an engineering subject area, with proficiency in at least one of the following languages: Golang, Rust, Python, or Swift
- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements
- Superb interpersonal skills, with the ability to work with multi-functional technical and business teams and varying levels of management, and to influence decision-making
The Apple Services Engineering Cloud Services SRE organization is looking for a strong, hands-on leader. The leader will lead a platform-focused SRE team and be responsible for the reliability of the platform. The platform serves workloads that provide our organization and our customers with their favorite applications, services, and tools. We are domain experts in fleet management, systems, and software engineering. We build automations, instrument reliability tools, and respond to alerts and incidents that may pose a risk to the reliability of the platform. The team’s focus is on infrastructure capabilities and processes, improving the reliability and efficiency of the systems at scale.
Responsibilities include:
- Act as the Service Owner, designing and mapping key performance indicators to achieve the organization’s mission
- Lead the definition of requirements, priorities, and planning of engineering deliverables
- Implement structured engineering and operations processes
- Lead the team in daily agile SRE practices, ensuring proper team focus on priorities, achievements, and deliverables
- Optimize velocity and efficiency of delivery, and drive continuous improvement
Success depends on a strong understanding of SRE principles and practices, combined with a track record of resolving issues in a live production environment and implementing strategies to minimize them while driving clear action plans for the team. The successful candidate will be highly self-motivated with a passion for excellence, quality, and detail. As a leader, they are responsible for coaching and mentoring their team members, helping them achieve service goals and build career paths in alignment with those goals. It’s imperative for the leader to empower their team by providing appropriate context and timely feedback. The leader will not only own the service, but will also collaborate with other teams within Apple.
They will build trust with stakeholders and partners through diplomacy, discussion, and follow-through. This is a broad, cross-organization role with high visibility, collaborating with multiple teams. They are expected to invest in and build good relations with key partners. Their collaboration with internal customers, product engineering, and development groups is critical to success.
Education & Experience
Bachelor’s or Master’s degree in Computer Science, Computer Engineering, or equivalent experience.