Site Reliability Engineering (SRE) Manager
Austin, Texas, United States
Software and Services
Apple is seeking an outstanding Site Reliability Engineering (SRE) Manager responsible for the stability, security, and availability of enterprise encryption and PKI applications provided by the Crypto Services team. Thousands of transactions per second with millisecond response times. Applications that must stay up or Apple stops doing business. Security is priority one, but it must not undermine availability, reliability, or performance. The services provided by the Crypto Services team are essential to the security of most of Apple’s products and services: iPhone, iPad, iOS, OS X, the App Store, iTunes, iCloud, and Apple Pay, to name a few. The ambition is to achieve both security and availability while enhancing user experience.
- Skilled at working cross-functionally to achieve project success
- Standout colleague focused on team success above individual achievement
- Desire to build, grow, and improve a team (manager-as-mentor mentality)
- Passion for automation and a reluctance to rely on manual implementation
- Strong philosophy of continuous improvement
- Strong sense of ownership, with a drive to understand how things work and resolve root causes
- Ability to encourage and foster a culture of visibility and transparency across teams
- Experience managing enterprise services in a large-scale *nix environment
- Experience with Cloud Computing platforms (particularly AWS) a plus
- Experience with DevOps tools, processes, and culture, including Puppet, Chef, or Ansible
- Demonstrable knowledge of networking and network protocols
- Strong understanding of Java and J2EE application servers
- Working knowledge of databases, including SQL, indexing, and schema design
- Prowess in troubleshooting and problem solving
The SRE team is responsible for the following:
- Application and infrastructure testing (functional, performance, reliability, and failover)
- Application configuration management and deployment
- Troubleshooting environment-related issues (response times, connectivity, authentication, authorization, configuration, etc.)
- On-call for all application and cryptographic appliance issues
- Health monitoring
- Patch management
- Application metrics and operational intelligence
- Team collaboration and development tools (e.g. issue tracking, source code repository, monitoring, wiki, etc.)
- Collaboration with supporting teams (systems engineers, database administrators, network engineers, data center operations, information security, and more)
- Hardware deployment design
- Network ACL and VIP design and requirements
- Cryptographic appliance management and maintenance
- Communication with business partners regarding significant events
In order to succeed, the SRE team must collaborate effectively with several supporting teams, such as systems engineers, database administrators, network engineers, data center operations, and information security. Defining detailed requirements, communicating clear expectations, and following up in a timely manner are essential.
One of the most important teams the SREs will collaborate with is the software engineering team. The development and deployment of software are increasingly intertwined, so intense collaboration between the SRE manager and the development manager is paramount.
Application services are operated out of multiple data centers, and SREs are required to be on site at each data center a few times a year to perform various procedures.
At the heart of the team’s services is cryptographic key management. The SRE team plays a vital role in ensuring the security, availability, and proper use of cryptographic keys.
Education & Experience
BS in engineering, computer science, or another technical discipline preferred, plus 5 years of related experience.
- Travel: 5%