Senior Site Reliability Engineer - Apple Corporate Systems
Santa Clara Valley (Cupertino), California, United States
Software and Services
The Corporate Systems Engineering group at Apple focuses on creative ways to engineer solutions that meet the growing business needs of Apple's People, Finance, iTunes, Sales, Retail, and IT Service organizations. At its core, our portfolio comprises custom-engineered solutions that process very high-volume micro-transactions from iTunes Downloads, iPhone Activations, and Sales from Retail, Online, and Resellers, among others. These solutions are built on cutting-edge enterprise technologies including Server Side Java, Web Technologies, Cocoa, iOS, Oracle, and NoSQL databases. Accurately processing such high-volume transactions is our core strength. As a member of the technical leadership team, you will play a critical role in ensuring that apps and services are built and run with reliability, availability, and scalability in mind.
- 5+ years of hands-on experience deploying and troubleshooting apps or services in a large-scale Linux/Unix environment.
- Establish change management discipline in roll-out and deployment of new product features.
- Experienced in developing automated solutions for deploying, monitoring, and logging our critical services.
- Proficient in solving a wide range of issues across multiple technologies.
- Experience in automation through scripts or other tools to reduce the toil of manual processes.
- Infrastructure knowledge of networks, load balancers, VMs, firewalls, security certificates, etc.
- Working knowledge of Oracle.
- Experience in troubleshooting and issue triaging.
- Working knowledge of source control software (SVN or Git).
- Ability to multi-task and manage tasks with varying priorities.
- Ability to work independently with minimal supervision.
- Excellent written and oral communication skills.
We are looking for a hands-on, energetic, and seasoned site reliability engineering lead to serve as the primary person responsible for the overall health, availability, reliability, scalability, and capacity planning of our critical services. While the engineering team will focus primarily on developing new features, the SRE Lead will need to work in lock-step with them, gaining deep application knowledge to ensure the durability and operability of the services.
- Automate mundane and repetitive procedures.
- Work closely with system engineers, network engineers, database administrators, and the information security team to achieve service level objectives.
- Establish and monitor service level indicators to maintain the overall health of the services.
- Troubleshoot and triage production issues, escalating them to the engineering team for a permanent fix.
- Participate in the change management processes to ensure the durability and operability of the service.
Education & Experience
Bachelor’s and/or Master’s degree in Computer Science with 5+ years of relevant work experience.