Senior Incident Response Engineer
Santa Clara Valley (Cupertino), California, United States
Software and Services
Imagine what you could do here. At Apple, phenomenal ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. We, the Retail Store Apps infrastructure and operations team, are responsible for the maintenance and high availability of business-critical infrastructure and services. We are seeking a talented, self-driven individual to help coordinate the timely restoration of these services and drive the incident management process across various cross-functional groups.
- 5+ years of Major Incident / Problem management experience for IT organizations that run high-demand applications
- Develop the incident management and response processes, train teams (Apple production support), and handle partnerships across the retail store apps landscape
- Own and lead reported incidents with a focus on restoring services and business continuity
- Monitor and handle communications during high impact incidents via relevant channels
- Prepare statistics, KPIs, and trend reports for use in the incident and problem management process
- Drive and facilitate RCA (root cause analysis) activities and participate in post-mortem reviews
- Knowledge of AWS, Google Cloud Platform, or any other cloud vendor is a must.
- A good system-level understanding of Linux, TCP/IP, and HTTP/S fundamentals is required.
On the Retail Store Apps team, we build and lead large-scale web and iOS applications that are used by Apple retail store employees worldwide. We strive to provide operational excellence by ensuring the highest levels of performance and availability across retail store apps. As we expand our presence from the Apple data center to other cloud providers, we are looking for a self-driven and highly motivated incident response engineer to handle communication, emergency response, and incident management for our store employees.
- Should have experience in systems and network administration
- Should have a broad understanding of Incident and Problem Management
- Must possess strong written and verbal communication skills
- Must have strong analytical and project management skills
- Should have the flexibility and willingness to support a 24x7 Apple production support team via off-hours support and/or on-call availability
Education & Experience
Bachelor of Science in Computer Science, or equivalent experience.
- Experience with container technologies such as Docker, K8s, or CoreOS is preferred. Knowledge of distributed databases and multi-tenant systems is required.