Apple Media Products Engineering - Problem Management Engineering Program Manager
Santa Clara Valley (Cupertino), California, United States
Software and Services
- Exceptional organizational, project management, and technical acumen.
- Prior engineering, infrastructure, or software development project management experience in Site Reliability Engineering and/or DevOps.
- Process re-engineering experience with a focus on Incident Management and Problem Management.
- Preference will be given to candidates who have a background in implementing the ITIL framework and knowledge of Service Management processes.
- Proficiency and experience in delivering large-scale, cross-functional programs.
- Self-motivated and proactive with demonstrated creative and critical thinking capabilities.
- Strong analytical, troubleshooting, and problem-solving skills.
- Strong verbal and written communication skills, with the ability to communicate at all levels of the organization.
- Lead all aspects of the process to ensure Service Level Agreements (SLAs) and SLOs are published and met by support teams.
- Strong facilitation skills. Must be able to address a large group and lead its discussion, keeping participants focused on the agenda of the incident meetings.
- Experience working in, and knowledge of, various public cloud environments.
Seeking a seasoned Engineering Program Manager for our Incident Management function. You will own the Incident Management process for the AMP teams. The Incident Management process includes, but is not limited to, the following tasks:
- Initiate, coordinate, and manage AMP Post Mortem meetings as required following a qualified incident. This involves understanding the issue technically and reaching out to the SRE teams concerned to ensure the various parameters of the incident are captured and documented before the Post Mortem meeting.
- Facilitate the flow of the meeting to arrive at a logical explanation of the root causes of the incident, and document the action items that come out of these meetings to improve service availability.
- Ensure the Incident Management KPIs are recorded and their targets met.
- Partner closely with Technical Managers to generate monthly statistical reports on incidents and outages, and present them to key executives at Incident Review meetings.
- Work with the respective teams to ensure the tools used to document incidents are actively supported, and manage the roadmap for new features and enhancements.
- Design, maintain, and continually improve Incident Management processes and metrics. Track and analyze trends and generate statistical reports.
- Follow up with the respective teams on the tickets generated as an outcome of the Post Mortems to ensure they are prioritized and completed promptly.
- Partner with the respective technical teams to review monitoring alert trends and discover opportunities to reduce on-call alert frequency.
- Lead continual service improvement and ongoing process maturity through regular reviews of the process, tools, and reporting, and through regular engagement with partners.