SWE - Site Reliability Engineer - WTE

Munich, Bavaria-Bayern, Germany
Software and Services

Summary

Posted:
Weekly Hours: 40
Role Number: 200208826
Software Delivery Services & Infrastructure is focused on ensuring engineers at Apple can do amazing things, and we are looking for a talented application Site Reliability Engineer (SRE) to join us in this mission. You'll play a critical role in the day-to-day operations of services relied upon across Apple. You'll partner with engineering teams to ensure they're successful. You'll look for opportunities to innovate, all while driving for rock-solid operations.

Responsibilities will include:

  • Adopting and applying SRE best practices to the services you support
  • Keeping users, key stakeholders, and leadership informed through regular reporting and communications
  • Identifying manual tasks and toil that can be automated
  • Developing playbooks for actionable alerts
  • Fostering strong relationships with cross-functional teams
  • Participating in on-call rotations
  • Running deployment validation tests for production deployments
  • Continuously validating the customer experience and analyzing performance
  • Performing regular disaster recovery (DR) testing and fail-overs
  • Participating in incident postmortems and implementing the resulting preventive actions
  • Ensuring services adhere to published specs and standards
  • Performing predictive analysis or applying AI to avoid issues

Key Qualifications

  • A positive and respectful attitude
  • A passion for providing reliable services at scale, on bare metal as well as in cloud environments
  • A deep understanding of CI/CD technologies such as Jenkins
  • Strong working knowledge of Git and code-review systems such as Gerrit, Bitbucket, and GitHub
  • Good understanding of Linux service administration
  • Experience using Prometheus, Grafana, and Splunk
  • Superb collaboration skills with excellent written and verbal communication
  • The ability to troubleshoot large-scale systems
  • Deep understanding of web services, how they operate, and what needs monitoring and alerting
  • Good understanding of security principles and design
  • The desire to be proactive at all times in issue prevention
  • The desire to do what is right for the customer and to provide a great customer experience

Description

As part of Software Delivery Services & Infrastructure SRE, you will be responsible for delivering reliable services and driving projects to a successful outcome. This role will focus on operating and supporting a distributed development workflow used by teams in Software Engineering. You will monitor SLOs, respond to incidents, troubleshoot issues, and ensure the service is up to date and secure. You will collaborate with engineering teams to implement best practices and shape technical decisions.

To ensure your success, this job will provide you with:

  • Passionate and talented coworkers around the globe who are ready to collaborate with, mentor, and learn from you
  • Ownership to drive meaningful improvements to the operational reliability of the services you manage
  • Opportunities to contribute to the best practices used by SRE teams within Software Delivery

Education & Experience

Additional Requirements

  • Prior experience as an SRE, software engineer, or system administrator
  • Proven ability to self-manage large projects and meet deadlines