Senior Site Reliability Engineer - Ad Platforms

Austin, Texas, United States
Software and Services

Summary

Posted:
Weekly Hours: 40
Role Number: 200549803
At Apple, we work every day to build products that enrich people’s lives. Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Our technology and services power advertising in Apple News and Ads in the App Store. Our offline pipelines are highly performant, deployed to handle high-volume asynchronous events at scale, and set new standards for enabling effective advertising while protecting user privacy. The Ad Platforms team has a great opportunity for a Site Reliability Engineer. Our mission is to enable Ad Platforms to deliver advertisements in a reliable and scalable way that results in awesome user experiences.

Key Qualifications

  • 5+ years of experience supporting internet-facing production services and distributed systems.
  • Good programming skills in at least one of Java, Python, or Go.
  • Experience operating Linux-based systems, with a proven understanding of Linux internals.
  • Experience with container platforms such as Kubernetes.
  • Experience building and running infrastructure on AWS, including services such as EKS and MSK.
  • Experience with Infrastructure as Code tools such as Terraform.
  • Experience leading deep dives and troubleshooting of production issues during live diagnostic calls.
  • Demonstrated problem-solving ability through creative and innovative thinking, combined with a strong sense of ownership, customer service, and integrity, shown through clear communication.
  • Self-motivated and eager to learn.

Description

As a Site Reliability Engineer, you will be responsible for providing the platform that allows critically important ad-tech systems to maintain constant uptime, scale seamlessly, and let new applications and services flourish. The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with the developers and architects within the team to aid in design and assist with implementation to improve stability, security, and scalability.

Key Responsibilities:

  • Implement and improve our infrastructure and application monitoring and observability capabilities to improve our reliability.
  • Engage with application engineering teams to improve service operability, reliability, and on-call efficiency, and drive incident management and post-mortem analysis.
  • Drive production readiness and improve key areas such as capacity planning, configuration management, and observability.
  • Design and improve the architectures of new and existing systems based on the principles of reliability and high availability, with extensive logging and observability.
  • Develop expertise in Apple infrastructure and best practices, and bring that expertise to Ad Platforms to run world-class distributed systems.
  • Create tooling and automation to improve the operation and operability of our infrastructure and applications.

Education & Experience

Bachelor's degree in Computer Science, Computer Engineering, or an equivalent field. Master's degree preferred.

Additional Requirements