Senior Manager, Data Site Reliability Engineering, Ad Platforms

Cupertino, California, United States
Software and Services

Summary

Posted:
Weekly Hours: 40
Role Number: 200511312
At Apple, we work every day to build products that enrich people’s lives. Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Today, our technology and services power advertising in Search Ads in the App Store and Apple News. Our platforms are highly performant, deployed at scale, and setting new standards for enabling effective advertising while protecting user privacy. The Ad Platforms team is seeking a Senior Manager to lead Data Site Reliability Engineering. Our mission is to enable Ad Platforms to deliver advertisements in a reliable and scalable way that results in fantastic user experiences!

Key Qualifications

  • Expert understanding of Linux-based systems and deep expertise in Hadoop/YARN/Spark-based technologies
  • Hands-on experience with AWS/EMR, S3, Glue, Athena and Kubernetes infrastructure
  • Expertise in designing, implementing and administering large Hadoop clusters and related infrastructure such as Hive, Spark, HDFS, HBase, Oozie, Presto, Flume, Airflow and ZooKeeper
  • 5+ years managing clustered services, distributed systems, production data stores
  • 5+ years leading teams in multiple locations
  • Experience in managing the life cycle of data services from inception and design to deployment, operation, migration, administration and sunset
  • Experience in running Machine Learning pipelines (model training, experimentation) and JupyterHub/GPU compute/PyTorch infrastructure
  • Cloudera CDH5/CDH6/CDP cluster management and prior capacity planning experience for large-scale multi-tenant clusters
  • Ability to code well in at least one language (Shell, Ruby, Python, Java, Perl)
  • Experience in setting up and managing security infrastructure such as Kerberos
  • Good work attitude and tenacious troubleshooting/analytical skills
  • Multi-datacenter deployment / Disaster Recovery experience is a plus
  • Prior advertising and related data pipeline (clickstream, etc.) experience is a plus!
  • A passion to reinforce and enrich an engineering team environment, driving team engagement and satisfaction, and, most meaningfully, a sense of humor and an eagerness to learn

Description

  • Design and implement scalable data platforms for our customer-facing services
  • Monitor production, staging, test and development environments for multiple teams in an agile, dynamic, fast-paced engineering organization
  • Deploy and scale Hadoop infrastructure to support data pipeline and related services
  • Build infrastructure capabilities to improve resiliency and efficiency of the systems and services at scale
  • Drive data infrastructure/pipeline, services and upgrade/migration projects from start to finish
  • Support Hadoop/HDFS infrastructure day-to-day operations, administration and maintenance
  • Data cluster monitoring and troubleshooting
  • Capacity planning, management, and troubleshooting for HDFS, YARN/MapReduce and Spark workloads
  • Participate in rotational on-call schedule
  • Partner with program management, network engineering and other multi-functional teams on the larger initiatives
  • Work simultaneously on multiple projects contending for your time and understand how to prioritize them accordingly
  • Build and drive automation capabilities for the organization

Education & Experience

Bachelor's degree in a Computer Science/Engineering discipline or equivalent; Master's degree preferred

Additional Requirements

Pay & Benefits