Sr. Data Site Reliability Engineer - Ad Platforms
Austin, Texas, United States
Software and Services
This role can be located in either Austin or NYC. At Apple, we work every day to build products that enrich people’s lives. Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Today, our technology and services power advertising in Search Ads in the App Store and Apple News. Our platforms are highly performant, deployed at scale, and setting new standards for enabling effective advertising while protecting user privacy. The Ad Platforms team is seeking a Senior Data Site Reliability Engineer for a great opportunity. Our mission is to enable Ad Platforms to deliver advertisements in a reliable and scalable way that results in awesome user experiences.
- Expert understanding of Linux-based systems and deep expertise in Hadoop/YARN/Spark-based technologies
- Hands-on experience with AWS/EMR, S3, Glue, Athena and Kubernetes infrastructure
- Expertise in designing, implementing and coordinating large Hadoop clusters and related infrastructure such as Hive, Spark, HDFS, HBase, Oozie, Presto, Flume, Airflow and ZooKeeper
- 5+ years leading clustered services, distributed systems, and production data stores
- Experience in managing the life cycle of data services from inception and design to deployment, operation, migration, administration and sunset
- Experience in running Machine Learning pipelines (training models, experimentation) and JupyterHub/GPU compute/PyTorch infrastructure
- Cloudera CDH5/CDH6/CDP cluster management and prior capacity planning experience for large-scale multi-tenant clusters
- Ability to code well in at least one language (Shell, Ruby, Python, Java, Perl)
- Experience in setting up and managing security infrastructure such as Kerberos
- Good work demeanor and tenacious troubleshooting/analytical skills
- Multi-datacenter deployment / disaster recovery experience is a nice-to-have
- Prior advertising and related data pipeline (clickstream, etc.) experience is a plus
In this role, your duties will include:
- Design and implement scalable data platforms for our customer-facing services
- Supervise production, staging, test and development environments for multiple teams in an agile / multifaceted, fast-paced engineering organization
- Deploy and scale Hadoop infrastructure to support data pipeline and related services
- Build infrastructure capabilities to improve resiliency and efficiency of the systems and services at scale
- Drive data infrastructure / pipeline, services and upgrade/migration projects from start to finish
- Support day-to-day operations, administration and maintenance of Hadoop / HDFS infrastructure
- Monitor and troubleshoot data clusters
- Perform capacity planning, management, and troubleshooting for HDFS, YARN/MapReduce and Spark workloads
- Participate in a rotational on-call schedule
- Partner with program management, network engineering and other multi-functional teams on larger initiatives
- Work simultaneously on multiple projects competing for your time and understand how to prioritize them accordingly
Education & Experience
Bachelor's degree in a Computer Science/Engineering field or equivalent. Master's degree preferred.