Senior Hadoop Site Reliability Engineer, Ad Platforms
Austin, Texas, United States
Software and Services
At Apple, we work every day to build products that enrich people’s lives. Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Today, our technology and services power advertising in Search Ads in the App Store and Apple News. Our platforms are highly performant, deployed at scale, and setting new standards for enabling effective advertising while protecting user privacy. The Ad Platforms team is seeking a Senior Hadoop Site Reliability Engineer. Our mission is to enable Ad Platforms to deliver advertisements in a reliable and scalable way that results in awesome user experiences.
- Expert understanding of Linux-based systems and deep expertise in Hadoop/YARN/Spark-based technologies
- Expertise in designing, implementing, and administering large Hadoop clusters and related infrastructure such as Hive, Spark, HDFS, HBase, Oozie, Presto, Flume, Airflow, and ZooKeeper
- 5+ years managing clustered services, distributed systems, and production data stores
- Experience in managing the life cycle of data services from inception and design to deployment, operation, migration, administration, and sunset
- Experience in running machine learning pipelines (model training, experimentation) and JupyterHub/GPU compute/PyTorch infrastructure
- Cloudera CDH5/CDH6 cluster management and prior capacity planning experience for large-scale multi-tenant clusters
- Ability to code well in at least one language (Shell, Ruby, Python, Java, Perl)
- Experience with AWS/EMR, S3, Glue, Athena, and Kubernetes infrastructure
- Experience in setup and management of security infrastructure such as Kerberos
- Active membership in or contribution to open source Apache Hadoop projects is a plus
- Good work attitude and tenacious troubleshooting/analytical skills
- Multi-datacenter deployment/disaster recovery experience is a plus
- Prior advertising and related data pipeline (clickstream, etc.) experience is a plus
In this role, your duties will include:
- Design and implement scalable data platforms for our customer-facing services
- Monitor production, staging, test, and development environments for multiple teams in an agile, dynamic, fast-paced engineering organization
- Deploy and scale Hadoop infrastructure to support data pipeline and related services
- Build infrastructure capabilities to improve resiliency and efficiency of the systems and services at scale
- Drive data infrastructure/pipeline, services, and upgrade/migration projects from start to finish
- Support Hadoop/HDFS infrastructure day-to-day operations, administration, and maintenance
- Data cluster monitoring and troubleshooting
- Capacity planning, management, and troubleshooting for HDFS, YARN/MapReduce, and Spark workloads
- Participate in rotational on-call schedule
- Partner with program management, network engineering, and other cross-functional teams on the larger initiatives
- Work simultaneously on multiple projects competing for your time and understand how to prioritize them accordingly
Education & Experience
Bachelor's degree in Computer Science/Engineering discipline or equivalent. Master's degree preferred.