Siri - Data Site Reliability Engineer

Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI

Summary

Posted: Feb 21, 2019
Weekly Hours: 40
Role Number: 200037799
As a member of the Siri Data Production Engineering team within the Siri organization, you will tackle highly complex issues in a large-scale, distributed-systems environment. To ensure a reliable and rewarding Siri experience, you will design and develop new solutions with a heavy focus on system automation. We look for talented engineers across both the operations and development spaces to bring these solutions to production at a rapid pace, helping make Siri the most successful personal assistant in the industry.

Key Qualifications

  • Passionate about data technologies (Hadoop, HBase, Spark, Kafka, Flume, Hive, Solr, YARN, Presto, Jupyter Notebooks, etc.)
  • Understanding of, and experience deploying, CI/CD pipelines
  • Expertise in configuration management (such as Puppet, Chef, or Ansible) for deploying, configuring, and managing servers and systems
  • Strong experience with alerting and monitoring automation for large-scale data and services infrastructure
  • Understanding of one or more object-oriented programming languages (Scala, Java, C++)
  • Fluent in at least one scripting or systems programming language (Python, Ruby, Bash, Go, Rust, Crystal, etc.)
  • Linux expertise, from detailed internals-level understanding to large-scale deployments
  • Network and security knowledge for large-scale deployments
  • Interest and knowledge in using containers (Docker, etc.) and container orchestration frameworks such as Mesos (DC/OS)/Marathon or Kubernetes for scaling data and services infrastructure
  • Strong verbal and written communication skills
  • Passionate about being part of a tight-knit team

Description

You will be responsible for both the worldwide Siri event log data ingestion infrastructure and the data storage platform used for analytics and machine learning on worldwide Siri events. Data integrity, from server emission to the aggregate cluster, is our core metric. We ingest tens of billions of events per day, with an available data footprint well into the triple-digit petabytes. To run our environment efficiently, we drive for automation and build frameworks that ease both the end-user experience and manageability by our team. The team's direction is to strive for automation of our pipelines to ensure reliability at the highest level. You will work closely with both platform development and data analytics engineers to ensure that new and existing streams are integrated properly from an efficiency, scalability, and maintainability standpoint. Tasks that cannot be automated are documented in a runbook for a NOC to support.

As a member of the Siri Production Engineering team within Apple, you will:

  • Manage Apple’s largest infrastructure, supporting millions of Siri customers at nearly triple-digit-petabyte scale
  • Diagnose and fix complex issues across the entire stack
  • Design and develop automation and frameworks to handle both development and production activities at scale
  • Help build out our data platform scalability story using container orchestration
  • Advise other teams (within and outside of Siri) on proper integration with our platform
  • Innovate on our environment to push Siri to the next level

Education & Experience

BS, MS, or PhD degree in Computer Science or equivalent, and 3+ years of experience with data technologies

Additional Requirements

  • In-depth experience with Mesos/Marathon, Kubernetes, Docker Swarm, or another container orchestration framework
  • Experience building and operating large-scale HBase and Spark data infrastructure in a production environment
  • Experience deploying user-facing data services such as Presto, Jupyter Notebooks, ad-hoc query environments, etc.
  • Experience scaling and operating data infrastructure to support large-scale production machine learning and GPU-based systems
  • Workflow and data pipeline orchestration experience (Oozie, Airflow, Pinball, Jenkins, Luigi, etc.)
  • In-depth experience with resource managers such as YARN and Marathon
  • Data analytics experience with Pig, MapReduce, Spark, Hive, R, etc.
  • Experience scaling Splunk