AI/ML - Sr. Data Engineer, Siri Understanding

Seattle, Washington, United States
Machine Learning and AI


Weekly Hours: 40
Role Number: 200224498
Apple’s AI/ML teams transform every Apple product, and because we fully integrate hardware and software, we can collaborate to deliver amazing experiences while protecting user data. Siri is the voice of Apple products, and its launch was a defining moment in the history of Artificial Intelligence. Hundreds of millions of people now use Siri to send a message, play their favorite song, or even take a selfie. We’re looking for a hardworking individual able to improve the processing, analysis, and preparation of the huge data sets used to train Siri’s machine-learned models. We love big-data technologies, including data exploration, visualization, distributed processing, and applications at scale. We also believe that Privacy is a fundamental and universal human right, and we embrace this philosophy in our day-to-day work. If you share these values and are capable of applying cutting-edge data science, you'll improve Siri’s ability to understand users. In this role, you'd also analyze user behavior, prepare data sets, and enable Siri to further embrace privacy-preserving machine learning. This role sits on the Private Learning team, a newly minted team nested within Siri Understanding and Apple's overall AI/ML organization.

Key Qualifications

  • 3+ years of industry experience building efficient, large-scale data pipelines with distributed data technologies (e.g. Hadoop, MapReduce, Spark, Flink, Kafka).
  • Software engineering proficiency in at least one high-level programming language (Java, Scala, Python, or equivalent). Experience building batch or streaming data processing pipelines that curate data for data science consumers is required.
  • You are curious and tenacious: always learning new technologies, rapidly synthesizing new information, and understanding “the why” before “the what.”
  • You are self-directed and capable of operating amid ambiguity.
  • You are poised and display excellent judgment in prioritizing across difficult tradeoffs.
  • You are pragmatic: not letting “the perfect” be the enemy of “the good.”
  • You are humble and growth-minded, continually improving in self-awareness and collaboration.


The goal of Siri's Machine Learning teams is to take Siri to the next level of intelligence and accuracy while preserving user privacy. Siri processes more than a billion requests every week, and good data is at the heart of this engine. Some of the things you will focus on include:

  • Moving from understanding the open and unanswered questions about Siri, to defining new metrics and filters, to specifying the new logging needed, all with the high-level goal of using data to improve Siri
  • Designing, creating, and maintaining data pipelines that populate a petabyte-scale data warehouse, and working with data infrastructure teams to provide input that improves our platform
  • Writing high-quality code and using best practices to make sure the systems and pipelines are healthy
  • Partnering with your teammates across your immediate organization to answer questions, to provide support, and to innovate in making our data pipeline usable across Siri

Education & Experience

MS or BS in CS, Engineering, Math, Statistics, or a related field, or equivalent practical experience in data engineering

Additional Requirements

  • Experience building systems with Private Federated Learning or Differential Privacy techniques is a strong plus
  • Experience in data visualization using Superset / Tableau / Druid or equivalent is a plus