AI/ML - Sr Data Engineer, Siri Data

Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI


Weekly Hours: 40
Role Number: 200205765
Would you like to play a critical part in the next revolution of human-computer interaction? Would you like to contribute to the advancement of a product that is globally redefining how humans use voice to relate to technology? The Siri Data organization seeks to improve Siri by using data as the voice of our customers. Within Siri Data, the mission of Siri data engineering is to build the scalable, high-quality data sets that curate the data required to give our customers their voice. We’re looking for exceptional data engineers who are passionate about our product and values, who love working with data at scale, and who are committed to the hard work necessary to continuously improve. As a part of this group, you will work with petabytes of data daily using diverse technologies such as Spark, Flink, Kafka, and Hadoop. You will be expected to partner effectively with upstream engineering teams and with downstream analytical and product consumers.

Key Qualifications

  • You have excellent written and verbal communication skills.
  • You are tenacious, relentless, and determined.
  • You are curious: always learning new technologies, rapidly synthesizing new information, and understanding “the why” before “the what.”
  • You are self-directed and capable of operating amid ambiguity.
  • You are poised and display excellent judgment in prioritizing across difficult tradeoffs.
  • You are pragmatic: not letting “the perfect” be the enemy of “the good.”
  • You are humble, continually growing in self-awareness and possessing a growth mindset.


WHAT WILL FILL YOUR DAYS:

  • Moving between understanding the open and unanswered questions about Siri, defining new metrics and filters, and specifying the new logging necessary, all with the high-level goal of using data to improve Siri.
  • Designing, creating, and maintaining data pipelines that populate a petabyte-scale data warehouse.
  • Working with data infrastructure teams, providing input to improve our platform.
  • Working with data-producing teams to specify requirements and to transparently provide rapid feedback.
  • Partnering with your teammates across Siri Data to answer questions, to provide support, and to innovate in taking our data warehouse to the next level.

Education & Experience

Surprise us! Many will have an MS or BS in CS, Engineering, Math, Statistics, or a related field, or equivalent practical experience in data engineering.

  • 4+ years of industry experience working with distributed data technologies (e.g. Hadoop, MapReduce, Spark, Flink, Kafka) to build efficient, large-scale data pipelines.
  • Software engineering proficiency in at least one high-level programming language (Java, Scala, Python, or equivalent).
  • Required: experience building batch data processing pipelines that curate data for data science consumers.
  • Strongly preferred: experience building stream-processing applications using Apache Flink, Spark Streaming, Apache Storm, Kafka Streams, or similar.

Additional Requirements