Siri - Senior Data Infrastructure Engineer

Cambridge, Cambridgeshire, United Kingdom
Machine Learning and AI


Role Number: 200010825
Would you like to play a part in the next revolution in human-computer interaction? Join the Siri team at Apple. Contribute to a product that is redefining mobile computing. Build groundbreaking technology for artificial intelligence. Work with the talent-dense team who built the intelligent assistant that helps millions of people get things done, just by asking. As a member of the Data Infrastructure Engineering team within the Siri Production Engineering organization, you will face highly complex issues in a large-scale distributed systems environment. To ensure a reliable and rewarding Siri experience, you will be empowered to design and develop new solutions heavily focused on data infrastructure optimized for our scale. We are also responsible for ensuring that data privacy is an integral part of everything we build. We look for Software Engineers with a passion for data in both the Operations and Development space to bring these scalable, high-quality solutions to production at a rapid pace.

Key Qualifications

  • In-depth expertise in Big Data technologies (Hadoop, HBase, Spark, Kafka, Solr, JupyterHub, Hive, HDFS, etc.)
  • You have proven experience in large scale production data and systems environments
  • Ability to tackle difficult Hadoop ecosystem scalability and performance issues
  • Experience designing and building Hadoop ecosystem tooling that promotes scalable and performant operations
  • You have proven experience in one or more object-oriented programming languages (Scala, Java, C++)
  • Fluent in at least one scripting or systems programming language (Python preferred; Ruby, Bash, Go, Rust, Crystal, etc.)
  • Knowledge of the Linux operating system (OS, networking, process level) and of Mesos or Kubernetes
  • Interest in DevOps-style engineering teams: we operate what we build!
  • Strong verbal and written communication skills
  • You are passionate about data and about being part of a tight-knit Data Privacy Infrastructure Engineering team
  • Thorough knowledge of macOS and iOS is helpful
  • Ability to stay focused and prioritize a heavy workload while achieving exceptional quality
  • You are upbeat, adaptable, and results-oriented, with a positive attitude
  • You bring passion and dedication to your job and are committed to our vision and supporting the developer community


The Siri Data Infrastructure Engineering team manages data ingestion and the data storage platform used for analytics and machine learning of worldwide Siri events. Using internally built systems platforms, open source data platform tools, and purpose-built solutions developed by our own team, the Data Infrastructure Engineering team members strive to build out a performant and scalable data platform at huge scale, with high-quality data, that can be operated by our relatively small team alongside the Siri Production Engineering Data Infrastructure SRE team. Our engineers work closely not only with other teams within the Siri Production Engineering team, but also with the development and analytics engineers within Siri. We build out data platform infrastructure for maximum efficiency, scalability, and reliability so that domain-specific data scientists and machine learning engineers can focus on their specialties. A successful candidate will be someone who can actively take part in the design, build, and operation of our data and systems infrastructure.
As a member of the Siri Production Engineering Team within Apple you will:

  • Work with open source big data technologies such as Spark, Kafka, Presto, HBase, Solr, and Hadoop, while developing scalable data infrastructure solutions that other teams can build their domain-specific solutions upon
  • Develop data platform services that enhance how we operate our data platform, store our data securely, ensure data privacy, and improve ease of use for everyone who makes use of our data
  • Ensure our systems and data platform offer reliable, high-quality data with consistent SLAs
  • Troubleshoot complex issues across the entire stack
  • Create automation frameworks to manage and optimize user access patterns for our Hadoop cluster and other data infrastructure
  • Advise other teams (within and outside of Siri) on technical direction
  • Help grow our data environment with the purpose of pushing Siri to the next level of scale and stability

Education & Experience

BS, MS, or PhD degree in Computer Science, EE, Physics, or other technical discipline and 5+ years of experience building data pipelines

Additional Requirements

  • Strong Scala and/or Java expertise with JVM performance tuning/optimization
  • Experience with Kubernetes or Mesos compute platforms
  • Workflow and data pipeline orchestration experience (Oozie, Airflow, Pinball, Jenkins, Luigi, etc.)
  • In-depth experience with resource managers such as YARN and Marathon
  • Experience with microservices-oriented systems and framework development
  • Strong statistics and/or machine learning knowledge
  • Open source code involvement, such as serving as an Apache PMC member and/or Committer