AI/ML - Sr Data Pipeline Engineer, Siri Data

Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI


Weekly Hours: 40
Role Number: 200180306
Would you like to play a part in the next revolution in human-computer interaction? Would you like to contribute to a product that is redefining mobile and desktop computing, and work with the people who built the intelligent assistant that helps millions of people get things done — just by asking? The vision of the Siri Data organization is to improve Siri by using data as the voice of our customers. As a data pipeline engineer, you will build complex, scalable data pipelines. You will work in one of the most exciting high-performance computing environments — petabytes of data, millions of queries per second — and have the opportunity to imagine and build products that delight our customers every single day.

Key Qualifications

  • 5+ years of experience working with Spark or other big data architectures (Hadoop, MapReduce) in high-volume environments.
  • Experience building and managing ETL pipelines from inception to production rollout.
  • Experience with object-oriented and functional scripting languages: Python and Scala.
  • Experience with workflow management tools: Airflow, Oozie, Azkaban, etc.
  • Experience with configuration management and monitoring tools: Splunk, Grafana, Prometheus, Nagios, Puppet.
  • Experience supporting hosted services in a high-volume, customer-facing environment.
  • Experience with SQL and basic database knowledge for modifying queries and tables.
  • Experience with CI/CD tools: TeamCity and Jenkins.


The Siri Data Platform team is in an unusual position to align our quality initiatives around a single platform. You can help us architect highly scalable distributed data systems. Part of the job is ensuring the operational SLA for data generation and availability across the data organization in Siri. To achieve this, you will:

  • Write tools and dashboards for operational excellence.
  • Contribute to our monitoring and alerting framework.
  • Develop and contribute to open source projects (Apache Spark, Apache Druid).
  • Constantly evolve our pipelines and question the status quo.
  • Ensure the platform can handle all types of robust data exploration in real time.
  • Partner with different teams and build features to improve data analysis.

Education & Experience

Bachelor’s Degree or foreign equivalent in Computer Science, or related field, or equivalent experience.

Additional Requirements