AI/ML - Data Engineer (NLP/Speech), Siri and Information Intelligence

Cambridge, Cambridgeshire, United Kingdom
Machine Learning and AI


Role Number: 200534060
Play a part in the next revolution in human-computer interaction. Contribute to a product that is redefining mobile computing. Create groundbreaking technology for large scale systems, natural language, big data, and artificial intelligence. And work with the people who created the intelligent assistant that helps millions of people get things done, just by asking. Join the Siri Response / Text-to-Speech (TTS) team at Apple. Our team is looking for exceptional data engineers passionate about delivering delightful customer experiences with Siri voices. As a Data Engineer (NLP/Speech), you'll work on building and maintaining text and speech datasets, processes, and workflows for our TTS systems.


Apple is hiring data engineers for the Siri Response / Text-to-Speech (TTS) team. You'll be working at the frontier of AI, processing massive amounts of speech and text data for our TTS systems. You'll work closely with fellow engineers to gather and integrate new speech and text data into our repositories, transform raw data into formats usable for TTS model training, and make datasets available to partner teams at Apple to power Siri's voice.

Your responsibilities will include:

  • Collect and centralize data from various sources, working with internal privacy, legal and modeling teams
  • Build processes and workflows that support data transformation for TTS systems (e.g. audio processing and text annotation), based on the needs and requirements of modeling teams
  • Provide datasets to partner teams, managing access or usage control
  • Create dashboards for interactive data exploration
  • Develop tools and tests to ensure quality and help diagnose issues
  • Perform analysis on external and internal processes and data to identify opportunities for improvement
  • Develop prototype ML models utilizing in-house toolkits

If this sounds like you, we'd love to hear from you!

Minimum Qualifications

Key Qualifications

  • 5+ years’ industry experience processing large-scale text/speech datasets for ML applications
  • Strong expertise in Python, (NoSQL) databases, cloud-based data technologies, and working with large datasets and pipelines
  • Experience in tooling and streamlining workflows in complex processes
  • Highly motivated, creative, organized, and a strong problem solver
  • Outstanding spoken and written communication skills

Preferred Qualifications

Education & Experience

MS / PhD in Computer Science or a related field

Additional Requirements

  • Experience in working with natural language data, lexical resources, corpora, NLP algorithms and tools is a plus
  • Experience in machine learning, natural language processing, machine translation or text-to-speech is a plus
  • Knowledge of one or more foreign languages is a plus