Senior Data Engineer - System Intelligence and Machine Learning, ISE

Cupertino, California, United States
Software and Services

Summary

Posted:
Weekly Hours: 40
Role Number: 200519155
Do you believe Machine Learning and AI can change the world? We truly believe they can! We are the Data Team of the System Intelligence and Machine Learning (SIML) group at Apple. We build high-quality ML datasets at scale, which are used to train the ML models that power AI-centric features across many Apple products (iPhone, iPad, Mac, Apple Watch, and even AirPods). These features range from the smart wallpaper on your iPhone Lock Screen, to the models that highlight the faces of your loved ones in your Photos app, to input experiences (e.g., autocorrect, next-word prediction, handwriting recognition). We’re looking for an exceptional software and data engineer who is passionate about Apple products and values, who loves working with data ops at scale, and who is committed to the hard work necessary to continuously improve our ML data pipelines. We invite you to join us at this exciting time. Grow fast and positively impact multiple critical features from your first day at Apple!

Key Qualifications

  • 5+ years of industry experience as a software and data engineer, with involvement in the data component of the ML lifecycle, and a solid understanding of applied machine learning topics
  • Proven experience as an engineer specialized in data engineering & pipelines
  • Proven experience in Python, or another modern programming language
  • Experience designing and building large-scale data processing systems, keeping up to date with the latest technologies, and comfort with benchmarking, prototyping, and bringing new systems to production
  • The know-how to manage complex data projects while establishing and enhancing the right software engineering culture for our team
  • Experience building data pipelines to process large-scale datasets, using orchestration frameworks like Airflow, Kubeflow, or similar pipeline tools
  • Ability to design and lead a technical roadmap in alignment with cross-functional R&D teams, with the capacity to influence other data infrastructure teams and collaborate with members of our Data Ops functions
  • A self-starter, able to handle ambiguity, identify risks, troubleshoot, and find the right people and tools to get the job done

Description

In this position, you will work with SIML Data functions and with ML teams to assess the data engineering needs tied to shipping ML features. You will partner with, and influence the roadmap of, the teams that build the infrastructure blocks we rely upon (e.g., storage and labeling platforms), in order to contribute to a best-in-class ML Data Engine. Our team of data engineers will use these systems to support end-to-end data flows tied to collection, annotation, and QA operations; deliver high-quality data quickly to ML teams; ensure traceability, versioning, and lineage of data objects; and enforce compliance with contractual and regulatory obligations. As a software engineer specialized in data engineering, you are also expected to code and contribute to the stack. You will establish and execute the strategy for our organization’s Machine Learning Data Engine, with an initial focus on agile ML Data Ops. This includes identifying the infrastructure components and data stack to be used, and designing and implementing pipelines between data systems and teams, automation workflows, data visualization tools, and data enrichment and monitoring tools.

Education & Experience

Bachelor's, Master's, or PhD in Computer Science, Mathematics, Physics, or a related field, or equivalent practical experience.

Additional Requirements

  • Prior experience in large language models or generative AI is desired
  • Experience mentoring and growing engineers is desired
  • Solid understanding of either NLP or Computer Vision is a plus
  • Experience with ETL frameworks like Airflow is a plus
  • Kubernetes and Docker experience is a plus

Pay & Benefits