AIML - Engineering Manager, ML Infrastructure & Frameworks

Seattle, Washington, United States
Machine Learning and AI

Summary

Posted:
Role Number: 200546818
The Data Platform team within the AIML organization powers analytics, experimentation, and ML feature engineering for Siri, Search, and other ML features we all love in the Apple ecosystem. The mission of the Data Platform org is to provide our engineers and data scientists with an innovative, reliable, secure, and easy-to-use infrastructure for ingesting, storing, processing, and interacting with data, and ultimately to help the teams that build data-intensive applications succeed. You will work with many cross-functional teams and own the planning, execution, and success of technical projects with the ultimate purpose of improving the Siri and Search experience for Apple customers. We are looking for an Engineering Manager for the ML acceleration team, which scales, manages, and optimizes the infrastructure (compute, storage, and networking) for large-scale ML training and inference. Come join us and be part of the Data Platform journey.

Key Qualifications

  • 8+ years of software development experience
  • 3+ years leading ML infrastructure engineering teams
  • Experience with commercial and/or open-source platforms and frameworks for large-scale data processing, ML training and inference, and storage, such as Apache Spark, Apache Flink, Ray, etc.
  • Experience optimizing frameworks and infrastructure for large-scale ML training on GPU-accelerated hardware
  • Experience with frameworks such as DeepSpeed, Horovod, Hugging Face Accelerate, and PyTorch Lightning
  • Experience in managing and optimizing ML platforms and infrastructure
  • Strong organizational skills and experience working on large multi-functional teams
  • Experience in influencing and driving key product innovations and opportunities across diverse collaborators
  • Passionate about operational excellence through proper automation and engineering processes
  • Strong distributed systems and engineering background
  • Superb problem-solving skills and ability to thrive in a fast-paced and dynamic environment

Description

Join Apple's Data Platform team as an Engineering Manager to deliver the best experiences across Siri, Spotlight, and Safari. You will be responsible for defining and driving the roadmap for multi-GPU acceleration of ML training on our data platform, offering the best infrastructure across our stack at Apple scale. You will collaborate with cross-functional teams of innovative software engineers, product managers, and engineering managers to continually improve our efficiency and training performance. We embrace the use of open-source technologies, including Kubernetes, as well as Spark, Flink, Trino, and Iceberg for data processing.

RESPONSIBILITIES INCLUDE:

  • Define and drive technical vision, roadmap, and strategy for our platform
  • Guide the design and development of new AI and ML acceleration frameworks and tools
  • Participate in product design reviews to ensure efficient and secure use of ML infrastructure
  • Collaborate with stakeholders and cross-functional leaders in engineering, product, and operations across Apple to ensure the correct adoption of our data and ML platform
  • Lead and mentor new hires and junior engineers
  • Provide guidance and establish processes to ensure engineering excellence, efficiency, and operational sustainability of our platform
  • Foster a healthy, inclusive, collaborative, and technology-driven culture

Education & Experience

B.S. or M.S. degree in Computer Science/Engineering, or equivalent work experience

Additional Requirements

Bonus if you have experience in the following areas:

  • Working with or developing large language models (LLMs)
  • Working with ML frameworks such as PyTorch, TensorFlow, and JAX
  • Working with streaming data processing frameworks such as Apache Flink, Kafka Streams, and Spark Streaming
  • Developing and optimizing algorithms that run efficiently on resource-constrained platforms
  • Designing, implementing, and benchmarking/fine-tuning ML/deep learning algorithms
  • Working with GPU computing or ML modeling frameworks

Pay & Benefits