Machine Learning and Computer Vision Engineer

Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI


Role Number: 200497145
The Video Computer Vision organization is working on exciting technologies for future Apple products. Building on our previous successes in ML solutions such as Face ID and the RoomPlan framework utilizing LiDAR sensors, we are expanding our horizons to explore and develop state-of-the-art generative AI solutions. As part of the Spatial Perception Team (SPT), we are looking not only to understand scenes but also to generate, recreate, and innovate on them using deep generative models. In this role, you will be at the forefront of inventing generative algorithms with real-world applications, leading the development of both 2D and 3D generative models, and ensuring that these models are evaluated efficiently and their performance is well understood.

Key Qualifications

  • Extensive knowledge of generative machine learning techniques, including but not limited to diffusion models, GANs, VAEs, and transformer architectures.
  • Experience in vision-language models, multimodal machine learning, and image/video generation tasks.
  • Strong coding skills in Python. C++ is a plus.
  • Expertise in visualizing generative models’ outputs and results.
  • Creativity and curiosity for solving highly complex problems.
  • Excellent communication and collaboration abilities.


Designing and developing generative neural networks and algorithms, from training and evaluation to in-depth failure analysis. Building end-to-end pipelines that prioritize swift model evaluations and rapid iterations. Teaming up with algorithm engineers to dive deep into intricate generative model challenges. Working cross-functionally to bring generative computer vision algorithms to real-world applications, spanning everything from data labeling to QA and implementation.

Education & Experience

MS with 3+ years of industry experience, or PhD, in computer science or a related field, especially with a focus on generative AI, vision-language models, and multimodal learning.
