Multimodal Generative Modeling Research Engineer

Cupertino, California, United States
Machine Learning and AI


Weekly Hours: 40
Role Number:200526833
Do you believe generative models can transform creative workflows and smart assistants used by billions? Do you believe it can fundamentally shift how people interact with devices and communicate? We truly believe it can! We are looking for senior technical leaders experienced in architecting and deploying production scale multimodal ML. An ideal candidate has the ability to lead diverse cross functional efforts ranging from ML modeling, prototyping, validation and private learning. Solid ML fundamentals and an ability to place research contributions with respect to state of the art would be an essential part of the role. Experience with training and adapting large language models would be an important need. We are the Intelligence System Experience (ISE) team within Apple’s software organization. The team works at the intersection between multimodal machine learning and system experiences. System Experience (Springboard, Settings), Keyboards, Pencil & Paper, Shortcuts are some of the experiences that the team oversees. These experiences that our users enjoy are backed by production scale ML workflows. Visual Understanding of People, Text, Handwriting & Scenes, multilingual NLP for writing workflows & knowledge extraction, behavioral modeling for proactive suggestions, and privacy preserving learning are areas our multi disciplinary ML teams focus on. SELECTED REFERENCES TO OUR TEAM’S WORK: - ( - ( - (

Key Qualifications

  • Hands on experience training LLMs
  • Experience adapting pre-trained LLMs for downstream tasks & human alignment
  • Modeling experience at the intersection of NLP and vision
  • Familiarity with distributed training
  • Proficiency in ML toolkit of choice, e.g., PyTorch
  • Strong programming skills in Python, C and C++


We are looking for a candidate with a proven track record in applied ML research. Responsibilities in the role will include training large scale multimodal (2D/3D vision-language) models on distributed backends, deployment of compact neural architectures efficiently on device, and learning policies that can be personalized to the user in a privacy preserving manner. Ensuring quality in the wild, with an emphasis on fairness and model robustness would constitute an important part of the role. You will be interacting very closely with a variety of ML researchers, software engineers, hardware & design teams cross functionally. The primary responsibilities of the role would center on enriching multimodal capabilities of large language models. The user experience initiative would focus on aligning image/video content to the space of LMs for visual actions & multi-turn interactions.

Education & Experience

M.S. or PhD in Computer Science, or a related fields such as Electrical Engineering, Robotics, Statistics, Applied Mathematics or equivalent experience.

Additional Requirements

Pay & Benefits