AI/ML - Infrastructure Services - Senior Site Reliability Engineer, ML Platform & Technology

Portland, Oregon, United States
Machine Learning and AI


Weekly Hours: 40
Role Number: 200337221
This is an exciting opportunity for a Senior Systems Engineer / SRE to join the AI/ML team at Apple. We are looking for an experienced SRE who understands and believes in the concept of infrastructure as code to join a new team. The successful candidate will focus on designing and developing solutions to highly complex issues in a large-scale distributed systems environment. The team is multidisciplinary.

Key Qualifications

  • 10+ years of work experience in system administration
  • Expert knowledge of the Linux operating system (OS, networking, process level)
  • Fluent in at least one scripting language (Shell, Python, Ruby, etc.)
  • Strong verbal and written communication skills
  • Passionate about being a part of a tight-knit Operations team


The team is responsible for the maintenance and delivery of infrastructure services, which are key to the development and production process at AI/ML. The team works closely with other teams across AI/ML as the operational subject matter experts, offering guidance and advice that enable them to improve their services. A successful candidate will likely be a former Systems Administrator who has moved on to development and automation in their career.

Help operate Apple’s largest infrastructure supporting millions of AI/ML customers. Troubleshoot complex issues across the entire stack. Advise other teams (within and outside of AI/ML) on technical direction. Make changes to our environment that push AI/ML services to the next level.

Education & Experience

BS/MS/PhD in Computer Science or related field

Additional Requirements