AI/ML - Infrastructure Services - Senior Site Reliability Engineer, ML Platform & Technology
Portland, Oregon, United States
Machine Learning and AI
This is an exciting opportunity for a Senior Systems Engineer / SRE to join the AI/ML team at Apple. We are looking for an experienced SRE who understands and believes in the concept of infrastructure as code to join a new team. The successful candidate will focus on designing and developing solutions to highly complex issues in a large-scale distributed systems environment. The team is multidisciplinary.
Key Qualifications
- 10+ years of work experience in system administration
- Expert knowledge of the Linux operating system (OS, networking, process level)
- Fluent in at least one scripting language (Shell, Python, Ruby, etc.)
- Strong verbal and written communication skills
- Passionate about being a part of a tight-knit Operations team
Description
The team will be responsible for the maintenance and delivery of infrastructure services. These services are key to the development and production processes at AI/ML.
This team works closely with other teams across AI/ML as the operational subject matter experts, offering guidance and advice that help those teams improve their services.
A successful candidate will likely have started as a Systems Administrator and moved on to development and automation over the course of their career.
Help operate Apple’s largest infrastructure supporting millions of AI/ML customers. Troubleshoot complex issues across the entire stack. Advise other teams (within and outside of AI/ML) on technical direction. Make changes to our environment with the purpose of pushing AI/ML services to the next level.
Education & Experience
BS/MS/PhD in Computer Science or related field