Site Reliability Engineer

Sunnyvale, California, United States
Operations and Supply Chain

Summary

Posted:
Weekly Hours: 40
Role Number: 200566279
Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job, and there’s no telling what you could accomplish. The people here at Apple don’t just create products; they create the kind of wonder that’s revolutionized entire industries. It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Join Apple, and help us leave the world better than we found it.

Apple’s Manufacturing Systems & Infrastructure (MSI) team is responsible for gathering, consolidating, and tracking all manufacturing data for Apple’s products and modules worldwide. This data is used throughout the company and across the product lifecycle, from the very beginning, when it validates that units being built are fully tested and of high quality before leaving the factory, all the way through to warranty support for customers.

As a Senior Site Reliability Engineer, you will play a critical role in maintaining and enhancing the reliability of our production systems. You will collaborate with engineering teams to design, implement, and monitor infrastructure and services, employing your expertise in automation and performance optimization.

Description

  • Design, develop, and maintain scalable, reliable, and efficient infrastructure.
  • Implement monitoring, alerting, and logging systems to ensure the health and performance of applications.
  • Automate repetitive tasks and improve system efficiency through scripting and tool development.
  • Collaborate with development teams to improve service reliability and promote best practices in software development and deployment.
  • Conduct root cause analysis of system failures and implement corrective actions to prevent recurrence.
  • Participate in on-call rotations and respond to incidents, minimizing downtime and impact on users.
  • Drive continuous improvement initiatives to enhance system performance, scalability, and reliability.
  • Mentor and provide guidance to junior team members, fostering a culture of learning and innovation.

Minimum Qualifications

  • 7+ years of experience in site reliability engineering, DevOps, or a related field.
  • Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Preferred Qualifications

  • Strong experience with cloud platforms: AWS, Google Cloud Platform, or Microsoft Azure.
  • Proficiency in infrastructure as code tools: Terraform, Ansible, or CloudFormation.
  • Expertise in containerization and orchestration: Docker, Kubernetes, and Helm.
  • Experience with CI/CD pipelines and tools: Jenkins, Argo CD.
  • Strong scripting and programming skills: Python, Go, Shell, or Ruby.
  • In-depth knowledge of monitoring and observability tools: Prometheus, Grafana, OpenTelemetry, Splunk.
  • Familiarity with version control systems: Git.
  • Solid understanding of Linux/Unix system administration and networking.
  • Excellent problem-solving skills and a proactive approach to incident management.
  • Experience with database management and optimization: MySQL, PostgreSQL, or NoSQL databases like MongoDB and Cassandra.
  • Knowledge of message brokers and streaming platforms: Kafka, RabbitMQ, or Amazon Kinesis.

Pay & Benefits

  • Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.