Site Reliability Engineer - Solr

Santa Clara Valley (Cupertino), California, United States
Software and Services


Weekly Hours: 40
Role Number: 200199582
Software Delivery Services & Infrastructure is focused on ensuring engineers at Apple can do amazing things, and we are looking for a talented application Site Reliability Engineer (SRE) to join us in this mission. You’ll play a critical role in the day-to-day operations of services relied upon across Apple. You’ll partner with engineering teams to ensure they’re successful. You’ll look for opportunities to innovate, all the while driving for rock-solid operations.

Responsibilities will include:

  • Adopt and apply SRE best practices to services you support
  • Keep users, key stakeholders, and leadership informed through regular reporting and communications
  • Identify areas of automation for manual tasks/toil
  • Develop playbooks related to actionable alerts
  • Foster strong relationships with cross-functional teams
  • Participate in on-call rotations

Key Qualifications

  • A positive and respectful attitude
  • A passion for providing reliable services at scale, on bare metal as well as in cloud environments
  • Hands-on experience with Apache Solr
  • Expertise in Solr replication, failover, fault tolerance, and high availability
  • Good understanding of administration of Linux services
  • Experience using Prometheus, Grafana, and Splunk
  • Superb collaboration skills with excellent written and verbal communication


As part of Software Delivery Services & Infrastructure SRE, you will be responsible for delivering reliable services and driving projects to a successful outcome. This role will focus on operating and supporting Solr clusters used by Software Engineering. You will monitor SLOs, respond to incidents, troubleshoot issues, and ensure the service is up to date and secure. You will collaborate with engineering teams throughout the onboarding process and keep that process smooth by relying on accurate documentation you’ve created.

To ensure your success, this job will provide you with:

  • Passionate and talented coworkers ready to collaborate, mentor, and learn from you
  • Ownership to drive meaningful improvements to the operational reliability of the services you manage
  • Opportunities to contribute to the best practices used by SRE teams within Software Delivery

Education & Experience

Additional Requirements

  • Prior experience as an SRE, software engineer, or system administrator
  • Proven ability to self-manage large projects and meet deadlines
  • Experience in system automation technology, such as Puppet or Ansible