Site Reliability Engineer (SRE), Logging
Santa Clara Valley (Cupertino), California, United States
Software and Services
The people here at Apple don’t just build products; they craft the kind of wonder that has revolutionized entire industries. It’s the diversity of those people and their ideas that inspires the innovation that runs through everything we do, from amazing technology to industry-leading environmental efforts. Imagine what you could do here. Join Apple, and help us leave the world better than we found it. A job at Apple is unlike any other you’ve had. The Apple Cloud Infrastructure team builds and provides the systems that fuel Apple’s services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple’s software developers build the products that our customers love. We are looking for passionate and hardworking Site Reliability Engineers to continue our focus on providing the best possible Apple Services experience for our customers. Our systems have to scale globally, stay highly available, and “just work”. If you love designing, engineering, and running systems that help millions of customers, then this is the place for you!
- Strong sense of ownership and integrity demonstrated through clear communication and collaboration.
- Experience in managing and scaling distributed systems on infrastructure in a public, private, or hybrid cloud environment
- Strong coding and scripting skills in languages like Go, Python, Ruby or Java
- Acute drive to automate, constantly replacing manual operations with automated solutions
- Deep understanding of the Linux operating system (including the kernel, memory, processes, threads, static/shared libraries, IPC, and signals)
- Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting, and load balancing
- Hands-on experience managing large numbers of diverse systems (with tools such as Puppet, Chef, Ansible, Spinnaker)
- Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
- Excellent troubleshooting and problem-solving skills
- Experience with scale testing, disaster recovery and capacity planning
- Familiarity with microservices architecture and container orchestration with Kubernetes
The services that Apple Cloud Infrastructure (ACI) runs are BIG. Operating at our scale, across multiple geographically dispersed data centers and serving hundreds of millions of users, presents unique challenges. As an SRE @ Apple, you'll need to solve these problems using data, teamwork, and your own expertise. SREs @ Apple own the full infrastructure stack: from device driver performance debugging to content delivery network traffic management, our responsibilities are both broad and deep. ACI runs the majority of its systems on Linux. We run a mix of open source, vendor-licensed, and internally developed tools to perform functions such as system configuration management, provisioning, software deployment, logging, and monitoring. You'll learn these tools and have opportunities to improve them. Our team is collaborative; we work together and with the development teams we support to deliver the best results for Apple. We think rigorously and look for the best solution to the engineering challenges we face, while balancing the need to get things done. Good ideas are heard and results are rewarded.
Education & Experience
BS/MS in Computer Science or equivalent (5+ years of software development or production operations experience in a large-scale environment)