Site Reliability Engineer (SRE), Cloud Services
Santa Clara Valley (Cupertino), California, United States
Software and Services
The Cloud Services SRE team is looking for Site Reliability Engineers to build and run the services that hundreds of millions of customers use every day. We are hiring high-quality engineers with a diverse set of experiences and skill sets for positions on Apple’s primary cloud offering: Cloud Services. The best candidates will have both demonstrated software development skills and strong Linux and systems expertise. Our customers count on us to provide extraordinary availability, scalability, and security for our services. As an SRE in Cloud Services, you'll be on a team of DevOps warriors whose mission is to build and improve Apple's most critical internet services. We're looking for a hardworking and passionate person to join this amazing team. If you feel this is you, we'd love to hear from you.
The Fraud, Engineering, Algorithms and Risk group under Cloud Services is looking for a Site Reliability Engineer (SRE). In this group, your team's mission is to build and improve the engineering services responsible for combating fraud and abuse for Internet Software and Services at Apple. In this role, you will be tasked with building mission-critical, robust, and scalable distributed systems that can keep pace with data across a number of high-profile and large-volume Apple cloud properties. You will support engineering in building the next-generation libraries, platforms, and data pipelines that empower us to rapidly build and deploy complex models to production.
- Deep understanding of scaling Big Data architectures and operations, including monitoring, resilience, maintainability, and performance.
- Expert knowledge in Kafka, Hadoop, Spark and other Big Data infrastructure.
- Strong sense of ownership, customer service, and integrity demonstrated through clear communication and positive action.
- Passion for eliminating repetitive manual processes using automation.
- Proficient in tool development using one or more programming languages such as Java, Ruby, Python, Perl, and C.
- Familiarity with Machine Learning workflows and how to optimize them for high performance data modeling.
- Experience managing applications running on private and public cloud platforms.
- Deep understanding of the Linux operating system, including the kernel, memory, processes, threads, static and shared libraries, IPC, and signals.
- Proclivity towards data-driven programming and operations.
- Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies.
Apple Cloud Services runs at massive scale. Operating across geographically dispersed data centers and multiple cloud providers and servicing over a billion users presents unique challenges. As an SRE @ Apple, you'll need to solve these problems using data, collaboration, and your own expertise. SREs @ Apple own the full infrastructure stack: from device driver performance debugging to content delivery network traffic management, our responsibilities are both broad and deep.
Apple Cloud Services runs the majority of its systems on Linux. We run a mix of open source and internally developed tools for system and configuration management, provisioning, software deployment, and monitoring. You'll learn these tools and have opportunities to improve them.
Our team embodies a "startup" mentality, fostering a strong entrepreneurial spirit. If you have a better solution to a problem, document a strategy for improvement, advocate for it through persuasion and socialization efforts, then carry it through to completion. Good ideas are heard and results are rewarded.
RESPONSIBILITIES:
- Deploy, support, and monitor new and existing services, platforms, and application stacks.
- Use scale testing to measure, tune, and optimize system performance.
- Architect, author, and deliver software to improve the availability, scalability, and security of Apple's internet services.
- Build and manage systems, infrastructure, and applications through automation.
- Participate in periodic on-call duties.
Education & Experience
- BS in Computer Science or related field, or equivalent employment