Site Reliability Engineer - Apple Media Products
Santa Clara Valley (Cupertino), California, United States
Software and Services
Imagine what you could do here. At Apple, new ideas have a way of becoming great products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. The Apple Media Products SRE team is looking for an experienced Site Reliability Engineer to help take care of our high-performance, low-latency transaction processing platform. Our SRE team combines software engineering, systems engineering, and system administration practices to build and run large-scale, massively distributed, fault-tolerant systems. Be part of a team working on challenging, fast-paced projects, and work closely with the development teams to aid in architectural design and assist with the implementation of complex new features.
- Demonstrated systematic, test-and-measure-driven approach to continually improving service levels.
- Consistent track record of troubleshooting and resolving issues in live production environments and implementing strategies to prevent their recurrence.
- Ability to code in Python, Java, Bash, or similar languages.
- Ability to work effectively with relational databases and SQL queries.
- Strong grasp of Linux systems, networking, and security.
- Experience with monitoring tools such as Splunk and Nagios.
- Excellent communication skills to work and collaborate with development teams.
- Comfortable working with large-scale server deployments.
- Strong ability and enthusiasm to learn new technologies in a short time.
- We seek a visionary self-starter with strong leadership capabilities.
- Extraordinary communication skills for collaborating across many participating teams.
- You will interact with internal teams from many other groups to lead and deliver best-in-class products in an exciting, fast-paced environment.
- Dynamic, smart people and inspiring, innovative technologies are the norm here. Will you join us in crafting solutions that do not yet exist?
Support and maintain services by measuring and monitoring availability, latency, and overall system health. Develop, manage, and support SRE tools and applications. Engage in improving the whole lifecycle of services, from inception through deployment, operations, and refinement. Analyze logs and telemetry data by writing monitoring and automation code. Provide on-call support to first-level production support teams. Provide hands-on technical expertise during service-impacting events. Collaborate with other engineers on code reviews, internal infrastructure improvements, and process enhancements.
Education & Experience
BS degree in computer science or an equivalent field with 5+ years of experience, or MS degree with 3+ years of experience, or equivalent.