Apple Pay - Site Reliability Engineer (SRE)
London, Greater London, United Kingdom
Software and Services
The Site Reliability Engineer position requires a mix of strategic engineering and technical operational experience. You will need an extensive background in monitoring and troubleshooting complex multi-tier systems at all levels. You will be highly customer-focused and committed to delivering high-quality experiences in line with business requirements. Your remit will involve hands-on site reliability engineering, with a focus on improving efficiency by replacing manual tasks with automated solutions wherever possible. You will work closely with system engineers, network engineers, database administrators, developers, quality assurance, and information security teams to deliver a highly reliable platform. In this position, strict application security and high availability requirements must be balanced to achieve efficient solutions.
- Experience in similar hands-on support roles.
- Understanding of web service APIs, Internet architecture, and software development.
- Experience with enterprise monitoring tools such as Splunk, Nagios, Graphite, or ELK is highly preferred.
- Ideally some experience with deployment and automation tools such as Ansible/Puppet/Chef.
- Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, and load balancing.
- Familiarity with ITIL methodologies in incident management, problem management and change control.
- Experience with Java is highly helpful.
You'll be ambitious with a passion for perfection, quality and detail. You will need to gain a hands-on understanding of key day-to-day operational support and change tasks, working closely with fellow engineers to provide efficient solutions and produce high-quality work.
• Define and run incident management, problem management and change control.
• Support the release of new services through capacity planning, rollout planning and release management.
• Collaborate with all partners to define and improve monitoring processes, SLAs and critical metrics across the platform.
• Help design and build automation tooling and supporting services.
• Work directly with global SRE teams.
• Support a 24x7 online environment as part of an on-call rotation.
Education & Experience
• BS or higher in a technical field or meaningful experience.
• Ability to thoroughly understand custom solutions down to individual API calls.
• A desire to ensure the best possible customer experience at all times.
• A passion for identifying processes that can be automated and improved.
• Ability to work with existing toolsets and optimize their usage.
• Ability to set priorities and work efficiently in a fast-paced environment.
• Willingness to explore and evaluate new technologies and solutions to push capabilities forward, getting ahead of customers’ needs, innovating and continually improving.
• Strong interpersonal skills and ability to work effectively across multiple business and technical teams.
• Demonstrated ability to deliver results on time.