Apple Media Products (AMP) Site Reliability Engineer (SRE) - Hadoop
London, Greater London, United Kingdom
Software and Services
The Site Reliability Engineer (SRE) role in Apple Media Products (AMP) requires a mix of strategic engineering and design along with hands-on technical work. If you have experience as a Systems Administrator and have moved on to DevOps/Automation, we should talk. This SRE will configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability and availability. We work closely with the systems engineers, network engineers, database administrators, monitoring team and information security team. For this position, strict application security and high availability requirements must be balanced to achieve the best solutions. If you are ambitious, with a real passion for excellence, quality and detail, look no further. If you enjoy partnering closely with the development engineers in addition to working on support operations, then this is the role for you. Here you will work within the team to aid in architectural design and assist with the implementation of complex features.
- 2-5+ years of managing services in a large scale *nix environment.
- Proven understanding of DNS, Load Balancing, TCP/IP, SSL and Linux.
- Proficient in scripting languages like Perl, Python, Shell etc.
- Deep understanding of and experience with one or more of the following: Docker, Mesos, AWS, Ansible, Puppet, Chef.
- Deep understanding of J2EE application servers.
- Experience with and understanding of scaling, capacity planning and disaster recovery is important.
- Should have On-call experience.
- Experience using monitoring solutions like SNMP, Nagios, Zabbix etc.
- Familiarity with Splunk or other log aggregation tools.
- Experience with software, frameworks and APIs.
- Nice to have: experience with big data environments such as Kafka, Hadoop, Spark, Cassandra, ELK, etc.
- Engage in and improve the life-cycle of services from inception and design to deployment, operation, migration and sunset.
- Ensure service-level SLAs are met.
- Experience working with different teams to coordinate and execute high-level projects.
- Write, review and develop code and documentation that solves the hardest problems that live on some of the largest and most complex systems in the world.
- Real passion for quality and automation, an ability to understand complex systems and a desire to constantly make things better.
- Set priorities and work efficiently in a fast-paced environment.
- Measure and optimize system performance.
- Strong interpersonal skills.
- Demonstrate ability to deliver results on time with high quality.
Education & Experience
BS in engineering, computer science or another technical discipline, plus 2-5+ years of related experience.