Service Reliability Engineer - AMP Analytics Applications & Big Data

Bengaluru, Karnataka, India
Software and Services

Summary

Posted:
Weekly Hours: 40
Role Number: 200156763
The Site Reliability Engineer (SRE) position requires a mix of strategic engineering and design along with hands-on technical work. A successful candidate will have worked as a Systems Administrator and since moved into DevOps/Automation. The SRE will configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability, and availability. The SRE will work closely with the systems engineers, network engineers, database administrators, monitoring team, and information security team. In this position, strict application security and high availability requirements must be balanced to achieve optimal solutions. The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with the development engineers within the team to aid in architectural design and assist with the implementation of complex features.

Key Qualifications

  • 8+ years of managing services in a large-scale *nix environment.
  • Proficiency in DNS, load balancing, TCP/IP, SSL, and Linux.
  • Proficient in scripting languages such as Perl, Python, and Shell.
  • Deep understanding of and experience with one or more of the following: Docker, Mesos, AWS, Ansible, Puppet, Chef.
  • Deep understanding of J2EE application servers.
  • Experience with and understanding of scaling, capacity planning, and disaster recovery.
  • On-call experience.
  • Experience using monitoring solutions such as SNMP, Nagios, and Zabbix.
  • Familiarity with Splunk or other log aggregation tools.
  • Experience with software, frameworks and APIs.
  • Nice to have: experience with Big Data environments such as Kafka, Hadoop, Spark, Cassandra, and ELK.

Description

Responsibilities of the SRE include the following:

  • Engage in and improve the life cycle of services, from inception and design through deployment, operation, migration, and sunset.
  • Ensure service SLAs are met.
  • Work with different teams to coordinate and execute high-level projects.
  • Write, review, and develop code and documentation that solve the hardest problems on some of the largest and most complex systems in the world.
  • Bring a passion for quality and automation, an ability to understand complex systems, and a desire to constantly make things better.
  • Set priorities and work efficiently in a fast-paced environment.
  • Measure and optimize system performance.
  • Demonstrate strong interpersonal skills.
  • Deliver results on time with high quality.

Education & Experience

A BS in engineering, computer science, or another technical discipline is preferred, plus 5 years of related experience.

Additional Requirements