Site Reliability Engineer - Content Processing Team
Santa Clara Valley (Cupertino), California, United States
Software and Services
The Site Reliability Engineer (SRE) position requires a mix of strategic engineering and design along with hands-on technical work. A successful candidate will have worked as a systems administrator and since moved into DevOps and automation. The SRE will configure, tune, and troubleshoot multi-tiered systems to achieve optimal application performance, stability and availability. The SRE will work closely with the systems engineers, network engineers, database administrators, monitoring team, and information security team. For this position, strict application security and high availability requirements must be balanced to achieve optimal solutions.
- 5+ years of managing services in a large-scale *nix environment
- Experience with DevOps tools, processes, and culture. Experience with Puppet, Chef or Ansible is a plus
- Experience with AWS
- A systematic, test-and-measure approach to continually improving service operations
- Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP and load balancing
- Experience with configuration management tools
- Practical knowledge of shell scripting and at least one scripting language (e.g. Perl, Python, PHP)
- Strong hands-on knowledge of Unix/Linux environments
- Strong understanding of J2EE application servers
- Track record of practical problem solving, excellent communication, and documentation skills
- Experience with monitoring tools such as Nagios or Splunk is highly preferred
- Good understanding of Java is a major plus
- Working knowledge of Oracle Database is a plus
- Understanding of cryptography is a plus
The successful candidate will be highly self-motivated, with a passion for excellence, quality and detail. The SRE will not only support operations but also work closely with the development engineers on the team to aid in architectural design and assist with the implementation of complex features. Responsibilities of the SRE include the following:
- Bring a passion for quality and automation, an ability to understand complex systems, and a desire to constantly make things better
- Determine optimal configurations for application software, application servers (e.g. JBoss, Tomcat), database connections and indexes, HSM drivers, etc.
- Develop and maintain scripts used for environment monitoring and task automation (Perl, Shell, PHP, etc.)
- Deploy, support and monitor new platforms and application stacks
- Set priorities and work efficiently in a fast-paced environment
- Measure and optimize system performance
- Plan and manage capacity of the systems
- Explore and evaluate new technologies and solutions to push capabilities forward, get ahead of customers’ needs, innovate, and continually improve
- Communicate clearly and work effectively across multiple business and technical teams
- Deliver high-quality results on time
Education & Experience
BS in engineering, computer science, or another technical discipline preferred, plus 5 years of related experience.
- Familiarity with Python, OEL 6, NetScaler, ESXi, Isilon, NetApp systems, and Cisco