Site Reliability Engineer (SRE) - Cloud Technologies

Austin, Texas, United States
Software and Services

Summary

Posted:
Weekly Hours: 40
Role Number: 200115602
This position can be located in Santa Clara Valley (CA) or Austin (TX). This role will be responsible for designing, building, running, and monitoring public and private cloud infrastructure to support a variety of mission-critical services. This is a highly technical, hands-on role that requires expertise supporting systems at enterprise scale. The candidate will deliver innovative solutions in these key areas:

  • Engineering: Continuously optimize secure, scalable, and performant security tools and services
  • Reliability: Drive fault detection and correction, performance, and uptime at global scale
  • Monitoring: Instrument systems to gain visibility into, and understanding of, how they are performing at any time
  • Automation and orchestration to enable:
      - Accelerated infrastructure, application, and software configuration deployment
      - Automated response to alerts or indicators of performance issues
      - Infrastructure as code

Key Qualifications

  • 5+ years of experience managing services in a distributed, mission-critical *nix environment
  • Experience supporting infrastructure and services in public and private cloud environments
  • Expertise with monitoring or log aggregation tools (Prometheus, Splunk, ELK, etc.)
  • Experience building and supporting containerized application technologies, including Docker and Kubernetes
  • Familiarity with CI/CD tools and deployment processes
  • Working knowledge of network protocols and network-based services, including routing and network load balancing
  • Experience with failure testing and chaos engineering
  • Experience with virtualization technologies
  • Solid understanding of Linux/Unix system internals, including kernel tuning
  • Solid understanding of storage systems, including network filesystems
  • Proficiency in one or more programming languages such as Python, Java, Ruby, Perl, or Go (as well as Makefiles) for building automation or integrating with APIs
  • Solid understanding of, and experience with, centralized configuration management, coordination, and provisioning technologies such as Ansible, Chef, or Puppet
  • Excellent communication skills; must be capable of working with cross-functional technical and business teams and with varying levels of management
  • Experience implementing and working with open source projects
  • Understanding of Agile methodologies such as Scrum, and the ability to work in a fast-paced environment
  • Strong project management skills, including excellent presentation skills
  • Must be capable of writing detailed solution specifications, diagrams, best practices/standards documentation, operating procedures, test plans/test reports, etc.

Description

  • Build, engineer, and support cloud platform IaaS and PaaS services
  • Partner with application teams to provision scalable workloads reliably across distributed compute resources
  • Provide engineering and operational support for distributed systems and network-based information security tools, including configuration management and provisioning
  • Implement and maintain security controls
  • Work closely with development teams to understand application performance and behavior patterns, so that issues can be proactively monitored, tuned, and corrected before they occur
  • Identify opportunities to improve the reliability, performance, and security of security tooling
  • Develop tools and automation to eliminate manual and repetitive work

Education & Experience

Bachelor of Science in Computer Science, or 4+ years of equivalent experience

Additional Requirements