Manager, Performance and SRE
Austin, Texas, United States
Software and Services
We are looking for a Manager committed to leading our Site Reliability and Performance team. This position is based in Austin, Texas. The successful candidate will enjoy using technology to automate solutions and optimize outcomes, and implementing continuous integration and deployment in an exciting, fast-paced environment built around a highly sophisticated, large-scale distributed system, with a key focus on site reliability and performance.
- Extensive experience leading teams responsible for customer facing systems in a high uptime 24-7 environment
- Expertise analyzing sophisticated application, database, network, and OS issues across a distributed large scale business critical system
- A depth and breadth of experience with server side Java development, relational databases, eventually consistent, high efficiency, cluster-based NoSQL solutions and distributed streaming platforms.
- Excellent problem solving, critical thinking, and interpersonal skills - Lead by example to empower and challenge the team to deliver their best.
- Strong experience leading multi-functional initiatives and providing thought leadership
- Ability to see the bigger picture and execute sophisticated tasks
- A passion for automation, crafting tools using Python, Java or other JVM languages
- Track record of building and running high-performance teams
- 3+ years of management experience leading teams of engineers, and experience with performance analysis of large-scale cloud-based application systems
- 2+ years of experience delivering high SLA production outcomes in a Public Cloud environment - leveraging cloud-native architectures to build and manage resilient, highly available infrastructures that deliver customer outcomes with high SLAs
- Strong experience in driving partnership between SRE, Dev and QE teams towards delivering excellent customer outcomes - managing above the level of infrastructure to deliver great customer outcomes
- Experience partnering with senior leaders as part of a broad organizational product development leadership team that focuses on ensuring and delivering customer value
- Strong expertise in managing production incidents, with experience driving for resolution and stakeholder communication during incidents. The candidate should be adept at prioritizing multiple issues in a high pressure environment
- Experience with SRE tools, process and culture. An ideal candidate would have experience leading through an SRE transformation - leading “ops” focused teams to develop engineering capabilities that improve their ability to deliver at scale
- Excellent problem solving, critical thinking, and communication skills - Lead by example to motivate and challenge the team to deliver their best
- Should be able to understand sophisticated architectures and be comfortable working with multiple teams
- Experience with large eCommerce applications
We are looking for a manager who can lead a team of engineers with a focus on site reliability and performance. The team will be agile, moving smart and fast by consuming and optimizing readily available technology, and collaborating to improve and scale capabilities across business and use cases to ensure high availability and stability of the production environments. The team’s responsibilities will include:
- Fleet management
- Performance tuning
- Scale and load testing
- Uptime and high availability
- Automation and tooling
- Reporting and communication
Education & Experience
BS degree in computer science or an equivalent field with 5+ years of experience, or MS degree with 3+ years of experience, or equivalent.