Site Reliability Engineering Lead
Hyderabad, Telangana, India
Software and Services
Are you seeking an environment where you can drive innovation? Does the prospect of working with top engineering talent get you charged up? Apple is a place where extraordinary people gather to do their best work. Together we create products and experiences people once couldn’t have imagined, and now can’t imagine living without. Apple’s IS&T manages key business and technical infrastructure at Apple: how online orders are placed, the customer experience with technology in our retail stores, how much network capacity we need around the world, and much more. The SAP Global Systems team within IS&T runs the Operations and Financial transactional platform that powers all of Apple’s functions, such as Sales, Manufacturing, Distribution and Financials.
Think platform-as-product! Our team delivers great developer experiences to our Program, Project and Development teams through a curated set of tools, capabilities and processes offered through our Internal Developer Platform. We automate infrastructure operations, support complex service abstractions, build flexible workflows and curate a frictionless ecosystem that enables end-to-end collaboration to help drive productivity and engineering velocity. This is a tremendous opportunity for someone who has the skill to own initiatives and a passion for working on a highly integrated global solution platform! Join us in crafting solutions that do not yet exist!
Key Qualifications
- 8+ years of experience with a track record of building and leading Cloud Native SRE and Operations on hyperscalers such as AWS or GCP
- Strong experience supporting customer-facing applications in a 24x7 uptime environment of distributed systems
- Expertise handling production incidents, with experience working towards resolution and collaborator communication during incidents.
- Track record with improving service reliability and efficiency.
- Ability to implement and coordinate telemetry using monitoring and observability tools
- Adept at prioritizing multiple issues in a high-stress environment. Good experience in designing and improving response processes
- Experience leading multi-functional initiatives and providing thought leadership.
- Automation focus for operational efficiency - designing and implementing automation processes for repeatable and consistent service deployment
- A strong sense of ownership. Good critical thinking and interpersonal skills to work successfully across diverse business, technical, and multi-functional teams.
- Working knowledge of core cloud services such as IAM, EC2/GCE, RDS/CloudSQL, EKS/GKE, CloudWatch/Cloud Monitoring, S3/GCS, etc.
- Understanding of complex landscape architectures, with working knowledge of on-prem and cloud-based hybrid architectures and infrastructure concepts such as Regions, Availability Zones, VPCs/Subnets, load balancers, API gateways, etc.
- Strong understanding of common authentication schemes, certificates, secrets and protocols
- Scripting and/or coding skills needed for automation, triaging and troubleshooting (examples include but are not limited to Python, Go, Java)
Description
Build up, lead and improve existing processes to provide 24x7 operational response for applications in public cloud platforms. Maintain services once they are live by setting up monitoring and alerting, and by measuring availability, latency, and overall system health. Own and review work for accuracy, quality, application performance and completeness.
Review release readiness through activities such as system design consulting, observability and monitoring reviews, capacity planning, and launch reviews. Understand and improve the processes for incident coordination among Apple teams. Keep up to date with the latest technologies and tools and evangelize their value to the development teams. Partner with architects and engineers to design and implement automation, operations, and support solutions.
Strive for top quality results and continuously look for ways to improve and enhance platform reliability, performance, and security. We're looking for a hardworking and passionate person to join this amazing team.
Education & Experience
Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.