Operations & Site Reliability Engineer

Hyderabad, Telangana, India
Software and Services


Weekly Hours: 40
Role Number: 200534628
People at Apple don’t just build products; they craft the kind of experience that has revolutionised entire industries. The diverse collection of our people and their ideas encourages innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it. At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Every single day, people do amazing things at Apple.

Join Apple’s Service Management team as an Operations and Site Reliability Engineer. You will inspire the team toward operational excellence and improve the availability, scalability and security of multiple highly scalable, fault-tolerant, business-critical global applications in the Apple Service Management space. Lead operational planning, readiness, monitoring, measurement of system health, incident management and communication for these enterprise-level applications. Build and manage systems, infrastructure and applications through automation. Develop tools that bring operational parity across all applications to improve the team’s efficiency. The ideal candidate’s skills will be a strong blend of operations leadership and engineering.


- We are looking for a highly technical and motivated individual who will own ultimate responsibility for the operations of Service systems, working with teams to ensure 24x7 operations. The role also calls for the ability to ensure the smooth rollout of applications that our customers use every day, to improve our tool suite, and to develop new tools that improve operational efficiency and product quality.
- Identify and handle key performance indicators for global applications. Drive operational improvements, metrics tracking and implementation of standard methodologies through level one production support and engineering teams.
- Handle the Production backlog with the business team and prioritize fixes in planned releases. Keep close tabs on all product releases and ensure smooth and safe deployments in Production. Drive and handle product rollouts and partner/retail on-boardings.
- Lead the Production Support team to ensure all servers and applications are monitored on an ongoing basis, with alerts covering CPU, memory, and storage utilization, as well as network and security issues, and performance tuning. Monitor the production footprint and lead the effort for Capacity Planning.
- Keep track of and interact with the Data Center, Network and other system teams to plan OS patches, system upgrades and maintenance.
- Drive the team to build and implement automated application health checks that ensure the high availability of applications.
- Along with applying your technical skills, you will have the opportunity to let your creative juices flow. You will work very closely with the team to design, develop and operate the best development support and automation tools you can imagine.

Minimum Qualifications

Key Qualifications

  • Strong sense of ownership, customer service, and integrity demonstrated through clear communication
  • Experience in leading and driving operations teams for large-scale, critically important applications in a 24x7 operations and onshore/offshore support model
  • Experience in strategizing and achieving operational excellence in globally distributed systems
  • Strong knowledge of production support practices for managing web and iOS applications
  • Experience in troubleshooting, analyzing logs, and building metrics and operational dashboards
  • Passion for eliminating repetitive manual processes using automation
  • Experience in interpreting data from systems like Hubble, ExtraHop, Splunk and other monitoring tools
  • Fundamental understanding of distributed systems, including microservices, message brokers and versioning
  • Experience in Java, JEE, REST, Swift/Objective-C, database schema design and data access technologies
  • Deep understanding of programming in a high-level language such as C, Java, Ruby, Python, or Perl
  • Experience managing large numbers of diverse systems with containers (Docker), build systems (Jenkins, Ansible, Spinnaker), and infrastructure as a service (Kubernetes, AWS)
  • Understanding of the Linux Operating System, including Kernel, Memory, Process, Threads, Static / Shared Libraries, IPC, Signals
  • Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting and load balancing is a plus
  • Experience in ethical hacking, system security and fraud monitoring is an added advantage
  • Self-starter, flexible, motivated to learn in a fast-paced environment and comfortable working as part of a team of versatile engineers
  • Excellent communication and leadership skills
  • Excellent organizational and documentation skills
  • Passion for quality and the optimal user experience

Preferred Qualifications

Education & Experience

Bachelor’s degree or equivalent

Additional Requirements