Siri - Site Reliability Engineer
Elk Grove, California, United States
Machine Learning and AI
Play a part in revolutionizing how people use their computers and mobile devices. Build groundbreaking technology for algorithmic search, machine learning, natural language processing, and artificial intelligence. And work with the teams crafting the most scalable big-data systems in existence.
- Experience running Linux systems in a 24/7 production environment.
- Ability to program in Python, Ruby or Perl highly preferred.
- Working knowledge of multi-tier applications and their dependencies including load balancing, TCP/IP networking, web services, LDAP and DNS.
- Proficiency with web server administration including Apache and Nginx highly preferred.
- Knowledge of database support and administration including MySQL, Postgres & HBase.
- Experience with monitoring tools such as Nagios, Splunk and Munin highly preferred.
- Ability to develop and maintain automation for system administration, provisioning, support and application management tasks that enables support teams and users, reduces costs and increases business agility.
- Experience with Puppet, Chef or Ansible highly preferred.
- Excellent interpersonal and communication skills demonstrated through previous projects or assignments (work or academic related).
- Cisco and Juniper network administration experience a plus.
Monitor production, staging, test and development environments for a myriad of applications in an agile and dynamic organization. You are an independent, self-directed problem solver who can handle multiple competing priorities and deliver solutions in a timely manner. Provide incident resolution for all technical production issues. Create and maintain accurate, up-to-date documentation reflecting configuration. You will also be responsible for writing justifications, training users in complex topics, writing status reports, documenting procedures, and interacting with other Apple staff and management. Provide guidance to improve the stability, security, efficiency and scalability of systems. Determine future needs for capacity and investigate new products and/or features. You will use strong troubleshooting skills daily, taking the initiative to isolate issues and resolve root causes through investigative analysis, even in environments where you have little prior knowledge, experience or documentation. Administer and ensure the proper execution of the backup systems. Provide 24x7 on-call support to handle urgent critical issues. The position will require rotating day, night and weekend shifts.
Education & Experience
BS in Computer Science or equivalent program preferred.