Site Reliability Engineer – System Administrator CSG
Software and Services
The Hardware Technology Compute and Storage Group has an opportunity available for a customer-service-oriented, self-driven, and motivated Site Reliability Engineer to join our operations team. The ideal candidate has a diverse background in system administration, excellent communication skills, a sense of ownership, and a drive to produce their best work, along with the ability to analyze and troubleshoot a broad spectrum of problems. You will join an existing team dedicated to supporting the geographically diverse silicon design teams within Apple.
- 3-5+ years of broad experience in system administration supporting Linux (RedHat/CentOS preferred) at scale, including automated OS installation, software compilation, package management, virtualization, OS lifecycle management, diagnostic and performance troubleshooting/profiling.
- 2+ years of experience working with and maintaining configuration management tools (Puppet, Ansible, Chef, Salt, etc.)
- Experience with a shell scripting, interpreted, or compiled language such as Bash, Perl, Ruby, Python, Go, C, or C++
- Experience supporting and maintaining common Linux/Unix applications and services
- Deep understanding of DNS, DHCP, LDAP, NFS, AutoFS, Kerberos, PAM, PXE, SNMP, SSH, VNC, X11, HTTP/S, and NTP (advantage)
- 2+ years of experience working with and maintaining common monitoring tools (Nagios, Zenoss, Ganglia, Splunk, etc.) (advantage)
- Experience with common version control software such as Git, CVS, SVN, or Perforce
- Experience with NFS and NAS storage appliances (NetApp or EMC preferred)
- Understanding of networking layers 1 through 3 (Arista preferred)
- Analytical, with strong organizational and problem-solving skills
- Ability to participate in a regular on-call rotation
You will be responsible for supporting internal hardware engineering teams by enhancing, maintaining, and performance tuning the engineering compute environment and planning its capacity. Your role will directly impact the development, enhancement, and maintenance of the high-performance compute environment and will include operations support of various batch job scheduling, storage, network, monitoring, application, and other infrastructure services.
RESPONSIBILITIES WILL INCLUDE:
- Support and improve the Hardware Technology engineering environment from design through deployment, including additional refinement and scale-up to support future growth
- Support the day-to-day operations of the environment, including monitoring, measuring, and troubleshooting infrastructure and services
- Automate all the things by identifying, owning, collaborating on, and driving new or further automation to enhance the consistent stability of the environment
- Achieve and maintain expected productivity levels with minimal supervision
- Interact with people and explain problem resolution methods to them
- Contribute to a culture of curiosity, diversity, openness, collaboration, improvement, and problem solving
- Participate in a regular on-call rotation to support the infrastructure 24/7
Education & Experience
BS degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience.