DevOps Engineer, Distributed Systems
Santa Clara Valley (Cupertino), California, United States
Software and Services
Imagine what you could do here. At Apple, new ideas have a way of becoming extraordinary products very quickly. Bring passion and dedication to your job, and there's no telling what we can accomplish together. Do you love crafting elegant solutions to highly complex challenges? Can you intrinsically see the importance of every detail? At Apple, our Platform Architecture group is responsible for connecting our hardware and software into one unified system. Join this team, and you'll collaborate with engineers across Apple to build and deploy forward-looking prototype systems that contribute to the development of our world-renowned hardware and software architecture. You and your team will validate that every product we make performs exactly as intended. Together, our work will be the reason millions of customers feel that they can trust our devices every single day.

As a DevOps Engineer within the Platform Architecture team, you will support a team of software engineers and help build software systems that automate the testing, deployment, management, and monitoring of Apple’s large-scale, internal batch compute services. Ideally, you have both solid Linux/systems expertise and demonstrated software development abilities.
- Extensive experience in an SRE/Operations/DevOps role in a large-scale environment running production systems.
- Strong knowledge of Linux system internals and administration.
- Comfortable analyzing and troubleshooting large-scale distributed systems.
- Practical, proven knowledge of at least one higher-level language (Python, Go, Ruby, or similar).
- Experience managing large numbers of diverse systems with configuration management tools such as SaltStack (preferred), Puppet, or Chef.
- Possess systematic problem-solving skills and the ability to work with ambiguity.
- Passionate and curious about solving everyday problems in innovative ways.
- Comfortable with remote management of bare-metal servers and virtualized environments.
- Strong aptitude for automating and streamlining tasks.
- Enthusiastic about learning new technologies.
- Detailed technical documentation writing skills.
- Clear and concise communication skills.
Design and implement software and systems to automate the management of large-scale infrastructure and services across all phases of the lifecycle: development, integration, testing, monitoring, debugging, deployment, and operations. Proactively work with other engineers to ensure that our services meet availability, performance, scalability, and security goals. Solve issues across the entire stack: hardware, software, and application. Work with the team to design, build, and maintain core systems and management tools. Collaborate with other engineers on code reviews, internal infrastructure improvements, and process enhancements.
Education & Experience
Bachelor’s degree in Computer Science or equivalent industry experience
- Experience with provisioning and orchestration technologies (such as xCAT, Razor, Foreman, Cobbler, Kubernetes, Docker, and StackStorm).
- Experience with continuous integration and continuous deployment practices.
- Experience using and configuring monitoring, metrics, and logging infrastructure (Prometheus, Grafana, Graphite, Logstash/Kibana, Splunk, etc.).
- Experience with virtualization, containerization, and system image management (KVM, LXC, Vagrant).
- Experience with building, configuring, scaling, and monitoring distributed storage systems (HDFS, Ceph, Amazon S3).
- Understanding of standard networking protocols and concepts, such as HTTP, DNS, TCP/IP, ICMP, DHCP, the OSI model, subnetting, and load balancing.