Silicon Validation Software Engineer: Site Reliability
San Diego, California, United States
Imagine what you could do here! At Apple, new ideas have a way of becoming extraordinary products, services, and customer experiences very quickly. Bring passion and dedication to your job and there's no telling what you could accomplish. Dynamic, smart people and inspiring, innovative technologies are the norm here. The people who work here have reinvented entire industries with Apple hardware products. The same passion for innovation that goes into our products also applies to our practices, strengthening our commitment to leave the world better than we found it. Join us to help deliver the next groundbreaking Apple product.

We are a small DevOps team providing continuous build and integration services for the Silicon Validation Software department. We are the bridge between software engineering and silicon validation platforms, a role that is critical to delivering high-quality Apple Silicon for the majority of Apple's new products. You will design, develop, and deploy systems to support the build and release of software developed by hundreds of Apple engineers. Your success here has a direct impact on products used by billions of users!
Key Qualifications
- 5+ years managing large-scale distributed systems (more than 1,000 nodes, on-premises and cloud)
- Expertise in build/release (CI/CD) pipelines, methodologies, and tools (Jenkins, artifact management, etc.)
- 5+ years of experience with object-oriented languages (Java/Groovy, Python, etc.)
- Experience collecting metrics and maintaining a monitoring and alerting system to identify bottlenecks and improve service performance
- Experience with log aggregation and analysis systems (Splunk, ELK Stack, etc.)
- Expertise in a configuration management/provisioning system (Salt preferred)
- Ability to inspect and debug third-party and open-source code to investigate and understand subtle performance problems
- Experience with enterprise virtualization solutions (vSphere)
- Experience deploying and supporting Java applications
- A detailed test-and-measure approach to continually improving service reliability and performance
Description
As a Site Reliability Engineer (SRE), you will be responsible for solving problems in, and maintaining, an ever-expanding constellation of systems, tools, and services such as Jenkins, Gerrit, SaltStack, Grafana, InfluxDB, Zabbix, and VMware. You will work with other teams to understand their needs and implement optimal solutions. This requires independent problem solving as well as close collaboration to adapt our infrastructure so those teams can achieve their missions.
Education & Experience
Bachelor’s degree in Computer Science or relevant industry experience.