Sr. Site Reliability Engineer - Site Reliability Engineering
Seattle, Washington, United States
Software and Services
- Deep understanding of the Linux operating system, including the kernel, memory management, processes, threads, cgroups, static/shared libraries, IPC, and signals, as well as standard UNIX utilities, programs, and packaging.
- Extensive experience in configuration management and fleet orchestration via Puppet, Chef, Ansible, or others.
- Understanding of basic Internet infrastructure services, including DNS, DHCP, LDAP, server virtualization, server monitoring, and cloud services (AWS S3/EC2/CloudFront/Steps... or equivalent).
- Demonstrated history of automating operations processes through services and tools.
- Fluency in one or more high-level programming languages like Java, Python, Go, Ruby or equivalent.
- Consistent track record of troubleshooting and resolving issues in live production environments, and of implementing strategies to prevent them from recurring.
- Driven approach to continually improving service levels.
- Comfortable working with large-scale server deployments, both on-premises and in public clouds.
- Knowledge of data platforms, including but not limited to Apache Kafka, Solr, Redis, MySQL, Cassandra, and Hadoop.
- Knowledge of continuous integration, testing methodologies such as TDD, and agile development practices.
- Strong ability and enthusiasm to learn new technologies quickly. We are seeking a visionary self-starter with strong leadership capabilities.
- Experience understanding how applications operate across distributed resources in geographically diverse locations.
- Extraordinary communication skills for collaborating across many teams.
Architect, author, and deliver software to improve the availability, scalability, and security of Apple Media Products' internal data infrastructure. Build and manage systems, infrastructure, and applications through automation. Deploy, support, and monitor new and existing services, platforms, and application stacks. Improve the whole lifecycle of services, from inception through deployment, operations, and refinement. Provide hands-on technical expertise during service-impacting events. Collaborate with other engineers on code reviews, internal infrastructure improvements, and process enhancements. Use scalability testing to measure, tune, and optimize system performance.
Education & Experience
BS in Computer Science or an equivalent field with 5+ years of experience, or an MS with 3+ years of experience, or equivalent.
- Participate in periodic 24x7 on-call duties.
- This role may require occasional international travel, including transatlantic travel.