Apple Cloud Infrastructure - Data Services Engineer
San Francisco, California, United States
Software and Services
Come help us build the next-generation cloud platform to support internet services across Apple. Our infrastructure engineering team develops and deploys software which forms the foundation for some of our most meaningful services, including iCloud, Maps, iTunes, and more. Our software ensures that Apple's services are reliable, scalable, fast, and secure. In this role you will have the rare opportunity to own and deliver key components in Apple's growing suite of infrastructure services.
We are looking for a highly skilled Infrastructure Engineer with experience in developing processes, tools, and automation for managing distributed systems in production environments. We balance our time across automating operations for our growing footprint of deployments, building self-service products to empower internal customers, and growing the reliability of our services with application and systems-level improvements.
We’re looking for someone who has developed infrastructure services, but we value potential as highly as experience. You’ll be familiar with the broader field of distributed queues, storage, and data streaming, and you’re excited by the prospect of working collaboratively with other groups to deliver truly amazing services to our users. You will be able to demonstrate a strong practical understanding of how to develop and operate fault-tolerant, high-performance distributed systems. You’re also passionate about applying practical systems-level knowledge to understand and solve problems under and over the hood.
- Knowledge of Linux, operating systems, networking protocols, security, and file systems.
- Familiarity with DNS, HTTP, message queues, queueing theory, RPC frameworks, databases, and Linux tools.
- Experience in crafting, implementing, and managing systems that offer self-service and self-healing capabilities.
- Real passion for building automation tools, with a focus on writing high-quality code, tests, and documentation.
- Familiarity with systems-level or Java essentials: garbage collection internals, concurrency models, async IO, off-heap memory management, etc.
- Understanding of SRE principles and goals (SLI/SLOs, release management, monitoring and alerts).
- Excitement for Agile methodologies: pair programming, TDD, and continuous delivery of software.
- Great interpersonal skills and a deep sense of ownership for small tasks and large projects alike.
Develop creative and robust software solutions for automating the operations of large distributed systems. Build tools and processes that enable effective monitoring, debugging, and capacity planning of production clusters. Work closely with other software and systems engineers to improve the availability, reliability, utilization, and scalability of our services. Support engineering teams in integrating with mission-critical infrastructure services. Participate in an on-call schedule that's shared with team members in the UK and US.
Education & Experience
BS or MS in CS or equivalent
- Experience with, or excitement for, developing, deploying, and operating stateful systems in production environments, including automating build-outs, monitoring, and configuration.
- Knowledge of and experience with Big Data technologies such as Apache Kafka, ZooKeeper, NATS, ActiveMQ, OpenMQ, RabbitMQ, and Cassandra.
- Experience with Kubernetes.