Data Services Infrastructure Engineer
London, Greater London, United Kingdom
Software and Services
Come help us build the next-generation cloud platform to support internet services across Apple. Our platform server engineering team develops and deploys software which forms the foundation for some of our most exciting services, including iCloud, Maps, iTunes, and more. Our software ensures that Apple's services are reliable, scalable, fast, and secure. In this role you will have the unique opportunity to own and deliver key components in Apple's growing suite of infrastructure services. We are looking for a world-class distributed systems engineer with experience in developing processes, tools, and automation for managing distributed systems in production environments. We balance our time across automating operations for our growing footprint of deployments, building self-service products to empower internal customers, and increasing the reliability of our services with application and systems-level improvements. We’re looking for someone with experience working on stateful infrastructure services, but we value potential as highly as experience.
- Knowledge of Linux, operating systems, networking protocols, security, and file systems
- Familiarity with DNS, HTTP, message queues, RPC frameworks, databases, and Linux tools
- Experience in designing, implementing, and managing systems that offer self-service and self-healing capabilities
- Passion for building automation tools in Java and Python, with a focus on writing high-quality code, tests, and documentation
- Familiarity with systems-level Java essentials: garbage collection internals, concurrency models, native & async IO, off-heap memory management, etc.
- Excitement for Agile methodologies, including pair programming, TDD, and continuous delivery of software
- Great communication skills and a deep sense of ownership for small tasks and large projects alike
- Develop creative and robust software solutions for automating the operations of large distributed systems
- Build tools and processes that enable effective monitoring, debugging, and capacity planning of production clusters
- Work closely with other software and systems engineers to improve the availability, reliability, utilization, and scalability of our services
- Support engineering teams in integrating with mission-critical infrastructure services
- Participate in an on-call schedule that's shared with team members in the UK and US
You’ll be familiar with the broader field of distributed queues, search, storage, and data streaming, and you're excited by the prospect of working collaboratively with other groups to deliver truly amazing services to our users. You will be able to demonstrate a strong practical understanding of how to develop and operate fault-tolerant, high-performance distributed systems. You're also excited about applying practical systems-level knowledge to understand and solve problems under and over the hood.
Education & Experience
BS, MS or PhD in Computer Science, or equivalent experience
- Experience with, or excitement for, developing, deploying, and operating stateful systems in production environments, including automating build-outs, monitoring, and configuration management
- Knowledge of and experience with Apache Kafka, Solr, ZooKeeper, Flume, Hadoop, Spark, Storm, Active/Open/RabbitMQ, Cassandra, and other Big Data technologies preferred
- Familiarity with modern front-end frameworks, e.g. React, Angular, or Ember
- This role may require occasional international (including transatlantic) travel