Data Infrastructure Engineer (Siri Experimentation Platform)
Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI
Would you like to play a critical part in the next revolution in human-computer interaction? Would you like to contribute to data pipelines that process millions of user interactions per second and inform the development of our products? Our team develops the experimentation platform that empowers Siri to deliver excellent experiences to billions of users every day, and you could be part of it. We are looking for creative engineers who are excited about building world-class internet-scale systems. As part of this group, you will have an opportunity to build the experimentation platform that facilitates data-driven decision making and rapid iteration of the products that delight our customers every day. Our team enables Siri engineers and data scientists to improve user interactions through a data-driven approach while respecting our users' privacy.
- 2+ years of experience as a Software Engineer
- Excellent analytical and problem-solving skills
- Proficiency in one or more of the following languages: Python, Go, Java, Scala
- Experience with big data frameworks such as Hadoop, MapReduce, Spark, Hive, or Impala
- Desire to contribute to a nascent data ecosystem and to build a strong data toolset for the company
- Experience designing and managing large-scale data pipelines is a must
- Experience with A/B testing or multivariate testing infrastructure is highly desired
- Proven knowledge of statistics is a plus
- Great communication skills are required
You have at least a few of the following traits:
- Excited about digging into massive petabyte-scale semi-structured datasets
- Experience developing data extraction and transformation pipelines
- Experience in distributed systems, database internals, or performance analysis
- Experience with dimensional analytics platforms (OLAP) and data visualization systems
- A deep understanding of polyglot data persistence (relational, key/value, document, column, graph, data warehousing, etc.)
- Strong dedication to code quality, automation, and operational excellence: unit and integration tests, linting, documentation, etc.

WHAT YOU WILL DO:
- Empower dozens of engineering teams, hundreds of co-workers, and hundreds of millions of users to dream of new possibilities for the product.
- Partner with teams across Apple to surface evidence-based results and drive decision-making across many of our products.
- Work in one of the most exciting high-performance computing environments, with petabytes of data and millions of queries per second.
- Create the tooling required for continuous, evidence-based improvement of our systems.
- Automate the release and distribution of new experiments to a variety of server- and client-side components, providing guardrail processes so partner teams can operate confidently.
- Develop software to process, transform, and analyze data, identifying signals from the billions of events we collect every day (batch, streaming, and low-latency APIs).
- Design and build abstractions that hide the complexity of the underlying big data stack (MapReduce, Hadoop, Hive, Impala, Spark, Kafka, Parquet, etc.) and allow partners to focus on their strengths: product, data modeling, data analysis, search, information retrieval, and machine learning.
- Build scalable backend services and tools that help partners implement, deploy, and analyze data assets with a high level of autonomy and minimal friction.
- Surface datasets in near-real-time to mission-critical products and business applications throughout the company, providing the signals that feed our machine learning algorithms as well as our daily product-defining decisions.
- Improve the quality and reliability of data pipelines (monitoring, retries, failure detection).
Education & Experience
BS in Computer Science, Mathematics, Statistics, or a related field, or equivalent industry experience.