Siri Search and Knowledge - Analytics Platform Engineer
Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI
Siri’s universal search engine powers search features across a variety of Apple products, including Siri, Spotlight, Safari, Messages, and News. The Search Data Platform team enables continuous improvement of the search system by building tools for data-driven decision-making and rapid iteration. As part of this group, you will work with one of the most exciting high-performance computing environments, with petabytes of data and millions of queries per second. You will have an opportunity to build out the data processing and analysis platform that helps drive development of the products that delight our customers every single day.
- 2+ years of experience as a Software Engineer
- Excellent analytical and problem-solving skills
- Proficiency in one of the following languages: Python, Go, Java, C++
- Desire to contribute to a nascent data ecosystem and to build a strong data toolset for the company
- Experience with large, complex, high-dimensional data sets; hands-on experience with SQL/HQL
- Experience with Hadoop, Hive, and/or Impala is a plus
- Experience applying algorithms to understand real-world data (classification, anomaly detection, etc.) is a plus
- Phenomenal interpersonal skills required
You are someone with at least a few of the following traits:
- Is excited about digging into massive petabyte-scale semi-structured datasets
- Has experience in distributed systems, database internals, or performance analysis
- Has experience with dimensional analytics platforms (OLAP) and data visualization systems
- Has experience with A/B testing infrastructure or machine learning infrastructure
- Has experience with map/reduce and big data frameworks, such as Hadoop, Spark, Dask
- Has a deep understanding of polyglot data persistence (relational, key/value, document, column, graph, data warehousing)
- Has a strong dedication to code quality, automation, and operational excellence: unit and integration tests, linting, documentation, etc.
What you will do:
- Develop software to process, transform, and analyze data to identify signals from the billions of events we collect every day.
- Design and build abstractions that hide the complexity of the underlying big data stack (HDFS, Hadoop, Hive, Impala, Spark, Kafka, Parquet, etc.) and allow partners to focus on their strengths: product, data modeling, data analysis, search, information retrieval, and machine learning.
- Act as the “source of truth” for our most fundamental data - such as search activity and content - as well as our core metrics across a variety of products.
- Surface datasets in near-real-time to mission-critical products and business applications throughout the company, providing the signal that feeds our machine learning algorithms as well as our daily product-defining decisions.
- Empower dozens of engineering teams, hundreds of co-workers, and hundreds of millions of users to dream of new possibilities for the product.
- Build scalable backend services and tools to help partners implement, deploy, and analyze data assets with a high level of autonomy and limited friction.
- Optimize end-to-end workflows of data users (crafting libraries, providing abstractions to define jobs, scheduling data pipelines, managing access to data assets, etc.).
- Provide transparency into our data flows (comprehensive view of sources, transformations, sinks, data lineage).
- Automate and handle the lifecycle of datasets (schema evolution, metadata store, backfill management, deprecation, migration).
- Improve the quality and reliability of the pipelines (monitoring, retry, failure detection).
- Supply reusable backend abstractions to ingest or access data sets (batch, streaming, and low-latency APIs).
- Choose the right tools for the job, whether that means gluing things together with Bash, prototyping in Python, productizing in Java/Go, or scheduling with Groovy.
Education & Experience
BS in Computer Science, Mathematics, Statistics, or a related field, or equivalent industry experience.