AI/ML - Sr Search Data Quality Engineer, Siri Data
Santa Clara Valley (Cupertino), California, United States
Machine Learning and AI
Siri’s universal search engine powers search features across a variety of Apple products, including Siri Assistant, Spotlight, Safari, Messages, and News. The Siri Data organization seeks to improve Siri by using data as the voice of our customers. Within this organization, the Search Data Engineering team builds systems that process data reliably at scale to generate high-quality datasets that support confident, data-driven decision making for Siri Search. We’re looking for exceptional data engineers who are passionate about our product and values, who love working with data at scale, and who are committed to the hard work necessary to continuously improve.
As a part of this group, you will work with petabytes of data daily using diverse technologies like Spark, Hive, Impala, Flink, Kafka, and others. You will be expected to partner effectively with upstream engineering teams and downstream consumers, including analysts and product engineers. In this role you will help build out a data quality ecosystem that measures and tracks data quality across the stack and along the lifecycle of new products and features. You will work with data scientists and product engineers to generate the insights that drive improvements to our search products and beyond.
- You have excellent written and verbal communication skills
- You are self-directed and capable of operating amidst ambiguity
- You have a demonstrated ability to drive highly cross-functional solutions
- You are curious, with excellent analytical and problem-solving skills and strong intuition about data
- You are excited about digging into massive petabyte-scale semi-structured datasets
- 4+ years of industry experience working with distributed data technologies (e.g., Hive, Impala, Spark)
- Proficiency in at least one high-level programming language (Python, Go, Java, Scala, or equivalent)
- Experience with large, complex, high-dimensional datasets; hands-on experience with SQL/HQL
- You are pragmatic, not letting “the perfect” be the enemy of “the good”
- You are humble, continually growing in self-awareness and possessing a growth mindset
- Extras we’d be excited about:
- Experience applying ML to understand real-world data (classification, anomaly detection, etc.)
- Experience with data visualization (SQL, Tableau)
- Experience with native client development (iOS)
- Dealing with data quality issues: quantifying them, diagnosing root causes, and driving to resolution
- Conceptualizing, proposing, designing, building, and maintaining tools to improve data quality, from the logging through the aggregation phases of the system
- Interacting with multiple teams spread across the stack to synchronize the work towards improving data quality
- Analyzing data to identify signals from the billions of events we collect every day
- A champion of data hygiene, building toward a strong culture of data quality and inspiring everyone to do the same
Education & Experience
Surprise us! Many will have an MS or BS in CS, Engineering, Math, Statistics, or a related field, or equivalent practical experience in data science or analytics.