Siri - Senior Data Engineer, Data organization
Santa Clara Valley (Cupertino), California, United States
Software and Services
Would you like to play a critical part in the next revolution in human-computer interaction? Would you like to contribute to the advancement of a product that is redefining human-computer interaction, and work with the people who build the intelligent assistant that helps hundreds of millions of people get things done just by asking? The vision for the Siri Data organization is to improve Siri by using data as the voice of our customers. Within this organization, the mission of the Data Platform team is to build scalable, high-quality, and performant data systems with the overarching goal of improving Siri quality. We’re looking for exceptional data engineers who love working with data and are passionate about customer experience. As a data engineer on the team, you will design, develop, and maintain highly available and scalable data systems while upholding the highest degree of security and privacy. As part of this group, you will work in one of the most exciting high-performance computing environments, with petabytes of data and billions of events a day, and you will have the opportunity to imagine and build products that delight our customers.
- 5+ years of experience developing ETL jobs for analyzing and processing high-volume data in the Apache Hadoop ecosystem, especially with Spark
- Expert knowledge of one or more object-oriented programming languages (Scala preferred)
- Proficient at schema design and data modeling
- Experience with data tools (Jupyter Notebooks, Zeppelin, etc.)
- Excellent problem-solving and analytic skills
- Ability to program in several scripting languages such as Python, Perl, and Bash
- Experience with workflow management tools: Oozie, Airflow, Azkaban, etc.
- Experience with batch and streaming data processing
- Ability to learn and research new technologies rapidly
- Passion for customer privacy
- Strong interpersonal skills and experience working on cross-functional projects
- Nice to have: Developed large-scale backend storage systems
You will be an integral part of a team that is in a unique position to align our quality initiatives on a single platform. This platform allows data analysis, metric reporting, annotation, model training, and evaluation to share a unified data foundation and so achieve consistency, quality, and efficiency. It also empowers our developers to query their problem space in order to tackle problems or validate the effectiveness of their code from a quality perspective. You will draw on experience in data analysis and data processing to design, implement, and manage many large-scale datasets and backend services; these data systems will be used by many Siri teams. Deep technical capabilities, strong communication skills, and a knack for using hard data to triage issues are therefore must-haves.
YOUR RESPONSIBILITIES INCLUDE, BUT ARE NOT LIMITED TO:
- Participate in instrumentation architectural reviews, and collaborate with partners who have specialist knowledge in data handling to review requirements and ensure instrumentation is designed correctly and in a privacy-safe way
- Analyze and extract raw data from different sources and process (clean, transform) data
- Implement data storage solutions
- Instrument and surface data quality and pipeline metrics
- Work with data consumers to help them understand and utilize our data systems
- Collaborate with our quality initiative leaders to ensure the platform meets their needs, and iterate and innovate based on requirements and observations
- Evolve our pipelines and question the status quo
Education & Experience
Bachelor’s degree or foreign equivalent in Computer Science or a related field, or equivalent experience.