Data Integrity Analyst (Data Scientist)
Santa Clara Valley (Cupertino), California, United States
Software and Services
At Apple, phenomenal ideas have a way of becoming great products, services, and customer experiences very quickly. If you are a self-motivated, high-energy individual who is not afraid of challenges, we’re looking for you. Apple is seeking an experienced Data Integrity Analyst (Data Scientist) to join the Apple Media Products Analytics and Data Products team. The Data Integrity Analyst plays an important role in detecting and tracking data integrity issues and taking them to resolution end to end. This is a cross-functional role working with production support, engineering, operations, QA, legal, and other teams to assess data integrity issues and to perform data analysis, impact analysis, root cause analysis, QA, and production rollouts.
- 6+ years of industry experience, with a passion for improving data quality in big data environments
- Strong working knowledge of supervised and unsupervised machine learning models.
- Experience with Python, Scala, or another high-level programming language.
- A good understanding of Spark. Experience using Spark is a great advantage.
- Experience working with large volumes of data and knowledge of big data challenges is highly desirable.
- Experience leading and coordinating cross-functional activities among QA, Engineering, Business, Legal, and other teams
- Experience in end-to-end data quality issue tracking and resolution (impact analysis, root cause analysis, data analysis, etc.)
- Collaborative; able to build consensus and drive productive meetings among cross-functional teams while also challenging assumptions in a relevant way
- Experience working with ETL applications, data pipelines, and big data
- Knowledge of data integration tools (such as Talend), SQL, and BI tools
- Knowledge of Hadoop-related technologies such as HDFS, Azkaban, Oozie, Impala, Hive, and Pig
- Excellent oral and written communication and presentation skills
Work with large-scale datasets to perform feature engineering and selection, train and build machine learning models, evaluate and measure model performance, and work with partners on model adoption. Work closely with data engineers, program managers, and business partners to understand problems, define and build solutions, and measure and communicate results. Perform data analysis, impact analysis, and root cause analysis of data quality issues, and drive corrective actions. Monitor data quality results, reports, and dashboards, and consult on data quality corrective action plans. Build and enhance data quality dashboards and scorecards to help monitor the quality of the data pipeline. Partner with business and reporting teams to assess the impact of data integrity issues and assign each issue a priority. Partner with the EPM team to drive corrective actions. Partner with the operations team to perform root cause analysis. Partner with engineering teams on the development and implementation of fixes and corrective actions. Partner with QA teams to build and implement test cases that improve data quality and the stability of the data pipeline.
Education & Experience
Minimum of a Bachelor’s degree in Computer Science, Statistics, Mathematics, Engineering, Economics, or a related field; a Master’s degree in a related field is preferred.
- Broad knowledge of existing machine learning algorithms and the creativity to invent and customize algorithms when necessary
- Knowledge of building dashboards using Tableau, QlikView, or similar tools
- Experience with Spark or streaming technologies would be a bonus
- Experience working with music, video, and clickstream event data