Senior DevOps Engineer - Hadoop

Santa Clara Valley (Cupertino), California, United States
Software and Services


Weekly Hours: 40
Role Number: 200297770
Apple’s Applied Machine Learning team has built systems for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. We use the latest in open source technology and as committers on some of these projects, we are pushing the envelope. Working with multiple lines of business, we manage many streams of Apple-scale data. We bring it all together and extract the value. We do all this with an exceptional group of software engineers, data scientists, SRE/devops engineers and managers.

Key Qualifications

  • Strong infrastructure concepts in networking (e.g., VIP, GSLB, DNS, DHCP, CDN, Layer 4, Layer 7)
  • Linux system administration (FSU, monitoring CPU, memory, etc.)
  • Strong knowledge of Search Technologies, such as Apache Solr
  • Experience managing big data environments including HDFS, Hive, Oozie, Spark, PySpark, or a similar Apache Kafka-based ecosystem
  • Strong knowledge of managing Java- and Node.js-based applications: deploying, debugging, and securing them (TLS, trust stores, and key stores)
  • Strong knowledge of DevOps philosophy and hands-on experience with one or more technologies such as SaltStack, Ansible, Spinnaker, Terraform, or CloudFormation; experience enabling workflows and pipelines using one or more of Jenkins, Rundeck, or Airflow, with strong emphasis on CI/CD pipelines using technologies such as Docker, deploying to hybrid cloud and bare metal simultaneously
  • Background in building and operating large infrastructure supporting a high volume of transactions in a high-demand environment
  • Deep knowledge of security practices such as Kerberos, mTLS, TLS/SSL, and encryption
  • A passion for automation, creating tools using Go, Python, Java, or other JVM languages
  • Proficient in shell scripting, command-line tools, and general system debugging
  • Strong communication skills and ability to work effectively across multiple business and technical teams
  • Ability to conduct performance analysis and troubleshoot large scale distributed systems
  • Strong expertise in troubleshooting complex production issues
  • Adept at prioritizing multiple issues in a high-pressure environment
  • Highly proactive, with a keen focus on improving the uptime and availability of our mission-critical services
  • Comfortable working in a fast-paced environment while continuously evaluating emerging technologies
  • Solid knowledge of secure coding practices and experience with open source technologies


Description

Monitor production, staging, test, and development environments for many Hadoop/YARN clusters spanning thousands of nodes in an agile and dynamic organization. You like to automate everything you do and document it for the benefit of others. You are an independent, self-directed problem-solver, capable of deftly handling multiple competing priorities and delivering solutions in a timely manner. Provide incident resolution for all technical production issues. Create and maintain accurate, up-to-date documentation reflecting configuration; write justifications, status reports, and procedures; train users in complex topics; and interact with other Apple staff and management. Provide guidance to improve the stability, security, efficiency, and scalability of systems. Determine future capacity needs and investigate new products and/or features. You will use strong troubleshooting skills daily, taking steps on your own to isolate issues and resolve root causes through investigative analysis in environments where you have little prior knowledge, experience, or documentation. Administer the backup systems and ensure their proper execution. Provide 24x7 on-call support to handle urgent critical issues.

Education & Experience

BS in Computer Science with 7-10 years of experience, MS with 5-7 years of experience, or equivalent related experience.

Additional Requirements

  • Experience with Solr cluster administration
  • Experience with notebooks, e.g., Jupyter, Zeppelin
  • Workflow and data pipeline orchestration using Airflow
  • Debugging Hadoop/Spark/Hive issues using NameNode, DataNode, NodeManager, and Spark executor logs
  • Kubernetes, Docker, or other container orchestration framework
  • Position yourself as a go-to consultative resource and solution expert for data engineers and analysts