Sr DevOps/MLOps Engineer

Austin, Texas, United States
Machine Learning and AI

Summary

Posted:
Weekly Hours: 40
Role Number: 200364739
Apple’s Applied Machine Learning team has built platforms for a number of large-scale data science applications. We work on many high-impact projects that serve various Apple lines of business. We use the latest open source technology and, as committers on some of these projects, we are pushing the envelope. Working with multiple lines of business, we manage many streams of Apple-scale data. We bring it all together and extract the value. We do all this with an exceptional group of software engineers, data scientists, SRE/DevOps engineers, and managers.

Key Qualifications

  • Expertise in migrating and supporting applications on Kubernetes and services on third-party clouds
  • Experience in supporting highly scalable data systems and services written in Python, Scala, or Java
  • Passion for automation, demonstrated by creating tools in Go, Python, Java, or other JVM languages
  • Background in building and handling large infrastructure supporting a huge volume of transactions in a high-demand environment.
  • Deep knowledge of security practices and technologies such as Kerberos, mTLS, TLS/SSL, and encryption
  • Experience with infrastructure templating tools such as CloudFormation and Terraform
  • Proficient in shell scripting, command-line tools, and general system debugging
  • Strong communication skills and ability to work effectively across multiple business and technical teams
  • Expertise in configuration management (such as Ansible or SaltStack) for deploying, configuring, and managing servers and systems
  • Ability to conduct performance analysis and troubleshoot large-scale distributed systems
  • Strong expertise in troubleshooting complex production issues
  • Adept at prioritizing multiple issues in a high-pressure environment
  • Highly proactive, with a keen focus on improving the uptime and availability of our mission-critical services
  • Comfortable working in a fast-paced environment while continuously evaluating emerging technologies
  • Solid knowledge of secure coding practices and experience with open source technologies

Description

Monitor production, staging, test, and development environments for many Hadoop/YARN clusters spanning thousands of nodes, in an agile and dynamic organization. You like to automate everything you do, and you document it for the benefit of others. You are an independent, self-directed problem-solver who can deftly handle multiple competing priorities and deliver solutions in a timely manner. Provide incident resolution for all technical production issues. Create and maintain accurate, up-to-date documentation reflecting configuration. You are also responsible for writing justifications, training users in complex topics, writing status reports, documenting procedures, and interacting with other Apple staff and management. Provide guidance to improve the stability, security, efficiency, and scalability of systems. Determine future capacity needs and investigate new products and features. You will use strong troubleshooting skills daily, taking the initiative to isolate issues and resolve root causes through investigative analysis, even in environments where you have little prior knowledge, experience, or documentation. Administer the backup systems and ensure they execute properly. Provide 24x7 on-call support to handle urgent critical issues.

Education & Experience

BS in computer science with 7+ years, or MS with 5+ years, of demonstrated ability or related experience preferred.

Additional Requirements