Computational Support for Machine Learning and LSTM

Santa Clara Valley (Cupertino), California, United States
Software and Services


Weekly Hours: 40
Role Number: 200149628
Are you a big-picture thinker who loves setting ambitious goals? Do you have a passion for understanding how each line of code affects all the others? In the Core Operating Systems group we ensure that the OS is inseparable from each device’s identity as a whole. That’s because this group is committed to building fully integrated operating systems that combine hardware, software, and apps into a single Apple experience. Your dedication to cross-disciplinary collaboration will help develop groundbreaking technologies for iOS, macOS, watchOS, and tvOS. By crafting these distinct, holistic user experiences, you’ll continue to uphold and advance the excellence people expect from Apple devices.

The Vector and Numerics Group within the Core Operating Systems group is tasked with designing, enhancing, and improving various subsystems running on iOS, macOS, watchOS, and tvOS. Most of this support is encapsulated in the Accelerate framework, a widely used library serving many technologies, such as machine learning. The group is looking for an exceptional high-performance programmer to complement the team and make a difference.

As a member of our fast-paced group, you will have the unique opportunity to delight and inspire millions of Apple’s customers every day. You will work in a cross-functional team implementing innovative, state-of-the-art routines to support the computation behind vision algorithms and machine learning training and inference. You will push the state of the art in low-level computation, driving these routines toward energy-efficient, high-performance implementations by tightly integrating software and hardware. The successful candidate will have an excellent understanding of machine learning primitives and of the microarchitecture of NEON on ARM or AVX on Intel CPU cores from a vector programming perspective.
Team members are engaged in the design and optimization of low-level computational support for all aspects of machine learning, providing computational primitives in service of models such as object detection and sound and activity classification, to name a few. These primitives include 2D multi-layered convolutions and LSTMs. The ability to craft the fastest and most energy-efficient routines for a particular CPU core is a plus. Low-level, high-performance programming experience is a must for this position, and comfort with vector assembly and low-level C is a requirement. The ideal candidate is at ease developing both innovative and robust CPU-core-level algorithms derived from a particular technology’s needs under tight deadlines.

Key Qualifications

  • Low-level algorithmic development of machine learning primitives, BLAS, and FFT.
  • Detailed knowledge of the vector Instruction Set Architectures (ISAs) of ARM and Intel.
  • Strong understanding of computational efficiency.
  • Excellent coding skills in assembly and C.
  • Strong verbal and written communication skills.
  • Ability to handle multiple tasks and self-prioritize.
  • Ability to work with cross-functional teams on compression-related components.


Design and implement microarchitecturally optimized pieces of the Accelerate framework, taking performance and energy usage into consideration.

Education & Experience

BS or MS in mathematics, computer science, or computer engineering required; advanced degree preferred.

Additional Requirements