AIML - Senior Embedded Machine Learning Engineer - Edge ML
We are looking for a highly motivated and experienced Senior Embedded ML Engineer to join our team, which is focused on enabling cutting-edge machine learning capabilities on resource-constrained edge devices. In this role, you will be at the forefront of innovation as we bridge the gap between state-of-the-art ML models and the realities of low-power, real-time embedded hardware.
You will play a key role in designing, implementing, and optimizing ML solutions for highly constrained compute environments. This is a cross-disciplinary role that blends expertise in embedded systems, computer architecture, and machine learning to unlock new applications in areas such as IoT, wearables, robotics, and autonomous systems.
RESPONSIBILITIES:
- Design and implement embedded ML pipelines on microcontrollers and custom SoCs with tight compute, memory, and power constraints.
- Optimize and quantize deep learning models for real-time inference on edge platforms.
- Develop and maintain low-level firmware in C/C++ to integrate ML models with custom hardware accelerators and sensors.
- Conduct performance benchmarking, memory profiling, and bottleneck analysis across various embedded platforms.
- Collaborate closely with ML researchers, hardware architects, and product engineers to co-design efficient ML solutions from model training to deployment.
- Evaluate new edge ML techniques, compilers, runtimes, and kernel libraries (e.g., TVM, TFLite Micro, CMSIS-NN), and toolchains to advance the team's capabilities.
- Contribute to the overall system architecture with a deep understanding of embedded compute, memory hierarchies, and data flow optimization.
QUALIFICATIONS:
- Strong proficiency in C/C++ and Python, with a solid foundation in embedded firmware development.
- Deep understanding of computer architecture, particularly ARM Cortex-M/A cores, SIMD, caches, memory alignment, and DMA usage.
- Proficiency in model deployment tools, runtimes, and compilers such as TensorFlow Lite for Microcontrollers, TVM, ONNX Runtime, and custom model conversion pipelines.
- Demonstrated expertise in performance analysis, using tools such as perf, Valgrind, gprof, or hardware-specific profilers.
- Experience working with hardware interfaces such as SPI, I2C, UART, and integrating with sensors or custom accelerators.
- Bachelor's, Master's, or PhD degree in Computer Science or a related field, or equivalent experience.
- Hands-on experience with deep learning concepts, including model architectures (CNNs, RNNs, Transformers), training workflows, and post-training optimization (quantization, pruning, distillation).
- Familiarity with embedded RTOSes (e.g., FreeRTOS, Zephyr) and real-time application constraints.
- Comfort with debugging low-level issues across software and hardware boundaries.
- Excellent problem-solving and analytical skills with a thorough approach.
- Most importantly: a strong curiosity, willingness to dive deep into unfamiliar problems, and an eagerness to learn and grow in a fast-evolving field.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.