AIML - Site Reliability Engineer, ML Platform & Technology
Singapore, Singapore, Singapore
Software and Services
Apple is a place where extraordinary people gather to do their best work. Together we create products and experiences people once couldn’t have envisioned, and now can’t imagine living without. If you’re excited by the idea of making an impact and joining a team where we pride ourselves on being one of the most diverse and expansive companies in the world, a career with Apple might be your dream job!
If you wish to play a part in revolutionizing how people use their computers and mobile devices; build groundbreaking technology for algorithmic search, machine learning, natural language processing and artificial intelligence; and work with the teams building the most scalable big-data systems in existence, this is the role for you!
Key Qualifications
- 2 or more years of experience in a role focused on Site Reliability Engineering, observability or ML Ops, supporting internet services and distributed systems
- Proficiency in using Go, Python or other higher-level languages for automation, observability and infrastructure management
- Experience building and supporting telemetry, observability and logging solutions for incident, cost and performance management
- Experience with infrastructure as code, dashboards as code and provisioning tools for Kubernetes and cloud-based services
- Working knowledge of open source or commercial monitoring and observability frameworks and platforms such as ELK, Splunk, OpenCensus, Datadog
- Working knowledge of ML Ops systems and tools is advantageous
- Good interpersonal skills shown through previous projects or assignments
Description
- Monitor production, staging and development environments for a myriad of services in an agile and dynamic organization.
- Use metrics to drive data-driven solutions for reliability, performance and service insights.
- Design, implement, and extend automation tools for monitoring, logging, ML and data processing pipelines.
- Strive to improve the stability, security, efficiency and scalability of production systems by applying software engineering practices.
- Anticipate and plan for future capacity needs, and investigate new features and products.
- Apply strong problem-solving skills daily; a successful engineer takes the initiative to isolate issues and resolve their root causes through investigative analysis.
- Write justifications, incident reports, best-practices documentation and solution specifications.
Education & Experience
Bachelor's degree in Computer Science, Computer Engineering, or equivalent