Senior DevOps Engineer - AI/ML
Senior DevOps Engineer responsible for designing, operating, and optimizing large-scale GPU-based AI and HPC infrastructure across on-prem and cloud environments, ensuring performance, reliability, automation, and secure multi-tenant operations.
Client Details
A global leader headquartered in Singapore, renowned for innovative solutions, robust infrastructure, and driving digital transformation.
Description
- Design, deploy, and operate scalable GPU clusters supporting AI, ML, and HPC workloads across on-prem and cloud environments
- Automate GPU resource provisioning, scheduling, and lifecycle management using Kubernetes, IaC, and scripting
- Build, manage, and optimize CI/CD pipelines for GPU-accelerated applications and AI models
- Monitor and ensure GPU cluster health, performance, capacity, and availability using modern observability tools
- Troubleshoot and optimize system-level components including Linux, Kubernetes, Slurm, GPU drivers, CUDA, and high-speed networking
- Implement performance tuning, benchmarking, and security best practices for multi-tenant GPUaaS platforms
- Collaborate with cross-functional teams to support users, resolve issues, and continuously improve AI and HPC infrastructure
- Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical discipline
- Strong Linux system administration experience across Ubuntu, CentOS, Rocky Linux, or similar distributions
- Hands-on experience with DevOps and infrastructure tools including Kubernetes, Terraform, Ansible, and CI/CD platforms
- Solid understanding of automation, CI/CD, monitoring, and operational best practices in production environments
- Proficiency in scripting and automation using Python, Bash, or similar languages
- Experience or working knowledge of cloud platforms (IaaS/PaaS), GPU architecture, and AI frameworks such as TensorFlow or PyTorch
- Strong problem-solving, communication, and collaboration skills, with the ability to work effectively across engineering and operations teams
Job Offer
The firm is growing and has a tight-knit team. The successful candidate will have the chance to contribute to a high-performing team while having the autonomy to make certain decisions for it.
To apply online, please click the 'Apply' button. For a confidential discussion about this role, please contact Winson Low (Lic No: R22106039/ EA No.: 18C9065) on +65 6416 9865. © Michael Page International Pte Limited, company number 199804751N (including Page Executive (53295516A) and Page Personnel Recruitment Pte Ltd (Registration Number: 201736642C)) operates under the EA Licence Numbers of 18S9099 and 18C9065.