Senior DevOps Engineer - AI/ML

Michael Page | Toa Payoh | Full-time

Senior DevOps Engineer responsible for designing, operating, and optimizing large-scale GPU-based AI and HPC infrastructure across on-prem and cloud environments, ensuring performance, reliability, automation, and secure multi-tenant operations.

Client Details

A global leader headquartered in Singapore, renowned for innovative solutions, robust infrastructure, and driving digital transformation.

Description
  • Design, deploy, and operate scalable GPU clusters supporting AI, ML, and HPC workloads across on-prem and cloud environments
  • Automate GPU resource provisioning, scheduling, and lifecycle management using Kubernetes, IaC, and scripting
  • Build, manage, and optimize CI/CD pipelines for GPU-accelerated applications and AI models
  • Monitor and ensure GPU cluster health, performance, capacity, and availability using modern observability tools
  • Troubleshoot and optimize system-level components including Linux, Kubernetes, Slurm, GPU drivers, CUDA, and high-speed networking
  • Implement performance tuning, benchmarking, and security best practices for multi-tenant GPUaaS platforms
  • Collaborate with cross-functional teams to support users, resolve issues, and continuously improve AI and HPC infrastructure

Profile
  • Bachelor's degree in Computer Science, Engineering, Information Technology, or a related technical discipline
  • Strong Linux system administration experience across Ubuntu, CentOS, Rocky Linux, or similar distributions
  • Hands-on experience with DevOps and infrastructure tools including Kubernetes, Terraform, Ansible, and CI/CD platforms
  • Solid understanding of automation, CI/CD, monitoring, and operational best practices in production environments
  • Proficiency in scripting and automation using Python, Bash, or similar languages
  • Experience with, or working knowledge of, cloud platforms (IaaS/PaaS), GPU architecture, and AI frameworks such as TensorFlow or PyTorch
  • Strong problem-solving, communication, and collaboration skills, with the ability to work effectively across engineering and operations teams

Job Offer

The firm is growing and has a tightly knit team. The successful candidate will get the chance to contribute to a high-performing team while having the autonomy to make certain decisions for the team.

To apply online please click the 'Apply' button. For a confidential discussion about this role please contact Winson Low (Lic No: R22106039/ EA No.: 18C9065) on +65 6416 9865. © Michael Page International Pte Limited, company number 199804751N (including Page Executive (53295516A) and Page Personnel Recruitment Pte Ltd (Registration Number: 201736642C)) operates under the EA Licence Numbers of 18S9099 and 18C9065.

High salary

DevOps Engineer

MORGAN MCKINLEY PTE. LTD. | Toa Payoh
We are seeking an experienced DevOps Engineer to join a leading Systems Integrator (SI) in Singapore. The ideal candidate will have hands-on experience in developing, deploying, and maintaining enterprise-grade software solutions...
Urgent

Senior DevOps Engineer - AI/ML

Michael Page | Geylang, 4 km from Toa Payoh
Senior DevOps Engineer responsible for designing, operating, and optimizing large-scale GPU-based AI and HPC infrastructure across on-prem and cloud environments, ensuring performance, reliability, automation, and secure multi-tenant operations...
Immediate start

DevOps Engineer

Morgan McKinley | Downtown Core, 5 km from Toa Payoh
The ideal candidate will have hands-on experience in developing, deploying, and maintaining enterprise-grade software solutions, along with a strong passion for DevOps practices and technologies. As a DevOps Engineer, you will apply...