Lead Software Engineer (ITSM IT Service Operations and Resilience Lead)
Key Responsibilities
Reimagine and enhance core ITSM practices (Incident, Problem, Change, and Knowledge Management) using modern development frameworks and automation tools.
Design, prototype, and implement AI-driven operational tools, including predictive incident detection, automated remediation workflows, intelligent alerting, and large language model (LLM)-based knowledge agents.
Lead the development and deployment of custom automation solutions to improve IT service reliability and reduce manual workload across ITSM domains.
Collaborate with platform teams, enterprise architects, and developers to conceptualize and build next-generation IT operational capabilities.
Mentor and guide ITSM IPC (Incident, Problem, Change, and Disaster Recovery management) engineers, ensuring ITSM processes are executed and governed in line with ITIL best practices.
Drive adoption and continuous improvement of ITSM best practices across all IT teams.
Oversee operational aspects of the IT Command Centre and Helpdesk, including:
- Acting as the primary liaison between internal stakeholders and external service providers.
- Monitoring and managing performance of vendor-managed services to ensure SLA and KPI compliance.
- Participating in service reviews, audits, and performance assessments.
- Managing Incident, Problem, and Change Management processes across vendor operations.
- Leading continuous improvement initiatives and service enhancements.
- Supporting escalation management and root cause analysis efforts.
Qualifications
- Bachelor’s Degree in Computer Science, Engineering, or a related field (or equivalent experience).
- 5+ years of experience in IT operations or substantial exposure to ITSM processes and tooling.
- Strong understanding of ITIL framework and ITSM best practices; ITIL v3/v4 certification is preferred.
- Hands-on experience with automation tools, scripting, and AI/ML technologies relevant to IT operations.
- Proficient with ITSM platforms such as ServiceNow, BMC Remedy, or similar tools.
- Demonstrated ability to mentor technical teams and lead cross-functional collaboration.
- Excellent problem-solving, communication, and stakeholder management skills.
- Hands-on software development or scripting experience in Python, JavaScript (Node.js), or similar languages.
- Experience with monitoring and observability platforms such as Splunk, Grafana, ScienceLogic, or equivalent (advantageous).
- Familiarity with CI/CD pipelines, GitOps practices, cloud platforms (AWS, Azure, GCP), and Infrastructure-as-Code (IaC) tools (advantageous).
- Proficiency with AI/ML frameworks and tools (e.g., TensorFlow, scikit-learn, LangChain, OpenAI APIs) is a strong advantage.
- A passion for innovation and continuous improvement.