onsite
Staff/Principal AI Infrastructure Engineer
As a Staff/Principal AI Infrastructure Engineer, you will design and implement AI/ML solutions for infrastructure automation, including predictive autoscaling, anomaly detection, and intelligent cost optimization across multi-cloud environments. You will also develop AI-driven monitoring systems, automate incident response, and integrate AI tooling into CI/CD pipelines to enhance reliability and security.
About the Role
As a Staff/Principal AI Infrastructure Engineer at Xsolla, you will design, implement, and maintain AI/ML-powered solutions that optimize and automate our infrastructure. The role spans predictive autoscaling, anomaly detection, intelligent cost optimization, and automated remediation in multi-cloud environments, including GCP.
What You Will Do
- Design and implement AI/ML-powered solutions for infrastructure use cases, including predictive autoscaling, anomaly detection, intelligent cost optimization, and automated remediation across GCP and multi-cloud environments.
- Build and maintain AI-driven monitoring and observability systems that correlate logs, metrics, and traces to surface root causes, predict bottlenecks, and reduce mean time to resolution (MTTR).
- Develop and operate automated incident response workflows using AI-powered playbooks that diagnose, contain, and resolve infrastructure issues with minimal manual intervention.
- Integrate AI tooling into CI/CD pipelines to improve deployment reliability, automate test prediction, score release health, and support rollback automation.
- Contribute to the development of internal AI agents and virtual assistants integrated into developer workflows (Slack, IDEs, Confluence), enabling self-service for provisioning, troubleshooting, and infrastructure guidance.
- Implement AI/ML-based anomaly detection and automated vulnerability management workflows to enhance the security posture of Xsolla's infrastructure.
- Prototype and productionize Generative AI solutions for infrastructure automation, including auto-generation of Terraform/Puppet modules, IaC configurations, runbooks, and change documentation.
- Collaborate with senior engineers and leadership to evolve and execute the infrastructure AI strategy across its implementation phases.
- Maintain clear documentation of AI tools, integrations, and automated workflows; share knowledge and best practices across the team.