Data Center IT Infrastructure Engineer, Nebius (Modiin)
Work arrangement: Remote
Category: DevOps Engineer
About the role
Lead the design, deployment, and maintenance of high‑availability data center infrastructure for a global AI cloud platform. You will draw on cloud, GPU orchestration, networking, and storage expertise to support scalable AI/ML workloads.
Key Responsibilities
- Architect and implement scalable data center solutions that support GPU‑heavy AI workloads, ensuring high availability and performance.
- Collaborate with platform, networking, and storage teams to integrate new hardware and software into the AI cloud stack.
- Monitor, troubleshoot, and optimize infrastructure performance using automation and configuration management tools.
- Develop and enforce best practices for security, compliance, and disaster recovery across the data center.
- Lead capacity planning and cost‑optimization initiatives for compute, storage, and networking resources.
Requirements
- 5+ years of experience in data center or large‑scale IT infrastructure engineering.
- Proficiency with cloud platforms (AWS, Azure, GCP), containerization (Docker), and container orchestration (Kubernetes).
- Hands‑on experience with GPU clusters, high‑performance networking, and storage technologies.
- Strong scripting skills (Python, Bash) and familiarity with infrastructure-as-code (IaC) and configuration management tools (Terraform, Ansible).
- Excellent problem‑solving, communication, and teamwork abilities.