Onsite
Member of Technical Staff, Platform Engineering - abundant
Software Engineer
A senior platform engineering role: build scalable, resilient infrastructure for next‑generation AI models, using Kubernetes, Docker, Terraform, and CI/CD automation to support high‑throughput data pipelines and model training at scale.
About the role
Key Responsibilities
- Design, implement, and maintain a highly available Kubernetes‑based platform that supports large‑scale data ingestion, model training, and inference workloads.
- Automate infrastructure provisioning and configuration using Terraform, ensuring reproducibility and compliance across environments.
- Develop and maintain CI/CD pipelines for continuous integration, automated testing, and rapid deployment of services and models.
- Collaborate with ML, data, and robotics teams to optimize resource utilization, reduce latency, and improve system reliability.
- Monitor system performance, troubleshoot incidents, and implement proactive scaling and fault‑tolerance strategies.
Requirements
- 5+ years of experience in platform engineering or DevOps roles, with a strong focus on container orchestration and cloud infrastructure.
- Hands‑on experience with Kubernetes, Docker, Terraform, and CI/CD tools such as GitHub Actions, Jenkins, or Argo CD.
- Deep understanding of cloud services (AWS, GCP, or Azure) and experience designing cost‑efficient, scalable architectures.
- Excellent problem‑solving skills, strong communication, and a passion for building robust, high‑performance systems.
Skills
Python, Kubernetes, Docker, Terraform, CI/CD