onsite
Staff Software Engineer - Agentic AI Infrastructure - Western Governors University
Software Engineer
Lead the design and delivery of scalable, cloud‑native infrastructure for agentic AI platforms, leveraging Python, Kubernetes, and AWS to enable high‑performance, production‑grade machine‑learning workloads.
About the role
Key Responsibilities
- Architect, build, and operate a cloud‑native infrastructure stack that supports large‑scale, agentic AI models and services.
- Design and implement automated provisioning and deployment pipelines using Terraform, CI/CD tools, and container orchestration (Kubernetes).
- Collaborate with data scientists and ML engineers to optimize model training, inference, and monitoring in a distributed environment.
- Ensure reliability, security, and cost‑efficiency of AI workloads on AWS, including networking, storage, and compute resources.
- Mentor engineering teams and establish best practices for observability, scaling, and incident response.
Requirements
- 10+ years of software engineering experience with a focus on cloud infrastructure and large‑scale distributed systems.
- Strong proficiency in Python and extensive experience with Kubernetes and AWS services (EKS, S3, Lambda, etc.).
- Hands‑on expertise in infrastructure‑as‑code (Terraform, CloudFormation) and CI/CD automation.
- Demonstrated ability to design, deploy, and operate production‑grade machine learning operations (MLOps) pipelines.
- Excellent problem‑solving skills and a track record of mentoring technical teams.
Skills
Python, Kubernetes, AWS, Terraform