onsite
AI Ops Senior Architect - TechVirtue LLC
Systems Engineer
Lead the design and optimization of AI‑driven operational platforms for large‑scale, mission‑critical environments, leveraging machine learning, observability, automation, and cloud engineering.
About the role
Key Responsibilities
- Architect and implement AI‑powered observability and automation frameworks across hybrid cloud environments.
- Design scalable, resilient pipelines that integrate machine‑learning models into SRE/DevOps workflows.
- Lead infrastructure‑as‑code initiatives using Terraform, and container orchestration with Kubernetes.
- Collaborate with engineering and product teams to define performance, reliability, and security standards.
- Mentor senior engineers and drive best‑practice adoption for AI Ops, CI/CD, and incident response.
Requirements
- 15+ years of experience in cloud engineering, SRE, or DevOps, with a focus on AI‑enabled operations.
- Strong proficiency in Python and modern automation tools (Terraform, Ansible, Helm).
- Deep knowledge of observability stacks (Prometheus, Grafana, OpenTelemetry) and container platforms (Kubernetes, Docker).
- Hands‑on experience with a major cloud provider (AWS, Azure, or GCP) and with designing highly available architectures.
- Proven track record of integrating machine‑learning models into production monitoring and remediation workflows.
Skills
Python, Kubernetes, Terraform, Prometheus, Grafana, AWS, machine learning