onsite
Real Time Operations Solutions Engineer Principal - Lead - aep
Software Engineer
Lead the design and delivery of real‑time operational solutions, driving performance, reliability, and scalability across cloud platforms. Leverage deep expertise in real‑time systems, DevOps, and incident response to mentor teams and shape enterprise‑wide best practices.
About the role
Key Responsibilities
- Architect and implement high‑availability, low‑latency operational solutions for mission‑critical workloads.
- Lead cross‑functional teams in the design, deployment, and continuous improvement of cloud‑native monitoring and alerting pipelines.
- Drive incident response excellence by establishing playbooks, root‑cause analysis processes, and post‑mortem reviews.
- Mentor and coach engineers, fostering a culture of ownership, collaboration, and technical excellence.
- Collaborate with product, security, and compliance stakeholders to ensure solutions meet regulatory and performance standards.
Requirements
- 10+ years of experience in operations engineering, with a focus on real‑time systems and cloud infrastructure.
- Proficiency with AWS services (EC2, ECS, Lambda, CloudWatch, CloudFormation) and container orchestration (EKS, Kubernetes).
- Strong background in DevOps tooling (CI/CD, Git, Terraform, Ansible) and monitoring platforms (Prometheus, Grafana, Datadog).
- Excellent leadership, communication, and problem‑solving skills.
- Experience with incident management (runbooks, SRE practices) and security best practices.
Skills
- Process improvement
- Project management
- Operations management