Remote
Senior Platform Reliability Engineer - Manulife
Software Engineer
Build and operate scalable, secure cloud infrastructure on AWS with Kubernetes and Terraform, keeping environments automated and high‑performing through CI/CD pipelines and robust monitoring.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, secure cloud‑based infrastructure on AWS, leveraging Kubernetes and Terraform for infrastructure as code.
- Develop and manage CI/CD pipelines to automate application deployments, configuration changes, and infrastructure updates.
- Implement comprehensive monitoring, alerting, and logging solutions to ensure platform reliability and rapid incident response.
- Collaborate with cross‑functional teams to define and enforce platform standards, best practices, and security controls.
- Continuously evaluate and adopt emerging DevOps tools and techniques to improve automation, scalability, and resilience.
Requirements
- 5+ years of experience in platform reliability or SRE roles within cloud environments.
- Proficiency with AWS services (EC2, EKS, RDS, CloudWatch) and Kubernetes cluster management.
- Strong scripting skills in Python or Bash and experience with Terraform or similar IaC tools.
- Hands‑on experience building and maintaining CI/CD pipelines using tools such as GitHub Actions, Jenkins, or Argo CD.
- Deep understanding of monitoring, logging, and alerting frameworks (Prometheus, Grafana, ELK).
Skills
AWS, Kubernetes, Terraform, CI/CD, Python