remote
Site Reliability Engineer West Coast - Sectigo
The Site Reliability Engineer is responsible for building and operating highly available, cloud‑native services on AWS, automating infrastructure with Terraform, and ensuring reliability through monitoring, incident response, and performance tuning.
About the role
Key Responsibilities
- Design, implement, and maintain scalable infrastructure on AWS using Terraform and IaC best practices.
- Operate and support containerized workloads in Kubernetes clusters, ensuring high availability and performance.
- Develop automation scripts and tools in Python to streamline deployment, monitoring, and incident response workflows.
- Implement robust monitoring, alerting, and observability solutions to proactively detect and resolve issues.
- Participate in on‑call rotations, perform root‑cause analysis, and drive continuous improvement of reliability processes.
Requirements
- 3+ years of experience in site reliability or DevOps roles managing production systems.
- Strong proficiency with Linux systems and networking fundamentals.
- Hands‑on experience with Kubernetes orchestration and container runtimes.
- Expertise in infrastructure as code using Terraform (or similar tools).
- Proficiency in programming and scripting with Python, and familiarity with cloud platforms, preferably AWS.
Skills
Linux, Kubernetes, Terraform, Python, AWS