Remote
SRE / DevOps Engineer - Hitachi Energy
Site Reliability Engineer
Seeking an experienced SRE/DevOps Engineer to design, automate, and operate cloud-native platforms on AWS, using Kubernetes, Terraform, CI/CD pipelines, and Python scripting to ensure high availability and performance.
About the role
Key Responsibilities
- Design, build, and maintain highly available, scalable infrastructure on AWS using Terraform and IaC best practices.
- Deploy, manage, and optimize containerized workloads with Kubernetes, ensuring reliability and performance.
- Develop and maintain CI/CD pipelines (e.g., GitLab, Jenkins) to automate build, test, and release processes.
- Implement monitoring, logging, and alerting solutions (Prometheus, Grafana, ELK) to proactively detect and resolve incidents.
- Collaborate with development teams to embed reliability, security, and cost‑efficiency into the software delivery lifecycle.
- Write and maintain automation scripts in Python and Bash for routine operational tasks.
Requirements
- 3+ years of hands‑on experience in SRE or DevOps roles, preferably in cloud‑native environments.
- Strong proficiency with AWS services (EC2, S3, RDS, IAM) and infrastructure‑as‑code tools such as Terraform.
- Deep knowledge of Kubernetes orchestration, Helm charts, and container runtimes.
- Experience building CI/CD pipelines and automating workflows using Python, Bash, or similar scripting languages.
- Solid understanding of Linux systems, networking, and monitoring tools (Prometheus, Grafana, ELK).
Skills
kubernetes, terraform, aws, python, cicd, linux