onsite
IT Operations Engineer - 17live
Systems Engineer
Lead the design, deployment, and maintenance of scalable IT infrastructure, ensuring high availability and performance across cloud and on‑prem environments using AWS, monitoring tools, and automation scripts.
About the role
Key Responsibilities
- Design, implement, and manage scalable IT infrastructure across AWS and on‑prem environments.
- Develop and maintain automation scripts (Python, Bash) for deployment, configuration, and monitoring.
- Implement and optimize monitoring solutions (Prometheus, Grafana, CloudWatch) to support a 99.9% uptime target.
- Collaborate with development teams to integrate CI/CD pipelines and enforce infrastructure as code (IaC) practices.
- Troubleshoot and resolve production incidents, performing root‑cause analysis and post‑mortem documentation.
Requirements
- 3+ years of experience in IT operations or system administration.
- Proficiency with AWS services (EC2, S3, RDS, CloudFormation).
- Strong scripting skills in Python or Bash and familiarity with configuration management tools.
- Experience with monitoring and alerting platforms (Prometheus, Grafana, CloudWatch).
- Excellent problem‑solving skills and the ability to work in a fast‑paced environment.