remote
Site Reliability Engineer II - CrowdStrike
Site Reliability Engineer
Mid‑level Site Reliability Engineer role focused on building and operating scalable, secure cloud infrastructure with Python, Kubernetes, AWS, and Infrastructure as Code (IaC) tools to keep critical services highly available and performant.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, fault‑tolerant services on AWS using Kubernetes and Terraform.
- Develop automation scripts and tools in Python to streamline deployment, configuration management, and incident response.
- Monitor system health, performance, and security metrics; create alerts and dashboards to proactively detect issues.
- Collaborate with development and security teams to embed reliability and compliance best practices into CI/CD pipelines.
- Participate in the on‑call rotation, conduct root‑cause analysis, and drive post‑mortem improvements.
Requirements
- 2+ years of experience in site reliability or DevOps roles, with strong Linux administration skills.
- Proficiency in Python scripting and automation frameworks.
- Hands‑on experience with Kubernetes orchestration and AWS services (EC2, S3, RDS, IAM).
- Solid understanding of Infrastructure as Code using Terraform or similar tools.
- Familiarity with CI/CD platforms (Jenkins, GitLab CI, or similar) and monitoring solutions (Prometheus, Grafana, CloudWatch).
Skills
python, linux, kubernetes, aws, terraform, cicd