onsite
Site Reliability Engineer Tech Lead - Freddie Mac
Site Reliability Engineer
Lead SRE role driving reliability, automation, and cloud operations for a large-scale housing finance platform, leveraging Kubernetes, Docker, CI/CD pipelines, AWS, and Python to ensure high availability and performance.
About the role
Key Responsibilities
- Design, implement, and maintain highly available, scalable infrastructure on AWS using Kubernetes and Docker containers.
- Develop and manage CI/CD pipelines to automate build, test, and deployment processes.
- Implement monitoring, alerting, and incident response using Prometheus, Grafana, and PagerDuty.
- Collaborate with development teams to embed reliability best practices into code reviews and release cycles.
- Lead capacity planning, performance tuning, and cost optimization initiatives.
Requirements
- 5+ years of experience in site reliability engineering or DevOps roles.
- Proficiency with Kubernetes, Docker, and cloud-native tooling.
- Strong scripting skills in Python and Bash.
- Hands-on experience with CI/CD tools (GitLab CI, Jenkins, Argo CD).
- Solid understanding of monitoring, logging, and alerting frameworks.
Skills
Kubernetes, Docker, CI/CD, AWS, Python, Prometheus