onsite
Site Reliability Engineer - Java Spring Boot Applications - Peraton
Lead the reliability, performance, and scalability of mission‑critical Java Spring Boot services in a cloud‑native environment, leveraging Docker, Kubernetes, and AWS to ensure high availability and rapid incident response.
About the role
Key Responsibilities
- Design, implement, and maintain highly available Java Spring Boot microservices on Kubernetes clusters.
- Develop and enforce CI/CD pipelines using Git, Jenkins, and Helm for automated deployments.
- Monitor application health with Prometheus, Grafana, and the ELK stack; respond to alerts and conduct post‑mortems.
- Collaborate with development teams to optimize performance, reduce latency, and improve fault tolerance.
- Implement security best practices, including IAM, secrets management, and compliance controls in AWS.
Requirements
- 5+ years of experience in Java development and SRE practices.
- Proficient with Spring Boot, Docker, Kubernetes, and AWS services (EKS, ECS, CloudWatch).
- Strong scripting skills in Bash or Python for automation.
- Experience with monitoring, logging, and incident management tools.
- Excellent problem‑solving skills and a proactive, collaborative mindset.
Skills
Java, Docker, Kubernetes, AWS