remote
Senior SRE II - Remote - Filevine
Site Reliability Engineer
Senior Site Reliability Engineer to design and maintain scalable cloud infrastructure for legal AI applications, with a focus on automation, monitoring, and incident management.
About the role
Key Responsibilities
- Design and maintain scalable, highly available cloud infrastructure to support legal AI applications
- Implement and optimize monitoring, logging, and alerting systems for system health and performance
- Lead incident response and post-mortem analysis to drive reliability improvements
- Automate operational tasks using infrastructure-as-code and configuration management tools
- Collaborate with development teams to ensure reliable deployment pipelines and system architecture
- Optimize system performance, cost, and resource utilization through continuous improvement
Requirements
- 5+ years of experience in Site Reliability Engineering or related DevOps roles
- Strong expertise in cloud platforms (AWS, GCP, or Azure) and containerization technologies
- Experience with infrastructure-as-code tools (Terraform, CloudFormation) and CI/CD pipelines
- Proficiency in scripting languages (Python, Bash) for automation tasks
- Deep understanding of system reliability principles and performance optimization
Skills
Kubernetes, Prometheus, Grafana