About the Role
As a Site Reliability Engineer 2, you will help operate and improve Accela's Civic Platform, ensuring high availability, performance, security, and scalability across our SaaS offerings. Working closely with Cloud Engineering, DevOps, Database Engineering, Security, and Development teams, you will support the reliability and operational excellence of our production environments through automation, observability, incident response, and continuous improvement initiatives.
This role is ideal for an engineer who enjoys solving complex technical challenges, improving system reliability, and building automation that enhances operational efficiency and customer experience.
Specific Responsibilities
- Contribute to the operation, maintenance, and continuous improvement of Accela's production cloud environments.
- Support platform modernization initiatives, including containerization, cloud-native technologies, and automation efforts.
- Monitor platform health, availability, performance, and capacity using modern observability and monitoring tools.
- Participate in incident response, troubleshoot production issues, and contribute to root cause analysis (RCA).
- Develop and maintain automation, tooling, and scripts that improve reliability, scalability, deployment efficiency, and operational effectiveness.
- Support the implementation and monitoring of service level objectives (SLOs), service level agreements (SLAs), and operational metrics.
- Partner with Development, DevOps, Database Engineering, and Security teams to identify and resolve reliability, performance, and scalability challenges.
- Assist with platform deployments, operational readiness reviews, and change management activities.
- Contribute to observability initiatives through monitoring, logging, metrics collection, and distributed tracing.
- Support compliance-related operational activities associated with SOC 2, HIPAA, FedRAMP, StateRAMP, and PCI-DSS environments.
- Participate in post-incident reviews and contribute to corrective and preventive actions that improve platform stability.
Required Qualifications
- 4+ years of experience in Site Reliability Engineering, Cloud Operations, Systems Engineering, DevOps, Software Engineering, or a related technical discipline.
- Experience supporting cloud-based SaaS environments, preferably within Microsoft Azure.
- Experience with Kubernetes and containerized application environments.
- Working knowledge of scripting and automation using Python, PowerShell, Bash, or similar languages.
- Experience troubleshooting distributed systems across application, infrastructure, networking, and operating system layers.
- Familiarity with monitoring, logging, metrics, and observability platforms.
- Strong analytical and problem-s