onsite

Site Reliability Engineer - TEKsystems c/o Allegis Group

The Site Reliability Engineer is responsible for the production stability, performance, and reliability of enterprise applications, achieved through end‑to‑end monitoring, observability, incident response, and SRE best practices, with a focus on Dynatrace APM.

About the role

Key Responsibilities

Design, implement, and maintain monitoring and observability solutions using Dynatrace to provide real‑time insight into application health and performance.
Develop and automate incident response processes, including alert routing, on‑call rotations, and post‑mortem analysis.
Collaborate with engineering and product teams to embed SRE best practices into the software development lifecycle.
Manage and optimize cloud infrastructure (AWS) and container orchestration platforms (Kubernetes) for high availability and scalability.
Write and maintain automation scripts (Python, Bash) for deployment, configuration, and remediation tasks.

Requirements

3+ years of experience in site reliability, systems engineering, or a related role.
Strong hands‑on experience with Dynatrace or comparable APM tools.
Proficiency in Linux administration and scripting languages such as Python or Bash.
Solid understanding of cloud services (AWS) and container orchestration (Kubernetes).
Demonstrated ability to lead incident management, conduct root‑cause analysis, and drive continuous improvement.

Skills

Linux, Python, AWS, Kubernetes

Company: TEKsystems c/o Allegis Group

Department: Engineering

Location: Sun Lakes, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 25, 2026