onsite

Site Reliability Engineer - TEKsystems

Site Reliability Engineer responsible for ensuring production stability and performance of enterprise applications through end‑to‑end monitoring, Dynatrace APM, incident response, and automation on Linux and cloud platforms.

About the role

Key Responsibilities

Design, implement, and maintain end‑to‑end monitoring and observability solutions, with a focus on Dynatrace APM.
Own the incident management lifecycle: detection, triage, root‑cause analysis, and post‑mortem documentation.
Collaborate with engineering and product teams to define service level objectives (SLOs) and reliability targets.
Automate operational tasks using Python or Bash scripts to improve reliability and reduce manual toil.
Manage and optimize Linux‑based production environments, including cloud resources on AWS.

Requirements

3+ years of experience in site reliability, production support, or DevOps roles.
Strong hands‑on experience with Dynatrace or similar APM tools.
Proficiency in Linux system administration and scripting (Python, Bash).
Solid understanding of monitoring, observability, and incident response best practices.
Experience working with cloud platforms, preferably AWS, and infrastructure‑as‑code concepts.

Skills

Linux, Python, AWS

Company: TEKsystems

Department: Engineering

Location: Maricopa County, United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 26, 2026