onsite
Technical Operations Engineer - Medlytix
Systems Engineer
The Technical Operations Engineer ensures the performance, reliability, and visibility of distributed production systems, applying monitoring, telemetry, cloud, and data pipeline expertise to drive automation and orchestration.
About the role
Key Responsibilities
- Design, implement, and maintain monitoring dashboards and alerting for production workloads across cloud and on‑prem environments.
- Analyze telemetry data to identify performance bottlenecks, troubleshoot incidents, and recommend capacity planning actions.
- Collaborate with data engineering teams to optimize data pipelines, ensuring high‑throughput and low‑latency processing.
- Support workflow orchestration platforms (e.g., Airflow, Prefect) by creating and maintaining DAGs, managing dependencies, and automating routine tasks.
- Develop and maintain automation scripts and tools to streamline deployment, configuration, and incident response processes.
Requirements
- 3+ years of experience in production operations, monitoring, or site reliability engineering.
- Proficiency with monitoring/telemetry stacks such as Prometheus, Grafana, Datadog, or similar.
- Hands‑on experience with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Strong scripting skills in Python or Bash for automation and data pipeline support.
- Excellent problem‑solving skills and ability to work collaboratively in a fast‑paced environment.
Skills
Python, SQL, AWS, Datadog, Airflow