Remote
Systems Support Lead - HighLevel
Software Engineer
Lead the systems support team for a high‑scale SaaS platform, overseeing Linux/Windows environments, cloud infrastructure, automation, and incident response while ensuring reliability and performance for billions of daily API calls.
About the role
Key Responsibilities
- Lead a cross‑functional support team, defining processes for incident detection, escalation, and resolution across Linux, Windows, and cloud environments.
- Design, implement, and maintain automation scripts (Python, Bash) for provisioning, configuration management, and routine maintenance.
- Monitor system health using Prometheus, Grafana, and cloud‑native tools; proactively identify capacity and performance bottlenecks.
- Collaborate with engineering and product teams to troubleshoot complex issues, perform root‑cause analysis, and drive continuous improvement.
- Manage on‑call rotations, incident post‑mortems, and knowledge‑base documentation following ITIL best practices.
- Oversee cloud resource optimization and security compliance within AWS, including IAM, VPC, and backup strategies.
Requirements
- 5+ years of systems administration experience in large‑scale SaaS environments, with deep expertise in Linux and Windows Server.
- Strong proficiency in Python or Bash scripting for automation and tooling.
- Hands‑on experience with AWS services (EC2, RDS, S3, CloudWatch) and infrastructure‑as‑code concepts.
- Proven track record in incident management, monitoring, and performance tuning of high‑throughput systems.
- ITIL or comparable service‑management certification and excellent leadership/communication skills.
Skills
Linux, Windows Server, AWS, Python, Bash, ITIL