Site Reliability Engineer Incident Manager

About Arkose Labs

Arkose Labs is on a mission to create an online environment where all consumers are protected from spam and abuse. As a Fast Company 2025 Best Workplace for Innovators, we provide a proactive fraud deterrence platform, Arkose Titan, designed to neutralize modern attacks powered by Agentic AI and LLMs. By combining proprietary intelligence with dynamic friction, we undermine attacker ROI to protect global giants like Microsoft, Meta, and Roblox. Headquartered in San Mateo, CA, we maintain a global presence across APAC, Central and South America, and EMEA.

About the Role

As a Livesite Engineer, you'll own the reliability and operational health of our live production environment. You'll take incidents from detection to resolution, lead post-mortems, manage release changes for your services, and drive platform improvements that reduce toil and improve resilience. You're the primary on-call for your domain and a go-to escalation point for more junior engineers on the team.

The role is based in Brisbane and can be fully remote or hybrid. You'll work primarily within AEST business hours, with some structured overlap with our India and US-based teams.

What You'll Be Doing

Monitor the live production environment to proactively identify potential issues or anomalies before they become incidents.
Respond to P1/P2 alerts and outages — take ownership from detection through resolution, not just escalation.
Serve as incident commander for the company: manage war-room communications, drive diagnosis, and coordinate cross-functional responders.
Manage customer-facing P1 communications — provide clear, timely stakeholder updates and prepare post-incident reports.
Lead post-mortems and RCAs for significant incidents; own action items through to closure and share learnings with the team.
Own and maintain runbooks for your team's services; proactively identify gaps and close them before the next incident.
Own release management for your services — SCOPE change ticket submissions, approval coordination, and rollback planning.
Contribute to SLO/SLA definition for services you own; monitor and report against targets.
Develop and maintain automation scripts, tooling, and monitoring dashboards to reduce toil and improve MTTR.
Contribute to platform engineering efforts that improve reliability or operability.
Mentor Associate Livesite Engineers — pair on incidents, review their documentation, share institutional context.
Act as primary on-call for your service area and escalation point for Associates during their on-call shifts.

What We Want From You

Must Have

Bachelor's degree in Computer Science, Information Technology, or a related field — or equivalent practical experience.
3-5 years of experience as a Livesite Engineer, Site Reliability Engi