remote
Staff Reliability Engineer Full Stack - Feeld
Software Engineer
Senior reliability engineer driving end‑to‑end stability for backend and mobile services, building detection, response, and prevention tooling while shaping documentation, runbooks, and quality standards across a distributed team.
About the role
Key Responsibilities
- Design and implement robust monitoring, alerting, and incident‑response frameworks for both backend APIs and mobile integrations.
- Automate reliability improvements using CI/CD pipelines, infrastructure‑as‑code, and self‑service tooling.
- Collaborate with product and engineering squads to embed reliability best practices into feature development and release cycles.
- Develop and maintain runbooks, post‑mortem processes, and documentation that enable rapid, safe incident resolution.
- Mentor engineers on reliability patterns, performance tuning, and fault‑tolerant architecture.
Requirements
- 7+ years of software engineering or site‑reliability experience, with a strong focus on production systems.
- Proficiency in Python, cloud platforms (AWS), and container orchestration (Kubernetes).
- Hands‑on experience with infrastructure‑as‑code tools such as Terraform and CI/CD frameworks.
- Deep understanding of observability stacks (logging, tracing, and metrics) for large‑scale services.
- Track record of leading cross‑functional initiatives that improve system stability without slowing delivery.
Skills
Python, Kubernetes, AWS, Terraform, CI/CD