Remote
Software Engineer II, Backend - Reliability Platform - Affirm
Software Engineer
Backend engineer focused on building a next‑generation reliability platform, combining distributed systems design, observability tooling, and AI‑assisted development to improve service health and debugging across production environments.
About the role
Key Responsibilities
- Design and implement core components of a reliability platform that aggregates telemetry, alerts, and diagnostics for production services.
- Develop scalable, fault‑tolerant services using Go and Kubernetes on AWS, ensuring high availability and low latency.
- Integrate AI‑assisted analysis and machine‑learning models to surface anomalies, predict failures, and recommend remediation steps.
- Collaborate with cross‑functional engineering teams to embed observability best practices into their services and APIs.
- Maintain rigorous testing, code review, and continuous‑delivery pipelines to uphold reliability and quality standards.
Requirements
- 2+ years of professional experience building backend systems in Go or a comparable language.
- Strong understanding of distributed systems concepts, including consistency, fault tolerance, and scalability.
- Hands‑on experience with Kubernetes, container orchestration, and cloud services (AWS preferred).
- Familiarity with observability stacks (metrics, tracing, logging) and modern monitoring tools.
- Exposure to machine‑learning or AI‑assisted development workflows is a plus.
Skills
Kubernetes, AWS, Go, machine learning