Remote / Onsite
Principal Software Engineer - NielsenIQ
Software Engineer
Lead the reliability strategy for large‑scale, cloud‑native applications, driving performance, availability, and resilience across the full stack using Python, Node.js, Java, and Angular.
About the role
Key Responsibilities
- Define and execute the reliability roadmap for distributed, cloud‑native services across the full application stack.
- Design and implement observability solutions, including metrics, logs, and tracing, to detect and resolve incidents proactively.
- Lead incident response, root‑cause analysis, and post‑mortem processes to continuously improve system stability.
- Collaborate with development teams to embed SRE best practices into CI/CD pipelines and deployment workflows.
- Mentor and coach engineers on reliability principles, tooling, and automation.
Requirements
- 10+ years of software engineering experience with a focus on reliability and operations.
- Deep expertise in Python, Node.js, Java, and front‑end technologies such as Angular.
- Proven track record of building and scaling observability, monitoring, and incident management systems.
- Strong knowledge of cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Excellent communication skills and a collaborative mindset.