Observability Platform Engineer
Principal Observability Platform Engineer – Nscale
About Nscale
Nscale is the GPU cloud engineered for AI. We provide cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers. Nscale simplifies AI development while enabling superior results, supporting strategic business outcomes such as cost management, rapid innovation, and environmental responsibility.
We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you’ll build trust through openness and transparency while contributing to the technology that powers the future.
About the Role
Nscale is seeking a Principal Observability Platform Engineer to join our Global CISO Organization. This role is critical in designing, implementing, and scaling observability and security solutions for our GPU cloud infrastructure. You will drive platform security and operational excellence across compute, networking, storage, and control plane systems.
What You’ll Be Doing
Lead security and observability engineering initiatives across distributed, multi-tenant infrastructure.
Identify architectural and systemic risks, and design solutions that are scalable and resilient.
Harden Kubernetes, virtualization layers, GPU workloads, and platform services.
Strengthen identity, authentication, authorization, and secrets management systems.
Partner with Networking teams on secure segmentation and traffic isolation strategies.
Embed automated security validation and guardrails into CI/CD pipelines.
Conduct deep technical design reviews and threat modeling exercises.
Mentor and develop junior engineers, raising the technical bar across the team.
Partner with the CISO to shape long-term platform security and observability strategy.
Represent Nscale externally as a subject matter expert in infrastructure, observability, and cloud security.
About You
Required:
10+ years of hands-on security or observability engineering experience in cloud, hyperscale, or large distributed systems.
Strong software engineering skills (Go, Python, Rust, or similar).
Deep expertise in:
Linux systems internals
Kubernetes and container security
Infrastructure-as-Code (Terraform or equivalent)
Cloud-native architectures
Network security and segmentation
Identity and access management
Proven experience securing multi-tenant environments at scale.
Nice to Have:
Experience building observability platforms and telemetry pipelines.
Familiarity with GPU cloud infrastructure or AI workloads.
Posted June 11, 2026