Software Engineer
Lead end‑to‑end system software architecture for NVIDIA's Orbital Data Center, designing the full stack from host OS and GPU/CPU drivers to firmware, telemetry, and inference workloads that operate reliably in low‑Earth orbit environments.
Space-1 is NVIDIA's first Orbital Data Center (ODC) module, a Vera Rubin–class compute platform engineered for low-Earth orbit (LEO) missions. It is the first step in a multi-generation orbital roadmap to speed up AI adoption. We are looking for a strong technical architect to own end-to-end system software architecture for Space-1 and successor orbital platforms. You will architect the full stack to deliver a production-ready inference platform that operates reliably in the radiation, thermal-cycling, and remote-operations environment of LEO. That stack includes applications and libraries, the data center stack, BMC and BIOS firmware, manageability and telemetry, the host OS, GPU and CPU drivers, and CUDA. You will partner closely with the orbital hardware system architecture team, drive customer use cases with constellation operators, align the architecture with mission requirements, and bring the best orbital AI products to market. Join us at the forefront of technological advancement.
What you'll be doing:
Own the system architecture for the inference stack and other applications running on this class of products, and make it resilient to the faults encountered in space.
Co-architect with the orbital hardware system architecture team to define interfaces, partitioning, and trade-offs across silicon, board, firmware, OS, and AI workload layers for 5-year LEO missions.
Own end-to-end system software architecture for Space-1 and successor Orbital Data Center modules — covering data center stack, BMC firmware, BIOS, host OS, GPU/CPU drivers, CUDA, DCGM, and manageability telemetry as a single integrated stack.
Define the manageability architecture for an unreachable, autonomous data center: remote bring-up, in-orbit firmware update, dual-module redundancy, fault containment, recovery from SEU/SEFI events, and telemetry for fleets ranging from tens to millions of nodes.
Architect rad-tolerant system software behaviors, including ECC handling, memory scrubbing, latch-up mitigation, deterministic recovery, and graceful degradation, across a 5-year mission and up to ~8,000 thermal cycles in a dawn–dusk sun-synchronous orbit.
Drive Redfish, MCTP, PLDM, and constellation-level management protocols across BMC, BIOS, and host software so customers can operate orbital fleets with the same tools they use on the ground.
Define the minimum BMC feature set, pin budget, boot architecture (rugged M.2 or VPX-class options), and dual-module redundancy strategy in partnership with platform and mechanical engineering.
Partner with cloud and constellation customers (SpaceX, Blue Origin, Starcloud, Planet, Cowboy Space, and others) to translate mission requirements — orbit, duty cycle, NSA PHIPs security, post-quantum networking (CX9), inference SLAs — into actionable platform software architecture.
Drive reliability and optimization in the system software architecture from an orbital data center viewpoint, including correct operation thro
Posted June 21, 2026