Software Engineer, Data Infrastructure - Research
OpenAI is seeking a Software Engineer, Data Infrastructure - Research to design and implement the dataset infrastructure for their next-generation training stack. This role involves building standardized dataset interfaces, scaling pipelines across GPUs, and ensuring datasets are unified, efficient, and easy for researchers to consume. The engineer will also be responsible for proactive testing, bottleneck resolution, and maintaining data reproducibility.
The Workload team is responsible for designing and running OpenAI’s LLM training and inference infrastructure that powers frontier models at massive scale. Our systems unify how researchers train and serve models, abstracting away the complexity of performance, parallelism, and execution across vast GPU/accelerator fleets. By providing this foundation, the Workload team ensures that researchers can focus on advancing model capabilities while we handle the scale, efficiency, and reliability required to bring those models to life.
We are looking for an engineer to design and implement the dataset infrastructure that powers OpenAI’s next-generation training stack. You will be responsible for building standardized dataset interfaces, scaling pipelines across thousands of GPUs, and proactively testing performance bottlenecks. In this role, you will collaborate closely with the multimodal researchers, and other infra groups to ensure datasets are unified, efficient, and easy to consume.
Posted June 6, 2026