onsite
Senior Software Engineer vLLM - CommonAI C.I.C.
Software Engineer
CommonAI C.I.C. is hiring a Senior Software Engineer to design, implement, and scale the open‑source vLLM inference engine, using Python, C++, CUDA, and cloud‑native technologies such as Kubernetes and AWS for high‑throughput LLM serving.
About the role
Key Responsibilities
- Architect, develop, and maintain the vLLM inference engine to deliver low‑latency, high‑throughput LLM serving.
- Implement performance‑critical components in C++/CUDA and integrate them with Python APIs.
- Design and operate cloud‑native deployment pipelines using Kubernetes and AWS services for scalable production workloads.
- Collaborate with open‑source contributors and internal research teams to incorporate the latest model optimizations and safety features.
- Write comprehensive tests, documentation, and monitoring tools to ensure reliability and observability in production.
Requirements
- 5+ years of software engineering experience, with deep expertise in Python and C++ development.
- Strong background in GPU programming (CUDA) and performance optimization for large language models.
- Hands‑on experience deploying distributed AI workloads on Kubernetes and cloud platforms such as AWS.
- Proficiency with deep learning frameworks, especially PyTorch, and familiarity with LLM architectures.
- Track record of contributing to open‑source projects and working in collaborative, fast‑paced engineering environments.
Skills
Python, C++, CUDA, PyTorch, Kubernetes, AWS