remote
Senior Staff AI Engineer - On-Prem AI Infrastructure & Agentic Systems - SK hynix memory solutions America
AI Engineer
Lead the design and deployment of on‑premises AI infrastructure and agentic systems, using Python, C++, CUDA, and container orchestration to deliver high‑performance, scalable machine‑learning workloads for memory‑intensive applications.
About the role
Key Responsibilities
- Architect and implement on‑prem AI compute platforms, integrating GPU acceleration, Kubernetes, and Docker for scalable model training and inference.
- Design and develop agentic AI systems that autonomously manage resources, optimize workloads, and interact with memory‑centric hardware.
- Collaborate with hardware and firmware teams to align AI software stacks with next‑generation DRAM and NAND architectures.
- Lead performance tuning, profiling, and debugging of large‑scale models using CUDA, Python, and C++.
- Establish best practices for CI/CD pipelines, monitoring, and security in high‑performance AI environments.
Requirements
- 10+ years of software engineering experience with deep expertise in Python, C++, and GPU programming (CUDA).
- Proven track record of building and operating containerized AI workloads with Docker and Kubernetes in on‑premises data centers.
- Strong background in machine‑learning frameworks such as TensorFlow or PyTorch and distributed training techniques.
- Hands‑on experience with Linux system administration, performance profiling, and low‑latency networking.
- Demonstrated ability to lead cross‑functional teams and deliver complex AI infrastructure projects at scale.
Skills
python, c, cuda, kubernetes, docker, linux, tensorflow, pytorch