Remote
Senior AI Data Pipeline Engineer - 42dot
Software Engineer
Lead the design and scaling of high‑throughput, petabyte‑scale data pipelines that feed global AI workloads, leveraging Python, Spark, AWS, Kubernetes and GPU infrastructure to ensure reliability and multi‑region availability.
About the role
Key Responsibilities
- Architect and build high‑performance, scalable data pipelines to ingest and process petabyte‑scale data for AI workloads.
- Design multi‑region data infrastructure ensuring global availability and seamless synchronization.
- Implement flexible branching and logic isolation so that concurrent AI projects can run without interfering with one another.
- Operate and optimize GPU‑enabled data pipelines on large‑scale cloud infrastructure.
- Collaborate with data scientists and ML engineers to meet evolving data needs.
Requirements
- Extensive experience with Python and Apache Spark for large‑scale data processing.
- Proficiency in AWS services (S3, EMR, Glue, Redshift) and Kubernetes orchestration.
- Strong background in GPU infrastructure and high‑throughput system design.
- Hands‑on experience with multi‑region architecture and data synchronization.
- Excellent problem‑solving skills and ability to work in a fast‑paced, global environment.
Skills
Python, Apache Spark, AWS, Kubernetes