Remote
Staff Software Engineer, ML Infrastructure - Snap Inc.
Software Engineer
Lead the design and scaling of machine‑learning infrastructure that powers real‑time features for millions of users, leveraging Python, Spark, Kubernetes, AWS, and TensorFlow to deliver high‑performance, reliable pipelines.
About the role
Key Responsibilities
- Architect and implement end‑to‑end ML pipelines that ingest, process, and serve data at scale for real‑time product features.
- Design and maintain distributed systems using Kubernetes, Spark, and AWS services to ensure high availability and low latency.
- Collaborate with data scientists and product teams to translate research prototypes into production‑ready services.
- Drive performance optimization, monitoring, and observability across the ML stack.
- Mentor junior engineers and champion best practices in code quality, testing, and CI/CD.
Requirements
- 10+ years of software engineering experience with a focus on large‑scale distributed systems.
- Deep expertise in Python, Apache Spark, Kubernetes, and AWS (EC2, S3, EKS, Lambda).
- Hands‑on experience building production ML pipelines with TensorFlow or PyTorch.
- Strong understanding of data modeling, batch and streaming architectures, and performance tuning.
- Excellent communication skills and a proven ability to lead cross‑functional teams.
Skills
Python, Apache Spark, Kubernetes, AWS, TensorFlow