remote
Principal Machine Learning & Data Engineer
Twilio is seeking a Principal Machine Learning & Data Engineer to lead the design, build, and operation of its internal ML-and-data platform. The role involves architecting cloud-native pipelines, model-serving infrastructure, and developer tooling on AWS so that product teams can iterate rapidly and safely.
About the Role
Join the team as Twilio’s next L5 Machine Learning & Data Engineer to lead the design, build, and operation of the internal ML-and-data platform that powers every customer interaction. You will architect cloud-native pipelines, model-serving infrastructure, and developer tooling that allow Twilio’s product teams to iterate rapidly and safely at scale, advancing our mission to unlock the imagination of builders.
Responsibilities
- Architect and evolve Twilio’s end-to-end ML and real-time data platforms for reliability, security, and cost efficiency.
- Design scalable feature stores, streaming and batch pipelines, and low-latency model-serving layers on AWS.
- Implement MLOps best practices—automated testing, CI/CD, monitoring, and rollback—for hundreds of daily deployments.
- Own system design reviews, threat modeling, and performance tuning for high-volume communications workloads.
- Lead cross-functional engineering efforts, breaking down complex initiatives into executable roadmaps.
- Mentor staff and senior engineers, raising the technical bar through code reviews and pair programming.
- Partner with Product, Security, and Compliance to meet stringent privacy and governance requirements (HIPAA, SOC 2, GDPR).
- Champion a culture of experimentation, data-driven decision-making, and continuous improvement.
Required Qualifications
- Bachelor’s or higher in Computer Science, Engineering, Mathematics, or equivalent practical experience.
- 7+ years building and operating production data or machine-learning systems at scale.
- Expert fluency in Python and one compiled language (Java, Scala, Go, or C++).
- Hands-on mastery of distributed data frameworks (Spark/Flink), SQL/NoSQL stores, and streaming platforms (Kafka/Kinesis).
- Demonstrated success designing cloud-native architectures on AWS, including Terraform-managed infrastructure.
- Deep knowledge of container orchestration (Kubernetes/EKS), service-mesh networking, and autoscaling strategies.
- Practical experience implementing MLOps tooling such as MLflow, Kubeflow, SageMaker, or Vertex AI.
- Strong grasp of model-lifecycle concerns—feature engineering, offline/online parity, A/B testing, drift detection, and retraining.
- Proven ability to lead technical projects end-to-end and influence without authority across multiple teams.
- Exceptional written and verbal communication skills, with a bias toward clarity and action.
Desired Qualifications
- Graduate degree focused on machine learning, distributed systems, or applied statistics.
- Contributions to open-source ML or data infrastructure projects.
- Experience with privacy-enhancing technologies (differential privacy, homomorphic encryption) or on-device inference.
- Background in conversational AI, real-time communications, or large-language-model deployment at scale.
- Exposure to compliance-heavy environments (HIPAA, PCI-DSS) and secure multi-tenant design patterns.
- Published research, patents, or conference talks in ML systems or data engineering.