hybrid
Staff Machine Learning Engineer
Machine Learning Engineer
As a Staff Machine Learning Engineer, you will be a senior individual contributor in the AI & Data Science team, responsible for the end-to-end Machine Learning Development Lifecycle. You will architect, build, and deploy scalable ML systems on AWS, focusing on transforming user-generated content into actionable insights using NLP and Generative AI techniques.
About the role
We are seeking a highly experienced, strategic Staff Machine Learning Engineer to join our AI & Data Science team. This is a senior individual contributor role in which you will own the end-to-end Machine Learning Development Lifecycle (MDLC). You will architect, build, and deploy production-grade, scalable ML systems that transform massive volumes of user-generated content into actionable insights for our customers. This position requires a proven track record of solving complex, unstructured data challenges and deep expertise in building robust, high-performance systems on the AWS cloud.
Responsibilities
- Architect and Innovate: Lead the design, development, and deployment of complex, production-grade ML systems and data pipelines, particularly for Natural Language Processing (NLP) and Generative AI applications.
- Solve High-Complexity Problems: Serve as a domain expert in the application of AI to solve core business challenges, including sentiment analysis, content moderation, product recommendations, and personalized search.
- Technical Leadership: Drive innovation by identifying and addressing high-impact technical challenges and long-standing technical debt within our ML and data infrastructure.
- Mentorship and Standards: Provide technical mentorship to other engineers on the team and beyond, raising the bar for engineering excellence, maintainability, and best practices across the organization.
- Cross-Functional Collaboration: Collaborate closely with Data Scientists, Product Managers, and other engineering teams to translate complex business requirements into robust, data-driven ML solutions.
- Operational Excellence: Implement and oversee MLOps practices, including automated CI/CD pipelines, model monitoring, and governance, to ensure our systems are reliable, reproducible, and performant at scale.
- Observability: Implement robust observability frameworks to proactively detect and diagnose issues like model drift, data quality anomalies, and performance degradation in production.
Required Skills & Experience
- Experience: 8+ years of experience in Machine Learning Engineering, Applied Machine Learning, or a related field, with a proven track record of building and maintaining production models.
- MLOps & AWS: Expert proficiency with the AWS ecosystem for MLOps, including a deep understanding of how to architect solutions using key services like Amazon SageMaker, S3, AWS Step Functions, AWS CloudFormation, Amazon CloudWatch, Amazon Managed Streaming for Apache Kafka (MSK), and Amazon Bedrock.
- Technical Expertise:
  - Deep expertise in building and deploying scalable solutions for NLP, including experience with challenges such as sarcasm detection, polysemy, and managing multilingual data.
  - Experience with a variety of ML algorithms and models, including traditional supervised and unsupervised learning, deep learning, and modern Generative AI techniques (e.g., LLMs, RAG, Prompt Engineering).
  - Proficiency with ML frameworks and libraries such as PyTorch, TensorFlow, and scikit-learn, with an ability to adapt and tune open-source or pre-trained models.
- Software Engineering & Observability: A strong understanding of core software engineering principles, including design patterns, data structures, testing, security, and version control. Experience with continuous integration and delivery (CI/CD) and regression testing. Ability to apply model observability practices for faster issue detection and root cause analysis.
- Problem-Solving: The ability to translate complex business problems into viable technical solutions and communicate findings to stakeholders in non-technical terms.