onsite
AI Engineer in ML Data
As an AI Engineer in ML Data, you will be responsible for designing and refining data and ML pipelines for scaled distributed training and validation of machine learning models. You will research new reasoning algorithms, develop model benchmarking processes, and build infrastructure for data augmentation and synthetic data generation, collaborating with cross-functional teams to deliver groundbreaking solutions.
About the role
Join our team as an AI Engineer and help us push the boundaries of what's possible in logical reasoning! We’re looking for a motivated individual to design and refine the data and ML pipelines for scaled distributed training and validation of ML models. You'll work closely with a talented team of AI experts, EBM specialists, formal verification engineers, and software developers to create groundbreaking solutions.
What you'll do
- Research new reasoning algorithms and models
- Develop model benchmarking processes and tools
- Build effective and efficient ML data pipelines
- Adjust frameworks and interfaces to accelerate machine learning development
- Develop the infrastructure for data augmentation pipelines and synthetic data generation
- Collaborate with other teams to understand their pain points and priorities, and define milestones for the corresponding roadmaps
- Derive practical solutions and integrate them with the results of other teams to deliver the best overall outcome
Qualifications
- An M.Sc. in one or more of the following areas: Computer Science, Artificial Intelligence, Mathematics, or a closely related field
- 3+ years of production experience in ML Infra, DataOps, or distributed training
- Expertise in the programming languages and tools critical for high-performance computing and machine learning, including Python, C++, and deep learning frameworks such as PyTorch, TensorFlow, or JAX
- Ability to understand deep learning algorithms, e.g. in natural language processing and reasoning
- Familiarity with Azure/AWS/GCP cloud products for MLOps and DataOps pipelines
- Proficiency with Kubernetes clusters and distributed compute assets
- Strong communication and teamwork skills
- Readiness to explore and promote cutting-edge technologies in the ML infrastructure domain and beyond
Bonus Points
- Publications at any of the major conferences
- Multi-node and multi-GPU training
- Mathematical Reasoning – discrete math and logic
- Formal Verification – Lean