onsite
Lead Data Engineer
Data Engineer
Freshworks is seeking a Lead Data Engineer for its Machine Learning engineering development team. This role involves gathering requirements from ML/DS teams, designing and implementing scalable distributed big data pipelines for ML use-cases, and working with Data Scientists to train, refresh, and serve models using these pipelines.
About the role
Overview
We are looking for a Lead Data Engineer for the Machine Learning (ML) engineering development team. The primary focus will be to gather requirements from ML/DS teams, identify the optimal solution, and then design, implement, monitor, and maintain scalable distributed big data pipelines for a range of ML use-cases. You will work with Data Scientists to train, refresh, and serve models using these pipelines.
Responsibilities
- Collaborate with ML engineers and Data Scientists to gather requirements.
- Design and implement ETL big data pipelines to train ML models.
- Build stream-processing and batch pipelines using UDFs and ML libraries, and load processed data into multiple distributed data stores.
- Apply API programming knowledge to train and serve ML models.
- Select and integrate the big data tools and frameworks required for processing.
- Ensure the availability, scalability, reliability, and performance of the big data platform.
Skills And Qualifications
- Minimum of 6 years of relevant experience.
- Proven background in ETL development and large-scale data processing.
- Proficiency with the big data ecosystem: Spark (PySpark), Hadoop, HDFS, Hive, NoSQL, and modern cloud data lakes (Cloudera Data Platform or Delta Lake).
- Strong SQL expertise, including optimizing complex joins, and a solid grasp of database concepts.
- Strong development experience in languages such as Python and Java.
- Experience building stream-processing systems with Spark Streaming.
- Experience with workflow orchestration tools such as Oozie or Airflow.
- Experience with Unix/Shell or Python scripting.
- Knowledge of AWS is a plus.
- Knowledge of AI/ML and MLOps is a plus.