Overview
Job Title: Data Scientist/Engineer
Location: Remote
Type: Corp to Corp
Start Date: ASAP
Pay Rate: $28-$30 per hour
Contract Length: 12 months - potential conversion to FTE
We are seeking a highly skilled and motivated Data Scientist/Engineer to join our dynamic and innovative team. The ideal candidate will have hands-on experience designing, building, and maintaining scalable data processing pipelines, implementing machine learning solutions, and ensuring data quality across the organization. This role requires a strong technical foundation in Azure cloud platforms, data engineering, and applied data science to support critical business decisions and technological advancements.
Responsibilities
Data Engineering
- Build and Maintain Data Pipelines: Develop and manage scalable data pipelines using Azure Data Factory, Azure Synapse Analytics, or Azure Databricks to process large volumes of data.
- Data Quality and Transformation: Ingest, cleanse, and transform data from a wide range of structured and unstructured sources, with appropriate error handling.
- Optimize Data Storage: Use and tune data storage solutions, such as Azure Data Lake and Blob Storage, to keep storage cost-effective and efficient.
Machine Learning Support
- Collaboration with ML Engineers and Architects: Work with Machine Learning Engineers and Solution Architects to seamlessly deploy machine learning models into production environments.
- Automated Retraining Pipelines: Build automated systems to monitor model performance, detect model drift, and trigger retraining processes as needed.
- Experiment Reproducibility: Ensure reproducibility of ML experiments by maintaining proper version control for models, data, and code.
Data Analysis and Preprocessing
- Azure Data Lake Storage
- Azure Synapse Analytics
- Azure Data Factory
- Exploratory Data Analysis (EDA): Perform exploratory data analysis in notebook environments such as Azure Machine Learning notebooks or Azure Databricks to derive actionable insights.
- Data Quality Assessments: Identify data anomalies, evaluate data quality, and recommend appropriate data cleansing or remediation strategies.
General Responsibilities
- Pipeline Monitoring and Optimization: Continuously monitor the performance of data pipelines and workloads, identifying opportunities for optimization and improvement.
- Collaboration and Communication: Communicate findings and technical requirements effectively with cross-functional teams, including data scientists, software engineers, and business stakeholders.
- Documentation: Document all data workflows, experiments, and model implementations to facilitate knowledge sharing and maintain contin