Remote
Data Scientist / Data Engineer - Caterpillar
Data Scientist
Develop and deploy data pipelines and machine‑learning models on cloud platforms, turning raw data into actionable insights for industrial applications.
About the role
Key Responsibilities
- Design, build, and maintain scalable data pipelines using Python, SQL, and Apache Spark.
- Develop, train, and operationalize machine‑learning models to support predictive maintenance and optimization initiatives.
- Collaborate with cross‑functional teams to translate business requirements into data solutions.
- Implement cloud‑native services on AWS for data storage, processing, and model deployment.
- Monitor data quality, performance, and model accuracy, and make continuous improvements.
Requirements
- Strong proficiency in Python and SQL for data manipulation and analysis.
- Experience with machine‑learning frameworks (e.g., scikit‑learn, TensorFlow, PyTorch).
- Hands‑on experience building data pipelines with Apache Spark or similar big‑data technologies.
- Practical knowledge of AWS services such as S3, Redshift, Lambda, and SageMaker.
- Solid understanding of data modeling, ETL processes, and software engineering best practices.
Skills
Python, SQL, Machine Learning, AWS, Apache Spark