Remote
Data Ops Engineer - AI Enablement - UST
Data Engineer
Data Ops Engineer focused on building and maintaining scalable data pipelines while integrating generative AI models. Leverages Python, SQL, Airflow, Spark, and cloud services to deliver high‑quality data for LLM‑driven analytics.
About the role
Key Responsibilities
- Design, develop, and operate end‑to‑end data pipelines using Python, SQL, Apache Airflow, and Spark in a cloud environment.
- Containerize data workloads with Docker and orchestrate them on Kubernetes for reliable, scalable execution.
- Collaborate with AI teams to prepare curated datasets and fine‑tune Large Language Models for generative AI solutions.
- Implement monitoring, logging, and alerting to ensure data quality, pipeline performance, and cost efficiency on AWS.
- Automate data validation, versioning, and metadata management to support reproducible AI experiments.
Requirements
- 3+ years of experience in data/analytics engineering, including hands‑on work with Airflow, Spark, and cloud platforms (AWS preferred).
- Proficiency in Python and SQL for data transformation and pipeline orchestration.
- Experience containerizing workloads with Docker and managing deployments on Kubernetes.
- Familiarity with Large Language Models, prompt engineering, or generative AI workflows.
- Strong problem‑solving skills, ability to work autonomously in a remote setting, and effective communication with cross‑functional teams.
Skills
Python, SQL, Apache Spark, Docker, Kubernetes, AWS