Remote / Onsite
Data Engineer - NTT DATA
Data Engineer role focused on building and maintaining LLM‑based GenAI RAG pipelines and agentic workflows on AWS, leveraging Python, OpenSearch/Elasticsearch, Docker, CI/CD, and MLOps tooling to deliver scalable, high‑performance data solutions.
About the role
Key Responsibilities
- Design, develop, and deploy LLM‑based GenAI Retrieval Augmented Generation (RAG) pipelines and agentic workflows (e.g., using the Model Context Protocol, MCP) with Python and AWS services.
- Build and maintain scalable data ingestion, processing, and storage solutions on AWS, integrating OpenSearch/Elasticsearch for search and analytics.
- Implement containerized microservices with Docker, ensuring robust CI/CD pipelines and automated testing for rapid delivery.
- Collaborate with data scientists and product teams to translate business requirements into technical specifications and production‑ready models.
- Monitor, troubleshoot, and optimize data pipelines, ensuring high availability, performance, and security compliance.
Requirements
- Proven experience as a Data Engineer or in a similar role, with strong Python programming skills.
- Hands‑on expertise with AWS services (Lambda, S3, Glue, Athena, SageMaker, etc.) and experience building data pipelines on the cloud.
- Solid knowledge of OpenSearch/Elasticsearch for indexing, searching, and analytics.
- Experience with Docker, CI/CD, and MLOps tooling (e.g., MLflow, Kubeflow, or similar).
- Familiarity with LLMs, GenAI concepts, and RAG architectures is highly desirable.
Skills
Python, AWS, Docker, CI/CD, MLOps