onsite
Lead Validation Engineer Data Lake / CSV Engineer - Verista
Software Engineer
The Lead Validation Engineer designs, builds, and maintains data lake pipelines and CSV validation frameworks, using Python, ETL tools, and AWS services to deliver high‑quality, compliant data for life‑science applications.
About the role
Key Responsibilities
- Design and implement scalable data lake architectures to ingest, store, and process large volumes of scientific data.
- Develop robust CSV validation and transformation pipelines using Python and ETL frameworks.
- Automate data quality checks, error handling, and reporting to meet regulatory and client standards.
- Collaborate with data scientists, analysts, and domain experts to translate business requirements into technical solutions.
- Optimize the performance and cost of AWS‑based data services, including S3, Glue, and Redshift.
Requirements
- 5+ years of experience in data engineering, with a focus on data lake design and CSV data processing.
- Proficiency in Python scripting and ETL tools (e.g., AWS Glue, Apache Spark).
- Strong knowledge of AWS services such as S3, Lambda, and Redshift.
- Demonstrated ability to implement data validation, cleansing, and quality frameworks.
- Excellent problem‑solving skills and ability to work cross‑functionally in a fast‑paced, regulated environment.