remote
Data Engineer - General Dynamics Information Technology
Build scalable data pipelines and analytics platforms for defense missions, using Python, SQL, Spark, and AWS to transform and secure large datasets across the DoD ecosystem.
About the role
Key Responsibilities
- Design, develop, and maintain robust data pipelines using Python, SQL, and Apache Spark to ingest, transform, and load data from diverse sources into the Advana platform.
- Implement scalable ETL processes on AWS (EMR, S3, Glue), ensuring data quality, lineage, and compliance with DoD security standards.
- Collaborate with data scientists and analysts to model data, create reusable data assets, and support advanced analytics and AI initiatives.
- Monitor pipeline performance, troubleshoot issues, and optimize for cost and speed across distributed environments.
- Document architecture, data flows, and best practices to enable knowledge sharing and maintainability.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field; advanced degree preferred.
- 3+ years of experience building production data pipelines in a cloud environment, preferably AWS.
- Strong proficiency in Python, SQL, and Spark (PySpark) for data processing.
- Hands‑on experience with AWS services such as S3, EMR, Glue, Redshift, and IAM.
- Solid understanding of data modeling, ETL design patterns, and data governance principles.
Skills
Python, SQL, Apache Spark, AWS