remote
Data Engineer - Anika Systems
Design, build, and optimize scalable data pipelines for federal clients, leveraging ETL/ELT, XBRL processing, Apache Iceberg, and advanced data optimization techniques to deliver trusted analytics and reporting.
About the role
- Design, develop, and maintain robust ETL/ELT pipelines to ingest, transform, and deliver data across enterprise platforms.
- Build scalable data ingestion frameworks for structured and semi-structured data, including XBRL filings and financial datasets.
- Implement data transformation logic to support analytics, reporting, and regulatory use cases.
- Ensure data pipelines are reliable, performant, and scalable in cloud environments.
- Leverage AI-assisted development tools to accelerate pipeline development, testing, and optimization.
- Develop and manage data solutions on AWS using services such as S3, Glue, Lambda, and Redshift, with Apache Airflow DAGs for orchestration.
- Implement and optimize Apache Iceberg tables for large-scale, ACID-compliant data lakes.
- Support lakehouse architectures that unify data lakes and data warehouses.
- Optimize data storage and retrieval strategies for performance and cost efficiency.
- Enable data platforms that support AI/ML workloads and downstream generative AI use cases.
- Design and implement CI/CD workflows for data pipelines, infrastructure, and analytics code using tools such as GitHub Actions, GitLab CI, Jenkins, or AWS-native services.
- Automate build, test, and deployment processes for ETL pipelines and data platform components.
- Implement DataOps best practices, including version control, automated testing, environment promotion, and rollback strategies.
- Ensure reproducibility, reliability, and governance of data pipeline deployments across environments.
- Integrate AI-driven testing and monitoring tools to improve pipeline quality and reduce operational risk.
- Design and implement materialized views and other performance optimization techniques to improve query efficiency.
- Tune data pipelines and queries for performance, scalability, and cost.
- Implement partitioning, indexing, and caching strategies aligned to workload patterns.
- Develop pipelines to ingest, parse, and normalize XBRL (eXtensible Business Reporting Language) data.
- Support regulatory and financial data use cases requiring high accuracy and traceability.
- Ensure alignment with data standards and validation rules for financial reporting datasets.
- Apply context engineering principles to ensure data is enriched with meaningful metadata, lineage, and business context.
- Collaborate with Data Architects to support data modeling, schema design, and entity relationships.
- Enable downstream analytics and AI use cases by structuring data for usability, discoverability, and governance.
- Integrate pipelines with enterprise data catalogs and metadata management systems.
- Support automated metadata capture, lineage tracking, and data quality monitoring.
- Ensure alignment with data governance frameworks and standards established by OCDO organizations, including AI data readiness and trace