Capnexus is a comprehensive services provider. Our team consists of outstanding professionals with extensive experience in designing, building, and supporting retail software. We see ourselves as a build-as-a-service provider following a repeatable business pattern that can be applied to a variety of platforms and verticals. With a culture built on outcomes and delivery at its core, Capnexus provides its customers with a complete suite of services for software development, system analysis, integration, implementation, and support, as well as the option to engage a single team to perform all the services they require.
Who You Are and What You'll Do:
Capnexus is looking for a highly skilled Senior AWS Data Engineer to lead data architecture, pipeline development, and data integrations. This is an exciting opportunity to apply advanced cloud data engineering skills on a platform that leverages generative AI to automate and modernize enterprise workflows.
Responsibilities:
- Participate in data discovery workshops to inventory source systems, including property management platforms, marketing channels, and CRM data, and translate the findings into data lake architecture requirements.
- Design and implement a multi-zone enterprise data lake on Amazon S3 (raw, conformed, enriched, aggregated) with ingest, cleansing, and business layers including schema versioning, checksum validation, business rule validation, and quarantine/notify workflows on failure.
- Build batch and streaming data ingestion pipelines using AWS Glue, Amazon Kinesis, and containerized ingestion applications across CDP, marketing, and property management data sources.
- Write PySpark and Python ETL code for AWS Glue jobs to transform, cleanse, and enrich data at scale; apply Apache Iceberg table format for ACID-compliant, schema-evolving data lake tables.
- Implement data transformation and orchestration frameworks using AWS Glue ETL and AWS Step Functions; configure AWS Glue Data Catalog with crawlers for automated metadata management and discovery.
- Implement AWS Lake Formation for fine-grained data governance, including table-level and column-level permissions, data filters, and resource links, rather than relying on IAM-level access controls alone.
- Configure Amazon Athena for serverless SQL querying across the data lake with performance optimization (Parquet format, partitioning, column pruning, file size management, caching); implement Amazon DynamoDB for sub-second customer profile lookups, with DAX where latency requirements demand it.
- Develop and deploy AWS Lambda functions using Powertools for AWS Lambda for structured logging, handler routing, and observability; implement error handling patterns including exponential backoff, retries, dead-letter queues, and CloudWatch alarms.
- Write and maintain Terraform (or CloudFormation/CDK) modules to provision and deploy AWS dat