onsite

Data Infrastructure Engineer

As a Data Infrastructure Engineer, you will be responsible for building and maintaining large-scale data processing pipelines and storage systems for machine learning use cases. You will work with technologies such as Spark, Kafka, Kubernetes, and a range of databases to deliver robust, scalable data solutions.

About the role

We are seeking a skilled Data Infrastructure Engineer to join our team. The ideal candidate will have a strong background in building and maintaining large-scale data processing pipelines and storage systems for machine learning use cases.

Responsibilities

Design and implement infrastructure for large-scale data processing pipelines, covering both batch and streaming workloads, using tools such as Spark, Kafka, Apache Flink, and Apache Beam.
Build large-scale data storage systems for ML applications, including feature stores and time-series databases.
Demonstrate strong familiarity with a variety of data storage technologies, including relational databases, data warehouses, object storage, and time-series data, along with expertise in database schema design.
Build observable, debuggable, and verifiably correct data pipelines for external data sources, addressing challenges such as data versioning, point-in-time correctness, and evolving schemas.
Apply strong distributed systems and infrastructure skills, including scaling and debugging Kubernetes services, writing Terraform, and working with orchestration tools such as Flyte, Airflow, or Temporal.
Bring strong software engineering skills, writing well-tested code that is easy to extend.

Our Stack Includes

Python
GCP
Kubernetes
Terraform
Flyte
React/Next.js
Postgres
BigQuery

Skills

Spark, Kafka, Apache Flink, Apache Beam, Kubernetes, Terraform, Flyte, Airflow, Temporal, Python, GCP, React, Next.js, Postgres, BigQuery

Company: Gridmatic

Department: Engineering

Location: United States

Experience: 3+ years

Tenure: Full-time

Level: Mid-Level

Posted June 10, 2026