onsite
Data Infrastructure Engineer
As a Data Infrastructure Engineer, you will be responsible for building and maintaining large-scale data processing pipelines and storage systems for machine learning use cases. This includes working with technologies like Spark, Kafka, Kubernetes, and various databases to ensure robust and scalable data solutions.
About the Role
We are seeking a skilled Data Infrastructure Engineer to join our team. The ideal candidate will have a strong background in building and maintaining large-scale data processing pipelines and storage systems for machine learning use cases.
Responsibilities
- Design and implement infrastructure for large-scale batch and streaming data processing pipelines, using tools such as Spark, Kafka, Apache Flink, and Apache Beam.
- Develop and implement large-scale data storage systems for ML applications, including feature stores and time-series databases.
- Demonstrate strong familiarity with data storage technologies, including relational databases, data warehouses, object storage, and time-series databases, along with expertise in database schema design.
- Build observable, debuggable, and verifiably correct data pipelines for external data sources, addressing challenges like data versioning, point-in-time correctness, and evolving schemas.
- Apply strong distributed systems and infrastructure skills, including scaling and debugging Kubernetes services, writing Terraform, and working with orchestration tools like Flyte, Airflow, or Temporal.
- Exhibit strong software engineering skills, writing easy-to-extend and well-tested code.
Our Stack Includes
- Python
- GCP
- Kubernetes
- Terraform
- Flyte
- React/NextJS
- Postgres
- BigQuery
Skills
- Spark
- Kafka
- Apache Flink
- Apache Beam
- Kubernetes
- Terraform
- Flyte
- Airflow
- Temporal
- Python
- GCP
- React
- NextJS
- Postgres
- BigQuery