Remote / Onsite
Data Engineer - Persistent Systems
The Data Engineer is responsible for designing, building, and maintaining scalable data pipelines and warehouses using Python, SQL, Spark, and cloud services to enable data‑driven insights.
About the role
Key Responsibilities
- Design, develop, and maintain robust ETL pipelines to ingest, transform, and load data from diverse sources.
- Build and optimize data models and warehouses on cloud platforms (AWS) for high‑performance analytics.
- Implement real‑time streaming solutions using Apache Kafka and process large datasets with Apache Spark.
- Collaborate with data scientists, analysts, and product teams to understand data requirements and deliver reliable data services.
- Monitor pipeline health, troubleshoot issues, and ensure data quality and security compliance.
Requirements
- 3+ years of hands‑on experience in data engineering, preferably in a cloud environment.
- Proficiency in Python and SQL for data manipulation and scripting.
- Strong knowledge of ETL tools, data modeling, and relational/NoSQL databases.
- Experience with AWS services (S3, Redshift, Glue, Lambda) and big‑data technologies such as Spark and Kafka.
- Solid understanding of data governance, security best practices, and performance tuning.
Skills
Python, SQL, Apache Spark, AWS, Kafka