Remote
Data Analyst - PySpark
We are seeking a Data Analyst skilled in PySpark and Python to design, monitor, and improve data pipelines while ensuring the compliance, quality, and security of large‑scale datasets.
About the role
Key Responsibilities
- Develop, maintain, and optimize PySpark data pipelines for large‑scale batch and streaming workloads.
- Implement data quality checks, validation rules, and monitoring dashboards to guarantee accurate and reliable data.
- Collaborate with governance teams to enforce data compliance policies and security standards across all data assets.
- Investigate and resolve data anomalies, security incidents, and performance bottlenecks.
- Document data flows, lineage, and technical specifications in Jira for traceability and continuous improvement.
Requirements
- 3+ years of experience with PySpark, Python, and SQL in a big‑data environment.
- Strong understanding of data quality frameworks, compliance regulations, and security best practices.
- Proficiency with data‑pipeline orchestration tools and issue‑tracking systems such as Jira.
- Ability to translate business requirements into scalable technical solutions.
- Excellent problem‑solving skills and a collaborative mindset.