ANIT K

Data Engineer

Key Strengths

Extensive experience (7+ years overall, 5+ in Big Data) in Data Engineering, aligning well with the target role.
Strong proficiency in PySpark and Spark for complex data processing, transformation, and analysis, including RDDs and DataFrames.
Hands-on experience with key AWS Big Data services such as S3, EMR, CloudWatch, Lambda, EC2, and IAM.
Expertise in Hive for structured data storage, including performance optimization techniques like partitioning and bucketing.
Proficiency in data ingestion from diverse sources (MySQL, Oracle SQL, Teradata, RDBMS, HDFS) using Spark and Sqoop, including incremental loads.
Experience with various data storage formats (ORC, Avro, Parquet, CSV) based on customer requirements.
Familiarity with workflow orchestration tools (Control-M, Autosys) and CI/CD pipelines for automated software development processes.
Demonstrated ability to implement data validation and quality checks using Spark.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate exhibits a strong cultural fit for a Data Engineer role, primarily through projects 1, 2, and 3, which showcase extensive experience in Big Data technologies, cloud platforms (AWS), and data pipeline development. The emphasis on collaboration with BAs and participation in daily scrums aligns with a team-oriented, agile environment. The breadth of skills across various Big Data components, programming languages, and databases suggests a versatile and adaptable professional. However, Project 4, a 'Quality Analyst' role, appears less aligned with the core Data Engineering responsibilities and might warrant further discussion to understand its relevance to the candidate's current career trajectory.

Soft Skills & Operational Fit

The candidate demonstrates strong operational fit through experience in diagnosing and troubleshooting errors, monitoring data processing environments (AWS CloudWatch, log monitoring), and optimizing performance. Collaboration with Business Analysts and active participation in daily scrum meetings highlight good communication and teamwork skills. Adherence to Agile methodology further indicates adaptability to modern development practices.

About

More than 7 years of experience in the IT industry, including 5+ years of application development with SQL and Big Data technologies (Hadoop, HDFS, Sqoop, Hive, MapReduce, PySpark). Follows Agile methodology and uses Jira to manage user stories. Solid understanding of PySpark, covering both DataFrames and Resilient Distributed Datasets (RDDs). Proficient in optimizing Hive queries and Spark jobs for efficient data processing. Skilled in Spark transformations and actions for data manipulation and analysis. Worked extensively with AWS services, including EC2, Athena, Glue, EMR, and S3. Experienced in handling file formats such as CSV, ORC, Avro, and Parquet to suit varied storage and processing requirements. Proficient in scheduling tools such as Control-M and Airflow for running data workflows and job schedules. Hands-on experience with Snowflake. Used Continuous Integration/Continuous Deployment (CI/CD) pipelines for code build, testing, and deployment.

Top Skills

PySpark, SQL Server, Spark, GitHub

Projects

Project:1

May 27, 2026 – Present

Diagnose and troubleshoot errors during job processing to keep data flowing.
Retrieve data from Amazon S3, transform it with PySpark and Spark SQL on AWS EMR, and load the processed data back into S3.
Create Hive tables and populate them using Spark for structured storage.
Use AWS CloudWatch to monitor and track node-level events in the processing environment.
Collaborate closely with Business Analysts (BAs) to understand project requirements.
Implement data validation and quality checks using Spark.
Develop and maintain PySpark and Python scripts, including creating and transforming DataFrames.
Write final processed data to Amazon S3 in various file formats (ORC, Avro, Parquet, CSV) based on customer requirements, using PySpark scripts.
Participate in daily scrum meetings to gather and align on project requirements.
Load data from diverse sources such as MySQL, Oracle SQL, and Teradata using Spark.
Optimize Hive query performance with partitioning and bucketing.
Monitor logs and address issues that arise during data processing.

Project:2

May 27, 2026 – Present

Implemented Spark RDD transformations and actions to support business analysis.
Prepared filtered data in text format for business analysis, storing specific data in separate Hive tables as required.
Developed Spark and Python scripts using the DataFrame API as an alternative to Spark SQL, for greater flexibility and customization.
Created Hive tables with partitions and buckets to organize data and speed up retrieval.
Wrote Spark SQL queries to process Hive tables through the Hive context, with attention to query performance.
Imported data from Relational Database Management Systems (RDBMS) into Hive and the Hadoop Distributed File System (HDFS) using Sqoop.
Ran Sqoop incremental loads to populate Hive external tables and keep them synchronized with the source.
Imported data from sources such as HDFS and RDBMS into Spark RDDs for processing.

Project:3

May 27, 2026 – Present

Attended daily status meetings with business users to discuss open issues.
Transferred files from the local file system to HDFS.
Wrote queries in HiveQL.
Worked on database connectivity using Sqoop.
Processed and analyzed data from Hive tables using HiveQL.
Analyzed transactions using Hive scripts to generate reports for end users.
Prepared shell scripts for checks and balances on ingestion jobs.
Generated workflows for Sqoop import and export jobs.
Created Hive tables with appropriate partitions and mapped them to the ingested data.
Tuned import/export jobs for faster data transfer between Hadoop and source systems.
Created Hive queries for analysis and contributed to Hive performance tuning.
Processed Hive tables using Spark and made the final data available for visualization.

Project:4

May 27, 2026 – Present

Experience with SharePoint, SQL, Advanced Excel, MS Project, PowerPoint, and Outlook.
Supported and coordinated with cross-functional teams to meet project timelines.
Helped prepare and maintain the tracker for audits, SOPs, and policies.
Supported SharePoint site management activities.
Maintained the database.
Handled communication between departments.
Ensured project-related documents were stored in the database, as per the SOPs or department standards.
Planned, organized, and managed workload, sometimes across multiple studies.
Ensured quality control of statistical output was carried out thoroughly and in accordance with SOPs.
Ensured all statistical work was completed to a high standard and in accordance with SOPs.
Completed day-to-day tasks while maintaining quality and productivity.
