onsite

Data Scientist

Red Hat is seeking a skilled Data Scientist to join the CEE team in Pune, India. This role involves turning business problems into analytical projects, managing data pipelines for predictive analytics, and creating software tools to deliver data and insights, with a focus on AI/ML solutions and collaboration across multi-disciplinary teams.

About the Role

The Red Hat CEE team is looking for a skilled and well-rounded Data Scientist with excellent programming skills and the ability to partner with internal stakeholders. To be successful in this role, you should have an established set of foundational skills and the ability to learn new skills quickly as we modernize platforms and tooling. You must also be able to work with minimal supervision in a fast-paced and ambiguous environment. You will be accountable for implementing opportunities to use in-house analytics packages, translating and manipulating large sets of data, and creating and maintaining software and tools that deliver data and insights to the right people at the right time. Our ideal candidate has an interest in AI/ML solutions, has experience collaborating across multi-disciplinary teams, and has demonstrated experience partnering with business leaders to deliver impactful assets and solutions.

Primary Responsibilities

Work closely with team members and stakeholders to turn business problems into analytical projects, translated requirements, and solutions
Work cross-functionally with teams on data migration, translation, and organizational initiatives
Translate large volumes of raw, unstructured data into highly visual and easily digestible formats
Manage data pipelines for predictive analytics modeling, model lifecycle management, and deployment
Recommend ways to improve data reliability, efficiency, and quality
Help create, maintain, and implement tools, libraries, and systems to increase the efficiency and scalability of the team
Develop and maintain proper controls and governance for data access
Communicate data-related challenges and help to prioritize resolutions based on alignment to organizational goals

Required Skills & Experience

Ability to critically analyze data, test hypotheses, and validate data quality
Ability to problem solve and to test and implement new technologies and tools
Solid grasp of data systems and how they interact with each other
Exceptional analytical skills to detect the source and resolution of highly complex problems
Proficiency in Python programming is required; experience with Python-based analysis frameworks such as pandas is a plus
Excellent data manipulation skills are required, namely using SQL and the Python scientific stack (pandas, NumPy, scikit-learn)
Experience extracting unstructured data from REST APIs, NoSQL databases, and object storage (Ceph/S3)
Experience with Linux system administration, shell scripting, and virtualization technology (containers) is required
Mastery of Git (version control) and experience with versioning, merge request, and review processes and techniques are required
Experience with distributed computing frameworks (e.g., Dask, PySpark) is preferred
OpenShift application development and administration is a plus
Experience deploying applications using PaaS technologies (e.g., OpenShift, Airflow) is a plus
Well-versed in, and eager to stay on top of, the current industry landscape of computer software, programming languages, and technology
Bachelor's degree in a related field (e.g., Computer Science or Software Engineering) with 5+ years of relevant working experience, or Master's degree with 3+ years of working experience