About the Role
The Red Hat CEE team is looking for a skilled and well-rounded Data Scientist with excellent programming skills and the ability to partner with internal stakeholders. To be successful in this role, you should have an established set of foundational skills and the ability to learn new skills quickly as we modernize platforms and tooling. You must also be able to work with minimal supervision in a fast-paced and ambiguous environment. You will be accountable for identifying and implementing opportunities to use in-house analytics packages, translating and manipulating large data sets, and creating and maintaining software and tools that deliver data and insights to the right people at the right time. Our ideal candidate has an interest in AI/ML solutions, has experience collaborating across multidisciplinary teams, and has a proven record of partnering with business leaders to deliver impactful assets and solutions.
Primary Responsibilities
- Work closely with team members and stakeholders to translate business problems into analytical projects, clear requirements, and solutions
- Work cross-functionally with teams on data migration, translation, and organizational initiatives
- Translate large volumes of raw, unstructured data into highly visual and easily digestible formats
- Manage data pipelines for predictive analytics modeling, model lifecycle management, and deployment
- Recommend ways to improve data reliability, efficiency, and quality
- Help create, maintain, and implement tools, libraries, and systems to increase the efficiency and scalability of the team
- Develop and maintain proper controls and governance for data access
- Communicate data-related challenges and help prioritize resolutions based on their alignment with organizational goals
Required Skills & Experience
- Ability to critically analyze data, test hypotheses, and validate data quality
- Ability to solve problems and to test and implement new technologies and tools
- Solid grasp of data systems and how they interact with each other
- Exceptional analytical skills for identifying the root cause of highly complex problems and resolving them
- Proficiency in Python programming is required; experience with Python-based analysis frameworks such as pandas is a plus
- Excellent data manipulation skills are required, particularly with SQL and the scientific Python stack (pandas, NumPy, scikit-learn)
- Experience extracting unstructured data from REST APIs, NoSQL databases, and object storage (Ceph/S3)
- Experience with Linux system administration, shell scripting, and virtualization technologies (e.g., containers) is required
- Mastery of Git (version control) and experience with related processes and techniques, such as versioning, merge requests, and code review, is required
- Experience with distributed computing frameworks (e.g., Dask, PySpark) is preferred
- Experience with OpenShift application development and administration is a plus
- Experience deploying applications using PaaS technologies (e.g., OpenShift, Airflow) is a plus
- Well-versed in, and eager to keep up with, the current industry landscape of computer software, programming languages, and technology
- Bachelor's degree in a related field (e.g., Computer Science or Software Engineering) with 5+ years of relevant work experience, or a Master's degree with 3+ years of relevant work experience