onsite

Data Scientist

Mindshift Analytics is seeking an Entry-Level Data Scientist to manage data ingestion, design ETL pipelines, and perform data transformation and visualization. The role involves developing web-based UI/UX for data visualization, conducting time series analysis, and creating API endpoints for data access and custom reports.

About the Role

Mindshift Analytics is looking for a Data Scientist to join our team. This is an Entry-Level, Full-time position focused on Engineering and Information Technology within the Software Development industry.

Responsibilities and Skills

The ideal candidate will possess a strong set of skills related to data management, analysis, and visualization. Key responsibilities and required skills include:

Data Ingestion: Implement and manage data ingestion from IoT sensors, ensuring efficient and reliable data flow.
ETL Pipeline Management: Design and maintain ETL pipelines using MQTT, REST API, and TCP socket programming, ensuring data integrity.
Data Munging: Perform data transformation and preparation in R or Python as the primary language, with a working knowledge of the other.
UI/UX Development for Data Visualization: Develop web server-based UI/UX for data visualization, preferably using R Shiny, creating interactive visual tools.
Inter-working of Python and R: Integrate and leverage functionalities between Python and R environments for seamless data processing.
Time Series Data Analysis: Conduct time series data classification to identify patterns and anomalies. Develop algorithms for classification problems in temporal data.
API Endpoint Creation: Develop API endpoints for data access and integration, ensuring secure and efficient data exchange with partners.
Custom Report Development: Generate custom reports from diverse raw data sources, tailored to specific client needs. Experience with interactive report generation.
Data Cleaning: Implement data cleaning techniques, including spike removal and noise reduction, to ensure data quality.
Data Pipeline Management: Oversee the data pipeline lifecycle, from ingestion to visualization, focusing on efficiency and scalability. Proficiency in MariaDB / MySQL for database management and querying.
Docker / Kubernetes: Deploy and manage containerized applications at scale in cloud environments.

Preferred Experience

Experience with at least one of the following big data technologies is highly preferred:

Databricks Interface with R: Databricks native integration with Posit Workbench, including access to Databricks clusters and the Unity Catalog via the sparklyr and pysparklyr packages.
RHIPE: R with Hadoop for big data analytics using MapReduce.
Apache Spark with SparkR: Scalable data processing framework with an R interface for large-scale data analysis.
DBI (Database Interface) with R: Communication between R and various relational database management systems.
Google BigQuery