About the Role
Mindshift Analytics is looking for a Data Scientist to join our team. This is an entry-level, full-time position in engineering and information technology within the software development industry.
Responsibilities and Skills
The ideal candidate will possess a strong set of skills related to data management, analysis, and visualization. Key responsibilities and required skills include:
- Data Ingestion: Implement and manage data ingestion from IoT sensors, ensuring efficient and reliable data flow.
- ETL Pipeline Management: Design and maintain ETL pipelines using MQTT, REST APIs, and TCP socket programming, ensuring data integrity.
- Data Munging: Perform data transformation and preparation in R or Python as a primary language, with a working knowledge of the other.
- UI/UX Development for Data Visualization: Develop web server-based UI/UX for data visualization, preferably using R Shiny, creating interactive visual tools.
- Python and R Interoperability: Integrate and leverage functionality across Python and R environments for seamless data processing.
- Time Series Data Analysis: Classify time series data to identify patterns and anomalies, and develop algorithms for classification problems in temporal data.
- API Endpoint Creation: Develop API endpoints for data access and integration, ensuring secure and efficient data exchange with partners.
- Custom Report Development: Generate custom reports from diverse raw data sources, tailored to specific client needs. Experience with interactive report generation.
- Data Cleaning: Implement data cleaning techniques, including spike removal and noise reduction, to ensure data quality.
- Data Pipeline Management: Oversee the data pipeline lifecycle, from ingestion to visualization, focusing on efficiency and scalability. Proficiency in MariaDB / MySQL for database management and querying.
- Docker / Kubernetes: Deploy and manage containerized applications at scale in cloud environments.
Preferred Experience
Experience with at least one of the following big data technologies is highly preferred:
- Databricks Interface with R: Databricks native integration with Posit Workbench, and access to Databricks clusters and the Unity Catalog via the sparklyr and pysparklyr packages.
- RHIPE: R with Hadoop for big data analytics using MapReduce.
- Apache Spark with SparkR: Scalable data processing framework with an R interface for large-scale data analysis.
- DBI (Database Interface) with R: Communication between R and various relational database management systems.
- Google BigQuery