Swanand Joshi

Data Analyst

https://www.opentalent.in/swanand-joshi-4334645

Machine Learning at Netflix

Netflix

Key Strengths

Extensive experience in Machine Learning, NLP, and Deep Learning from top-tier companies (Netflix, Meta, Amazon).
Strong background in sentiment analysis, recommendation systems, and misinformation detection, directly relevant to data analysis and insights.
Demonstrated ability to lead ML initiatives and deploy models in production environments.
Master's degree in Computer Science from a reputable university.

Cultural & Operational Fit

Cultural Fit Analysis

The candidate has worked at leading tech companies (Netflix, Meta, Amazon) known for fast-paced, innovative environments. The diverse range of personal projects, from NLP to game development and database management, indicates a broad interest and willingness to explore different technical domains. The volunteer experience suggests a community-oriented aspect. While the experience is heavily skewed towards Machine Learning Research, the target role is 'Data Analyst'. This represents a potential mismatch in primary focus, as the candidate's background is more advanced in ML/AI than typical data analysis roles, which might lead to overqualification or a desire for more advanced ML-focused tasks. The breadth of projects, however, shows adaptability.

Soft Skills & Operational Fit

The candidate's project history, particularly the volunteer work and diverse project portfolio, suggests a proactive and engaged individual. Experience in leading ML initiatives at Meta indicates strong problem-solving and potentially leadership skills. However, without specific psychometric test results, a definitive assessment of work attitude, stress handling, and team collaboration is not possible.

About

Machine Learning Research and Engineering with expertise in the following domains: 1. Personalization and Recommendation AI Research at Netflix. 2. Large-scale end-to-end ML for hate speech/bullying detection and misinformation matching at Facebook/Meta. 3. Conversational AI and Natural Language Understanding Research during my time at Amazon Alexa.

Top Skills

Deep Learning, Reinforcement Learning, Java, C++, Python, MATLAB, Eclipse, MySQL, JSP, Programming, SQL, Databases, Linux, HTML, Machine Learning, NLP, Core Java, Content Management, Microsoft Office, Big Data, Solr, NLTK, D3.js, JavaScript, Scalable Architecture, Conversational AI, Natural Language Processing

Education

University of Southern California

Master's degree, Computer Science

January 1, 2015 – January 1, 2017

Pune Institute of Computer Technology

Bachelor of Engineering (B.E.), Information Technology

January 1, 2011 – January 1, 2015

Experience

Netflix

Machine Learning Researcher

January 1, 2023 – Present

Los Gatos, California, United States · On-site

Jnana Prabodhini Foundation

Volunteer

November 1, 2019 – Present

United States · Remote

Projects

Toutiao Q&A Recommendation System

September 1, 2016 – November 1, 2016

Finished #28 out of 1036 international teams in the Toutiao Q&A competition. Toutiao Q&A is an upcoming mobile social platform with around 530 million Toutiao users and a precise recommendation algorithm; it promotes short-form content creation and interaction on mobile devices in a Q&A format. The platform strives to match information with the right people, finding the best respondents for questions and the best readers for answers based on each expert’s area of expertise and the tags related to the questions. Each data record includes expert tags, question data and question distribution data. Given a set of questions, the task was to forecast which experts are more likely to answer which questions. Specifically, for each question-expert pair, we had to calculate the probability of that expert answering the question. The competition used Normalized Discounted Cumulative Gain (NDCG) as the evaluation criterion, with the formula: NDCG@5 * 0.5 + NDCG@10 * 0.5
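The scoring formula above can be illustrated with a minimal Python sketch. It assumes binary relevance labels (1 if the expert answered, 0 otherwise) listed in the order the model ranked the experts, and the common linear-gain, log2-discount form of DCG; the competition's official scorer may have differed, and the function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def competition_score(relevances):
    """Equal-weight blend of NDCG@5 and NDCG@10, as stated in the description."""
    return 0.5 * ndcg_at_k(relevances, 5) + 0.5 * ndcg_at_k(relevances, 10)
```

A ranking that places every answering expert first scores 1.0, and the score drops as answering experts are pushed further down the list.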

Sarcasm Detection in Hindi

March 1, 2016 – May 1, 2016

Detection of sarcasm can benefit many sentiment analysis NLP applications, such as review summarization, dialogue systems, opinion mining and review ranking systems. In this project, we formulate sarcasm detection as a classification task: given a text, the goal is to predict whether it is sarcastic or not. Twitter, as a micro-blogging platform, offers a diverse range of sarcastic and non-sarcastic tweets across multiple domains such as politics, sports, environment and regional topics. Cross Language Text Classification: We trained our classifier on tweets available in Hindi, tested it on both Hindi and English tweets, and evaluated performance with comments on aspects of language conversion. The biggest challenge of this research paper lies in the feature engineering of the problem. We wish to exploit different language features along with contextualized Twitter features to train classifiers. Existing work in the field emphasizes using NB and SVM for classification with various feature formulations. None of the previous work has been done in the Hindi language or in Cross Language Learning. Our aim is to achieve both. We believe that such a project will help improve the accuracy of sentiment analysis across different languages.

Part of Speech Tagger for Catalan corpus

March 1, 2016 – Present

Implemented a Hidden Markov Model for POS tagging on a Catalan corpus, with the Viterbi decoding algorithm producing the output tag sequence. A final accuracy of 94.04% was achieved on test data.
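For illustration, Viterbi decoding over a bigram HMM can be sketched in Python as follows. This is not the project's code: the dictionary-based probability tables, the `floor` fallback for unseen events, and the function name are all assumptions made for the example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    start_p[tag], trans_p[prev][tag] and emit_p[tag][word] hold probabilities.
    """
    if not words:
        return []

    def log_p(table, key):
        # Unseen events get a tiny floor probability instead of zero (assumed).
        return math.log(max(table.get(key, 0.0), floor))

    # best[i][t]: best log-probability of any tag path ending in t at position i
    best = [{t: log_p(start_p, t) + log_p(emit_p[t], words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_p(trans_p[p], t))
            best[i][t] = (best[i - 1][prev] + log_p(trans_p[prev], t)
                          + log_p(emit_p[t], words[i]))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

Working in log space avoids numeric underflow on long sentences; a trained tagger would estimate the three tables from tagged corpus counts.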

Content Enrichment in Big Data Text Retrieval

February 1, 2016 – April 1, 2016

The objective of the project is to significantly enrich the metadata and the automatically extracted text and entities from the TREC Polar Dataset, and to make the dataset easy to relate to and interact with. Key Steps:
1. Context Extraction Enrichment – We applied the Tag Ratios algorithm to identify text, and constructed a Tika parser to extract Measurement mentions from text automatically.
2. Metadata Enrichment – We applied the GROBID journal parser with Tika to extract TEI metadata, and used the Google Scholar API to gather scientific publication metadata, developing a network of scientific publications related to the Polar dataset and mapping publications to the data. In addition, we classified the data using a common Earth science domain model and ontology called SWEET (Semantic Web for Earth and Environmental Terminology, http://sweet.jpl.nasa.gov/). We also created Digital Object Identifiers (DOIs) for the data.
3. Information Similarity and Clustering – We created clusters of the Polar data using the enriched extracted measurements and the enriched metadata, and presented the results using Data-Driven Documents (D3) visualizations after ingesting the data into Apache Solr.
4. Named Entity Recognition (NER) – We applied geospatial NER using the GeoTopicParser in Apache Tika and the MEMEX GeoParser tools.

Truthful and Deceptive Hotel Reviews Detection

February 1, 2016 – Present

A naive Bayes classifier to identify hotel reviews as either truthful or deceptive, and either positive or negative. Word tokens were used as features for classification on real data from a hotel review corpus. Smoothing and unknown words were handled using Laplace smoothing and set priors. An F1 score of 0.87 was achieved on the test data.
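A minimal Python sketch of a word-token naive Bayes classifier with Laplace (add-one) smoothing is shown below for one label axis (for example, positive vs. negative). The class name, whitespace tokenization, and the choice to skip words never seen in training are assumptions for illustration, not details taken from the project.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over word tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for word in doc.lower().split():
                if word in self.vocab:  # assumed: unknown words are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)
```

Running two such classifiers, one per label axis, would give the four-way truthful/deceptive and positive/negative output described above.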

Mime Diversity Analysis in Big Data

January 1, 2016 – March 1, 2016

This project employed concepts from MIME taxonomy and data similarity, and learned byte-based fingerprints of the data via Byte Frequency Analysis (BFA), Byte Frequency Distribution (BFD) Correlation, Byte Frequency Cross-Correlation (BFC), and File Header Trailer (FHT) analysis. We implemented a set of MIME diversity programs and applications that helped in better understanding the unknown MIME types in a rich scientific domain. We then computed BFA, BFC and FHT for these unknown (and other) Polar data types from the dataset, and built a system that allows visual interaction with and introspection of the MIME diversity in this dataset. These classifications improved Tika’s overall ability by suggesting new MIME magic for its database, and improved techniques for MIME detection in the Big Data present in the TREC-DD-Polar dataset.
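As a rough illustration of a byte-frequency fingerprint, the Python sketch below builds a 256-bin histogram for a file's bytes and compares two fingerprints. Scaling by the most frequent byte and using Pearson correlation as the similarity measure are assumptions for the example; the project's actual BFA and BFD correlation computations may have been defined differently.

```python
import math

def byte_frequency_distribution(data: bytes):
    """256-bin histogram of byte values, scaled so the most frequent byte is 1.0."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * 256

def pearson(a, b):
    """Pearson correlation of two equal-length fingerprints (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0
```

Files of the same type tend to produce similar histograms, so a high correlation against a known type's averaged fingerprint can hint at the MIME type of an unlabeled file.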

Mancala Game Engine Development

October 1, 2015 – November 1, 2015

Implemented a Mancala game engine using the Minimax algorithm with alpha-beta pruning.

Spell Suggest and Grammar Checker

September 1, 2015 – October 1, 2015

The spell suggest tool uses a unigram language model built from a dataset of words. Edit distances up to 2 are covered to correct spelling mistakes. For grammar checking, the given text is parsed into POS tags using the Stanford NLP POS tagger. Error detection is carried out using a POS tag model trained on a dataset of correct English.
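The spelling-suggestion idea can be sketched in Python as follows: generate candidate words within two edits and pick the one with the highest unigram count. Treating a transposition as a single edit, the lowercase ASCII alphabet, and the function names are assumptions for illustration rather than details of the original tool.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"

def edits1(word):
    """All strings one delete, transpose, replace or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def suggest(word, unigram_counts):
    """Most frequent known word within edit distance 2, else the word itself."""
    if word in unigram_counts:
        return word
    near = {w for w in edits1(word) if w in unigram_counts}
    if not near:  # fall back to two edits only when one edit finds nothing
        near = {w2 for w1 in edits1(word) for w2 in edits1(w1)
                if w2 in unigram_counts}
    return max(near, key=unigram_counts.get) if near else word
```

For example, with counts built from a small corpus via `Counter(text.split())`, a misspelling such as "teh" resolves to "the" when "the" is in the corpus.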

Octave Based Sentiment Analysis

August 1, 2015 – September 1, 2015

The application, built in Octave, trains SVM-based classifiers to predict the emotion present in a given text message. A Logistic Regression algorithm was also applied, targeting chat applications. Text messages and word lists were used as features in designing the system.

Chess Game Development

July 1, 2014 – August 1, 2014

A GUI-based chess game developed in Visual Basic 6. The game provides the chess board and shows the player all valid moves at the current time. Rules such as en passant and promotion are supported.

Benchmarking-SQL-vs-NoSQL

July 1, 2014 – May 1, 2015

A Java-based tool for comparing database performance. To compare the two trends in databases, MySQL and MongoDB were chosen as the representatives. Datasets of sizes varying from 100 to 1,000,000 were generated using Python and compared on various criteria. The benchmarking output was represented as graphs depicting the differences in performance.

Textile Industry Database Management System

July 1, 2014 – October 1, 2014

This business application was deployed in the textile industry. The entire business flow, from supply chain management to customer support service, was designed and implemented as an enterprise DBMS. The application front end was developed in Visual Studio, with an Oracle database as the back end. Every expense and gain is analyzed in the form of visually intuitive reports. The application also handles a unique system of bill payments and reminders in which customers are allowed a duration of debt.

Web Based Student Help Forum

February 1, 2014 – April 1, 2014

Interactive web portal for a student help forum. The website allowed students to ask questions, give answers and share knowledge. The website included facilities for upvoting and downvoting an answer. The website was developed using Java, JSP, JavaScript and HTML/CSS.
