Sean Lachhander

Experience

Memorial Sloan Kettering Cancer Center

Software Engineer V, Bioinformatics

November 1, 2023 – Present

Memorial Sloan Kettering Cancer Center

Software Engineer IV, Bioinformatics

February 1, 2022 – November 1, 2023

Memorial Sloan Kettering Cancer Center

Software Engineer III, Bioinformatics

January 1, 2022 – February 1, 2022

Memorial Sloan Kettering Cancer Center

Software Engineer II, Bioinformatics

June 1, 2020 – January 1, 2022

Memorial Sloan Kettering Cancer Center

Software Engineer I, Bioinformatics

June 1, 2018 – June 1, 2020

University at Albany, SUNY

Graduate Assistant II: Numerical Methods - Scientific Computing

June 1, 2017 – January 1, 2018

Albany, New York Area

GivDapps

Software Engineer — Full Stack

June 1, 2017 – January 1, 2018

Albany, New York

The Research Foundation for SUNY

Machine Learning Engineer

April 1, 2017 – June 1, 2018

Albany, New York

University at Albany, SUNY

Lead Teaching Assistant: Algorithms & Data Structures

January 1, 2017 – June 1, 2017

Albany, New York Area

University at Albany, SUNY

Teaching Assistant: Data Structures

August 1, 2016 – January 1, 2017

Albany, New York Area

Freelance / Contract

Software Engineer

March 1, 2015 – April 1, 2018

Clifton Park, New York

De Joint Records

Audio Engineer & Graphic Designer

June 1, 2013 – January 1, 2017

Queens, New York

Kumon North America, Inc.

Database Analyst / Lead Assistant

June 1, 2010 – September 1, 2017

Clifton Park, New York

Projects

Identifying Software Defect Density in the Aerospace Industry Using Cross-Project Metrics and Imbalanced Learning

January 1, 2021 – December 1, 2021

Aerospace software consists of highly complex and interdependent modules that directly affect overall project quality, and defects in them may result in mission failure. Early defect data can help quality assurance teams allocate labor resources optimally, both improving software reliability and reducing development costs. Past aerospace software product metric data provides the historical defect details needed to distinguish defective modules from non-defective ones. Current defect prediction models face several challenges, such as inadequate treatment of software module class imbalances, data heterogeneity, multicollinearity, and the curse of dimensionality. These deficiencies can lead to substandard and unreliable defect identification performance, and they motivate better automated identification of software defect densities in future software products, which can give quality assurance and engineering teams insight into defective data early in the software development lifecycle. This project introduces a novel approach to predicting defective aerospace software module densities and provides an alternative to the resource-intensive task of obtaining semi-labeled metric module data from project source files. The proposed cross-project defect prediction model uses static cross-project metrics and effectively identifies the defect densities of aerospace software products according to several evaluation metrics. This research establishes a benchmark for the aerospace industry to measure software quality and reliably identify defective software modules in the early stages of the software development lifecycle.

WSDM: KKBox's Music Recommendation Challenge

October 1, 2017 – April 1, 2018

The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018) is challenging you to build a better music recommendation system using a donated dataset from KKBOX. WSDM (pronounced "wisdom") is one of the premier conferences on web-inspired research involving search and data mining. They're committed to publishing original, high-quality papers and presentations, with an emphasis on practical but principled novel models. WSDM has challenged the Kaggle ML community to help solve these problems and build a better music recommendation system. The dataset is from KKBOX, Asia’s leading music streaming service, holding the world’s most comprehensive Asia-Pop music library with over 30 million tracks. They currently use a collaborative filtering based algorithm with matrix factorization and word embedding in their recommendation system but believe new techniques could lead to better results.

Wearable: Outfit Recommendation Engine (Cross-Platform Mobile App Development: Android • iOS)

June 1, 2017 – August 1, 2017

As a team, we designed a clothing recommendation engine in React Native for Android and iOS (cross-platform mobile app development), following an Agile methodology. Technologies used: Python (backend), ReactJS, React Native, Google Firebase (database), NumPy, Pandas, scikit-learn (cluster, KMeans, spatial, etc.), SciPy (dendrogram, linkage, cophenet, pdist, cdist, etc.), math, etc.

Website Design & Maintenance: Glam by Annie

May 1, 2017 – March 1, 2020

Glam by Annie was built with a Responsive Web Design approach, in which design and development respond to the user's behavior and environment based on screen size, platform, and orientation. The project combines flexible grids, layouts, and images with JavaScript and CSS media queries.

Automated Modeling of Complex Social Behavior: Topic Control

April 1, 2017 – May 1, 2017

After passing conversational datasets through CoreNLP and exporting tagged .xml files, I extracted the following features to establish who has the highest rank in Topic Control: Local Topic Introduction (LTI), Cite Score (CS), SMT Score, and Turn Length Index (TLI). Local Topic Introduction measures who first introduced a successful topic (one where the discourse doesn't die after 1-2 mentions). Cite Score measures the extent to which other participants discussed topics introduced by each speaker. SMT Score is another measure of Topic Control, based on subsequent mentions of already introduced local topics. The two differ in that SMT Score counts every mention, including self-citations by the speaker, while Cite Score counts every mention except the speaker's own. The Turn Length Index stipulates that more influential speakers take longer turns than those who are less influential. I weighted the features, summed them to determine Topic Control, and compared the results with the Ground Truth data. The results are promising: the method ranks each person exactly as the Ground Truth does.
Data Science Scripting Language: Python
Imports: TextBlob, nltk, OrderedDict, stopwords, punctuation, inflect, PyDictionary, timeit, defaultdict, os.
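The difference between Cite Score and SMT Score described above can be illustrated with a minimal Python sketch. The data layout (an ordered list of (speaker, topic) mentions) and the function name are hypothetical and are not the project's actual code; the sketch also assumes that the first mention of a topic counts as its introduction.

```python
from collections import defaultdict


def cite_and_smt_scores(mentions):
    """Return (cite, smt) score dicts keyed by the speaker who introduced a topic.

    mentions: list of (speaker, topic) tuples in conversation order.
    SMT counts every subsequent mention of a topic, self-citations included.
    Cite counts only subsequent mentions made by other speakers.
    """
    introducer = {}
    cite = defaultdict(int)
    smt = defaultdict(int)
    for speaker, topic in mentions:
        if topic not in introducer:
            # First mention: this is the introduction, not a citation.
            introducer[topic] = speaker
            continue
        owner = introducer[topic]
        smt[owner] += 1
        if speaker != owner:
            cite[owner] += 1
    return dict(cite), dict(smt)


# Hypothetical toy conversation.
chat = [
    ("ann", "budget"),
    ("bob", "budget"),
    ("ann", "budget"),  # self-citation: counts for SMT only
    ("bob", "venue"),
    ("ann", "venue"),
]
cite, smt = cite_and_smt_scores(chat)
```

In this toy conversation the two scores diverge only for the speaker who repeats her own topic, which is the distinction the description draws.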

Twitter Sentiment Analysis: Machine Learning and Natural Language Processing

February 1, 2017 – March 1, 2017

In this project, I worked with machine learning and sentiment extraction from text. I built a classifier that distinguishes sentiment in text, wrote a report on how I built it, and presented its performance to Dr. Tomek Strzalkowski. The classifier is based on text features extracted from the training dataset. The report covered:
• Feature construction: unigrams, bigrams, TF-IDF, Part-of-Speech tags, word lengths, etc. of the messages, along with why I chose specific features when constructing the classifier.
• Description of the classifier: which classifier I chose (k-Nearest Neighbor/Naïve Bayes), with a justification of the choice and its applicability to the Twitter dataset.
• Evaluation technique: how I evaluated the classifier's performance in distinguishing sentiment in Tweets, the applicability of k-fold cross validation, and the metrics used to evaluate performance.
• Implementation: how I preprocessed the data using stopword removal, stemming, and tokenization over the content of the messages; how I extracted the features; how I partitioned the dataset for k-fold cross validation and which k I chose; and how I calculated performance metrics such as accuracy, precision, and recall.
• Analysis of results: the performance of the classifier, reported with charts, graphs, and tables built in Microsoft Excel, with actual numbers for the three performance metrics in each of the k iterations and the average over all k folds for each metric.
• Application: the results and analysis of outcomes from applying the classifier to the dataset.
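The k-fold evaluation described above can be sketched in plain Python. This is an illustrative outline rather than the code used for the report: the function names are hypothetical, the folds are contiguous and unshuffled for simplicity, and precision and recall are computed for a single assumed positive label.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_for_fold(folds, i):
    """Use fold i as the test set and all remaining folds as training data."""
    test = folds[i]
    train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
    return train, test


def evaluate(y_true, y_pred, positive="pos"):
    """Accuracy, precision, and recall for one fold (binary, one positive label)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def average_metrics(per_fold):
    """Average each metric over all folds."""
    return {
        name: sum(m[name] for m in per_fold) / len(per_fold)
        for name in per_fold[0]
    }
```

A full run would train the chosen classifier on each training split, call evaluate on the matching test split, and pass the k resulting dictionaries to average_metrics.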

Conversation Analyzation

February 1, 2017 – June 1, 2017

Given a dataset, I used Stanford CoreNLP to parse the chat room conversation and produce an XML file, using the annotators tokenize, ssplit, pos, lemma, ner, parse, and dcoref:
Tokenize - Tokenizes the text.
Ssplit - Splits a sequence of tokens into sentences.
POS - Labels tokens with their POS (Part-of-Speech) tag.
Lemma - Generates the word lemmas for all tokens in the corpus.
Ner - Recognizes named (person, location, organization, misc), numerical (money, number, ordinal, percent), and temporal (date, time, duration, set) entities.
Parse - Provides full syntactic analysis using both the constituent and the dependency representations.
Dcoref - Implements both pronominal and nominal coreference resolution.
After producing the XML file, I wrote a Python script to count the number of pronouns each person used in the chat. With the pronoun counts in hand, I split the participants into two groups, "Male" and "Female", and determined the population of Male and Female participants through two separate treatments, using a significance level of 0.05 and a one-tailed hypothesis.

Operating System (OS) Development

January 1, 2017 – May 1, 2017

This project consisted of writing an operating system for the x86 architecture and involved work within a UNIX/Linux environment, systems programming, the C language, and computer systems in general.

MP3 Music Organizer

October 1, 2016 – November 1, 2016

The executable version of the program is named organizer; the makefile ensures this. The organizer program supports the following usage: organizer [musiccollectionpath]. If no argument is given, the program operates on the current working directory. The MP3 files are expected to follow the naming convention "tracknum-artist-year-album-trackname.mp3". The program scans the music collection, arranges the MP3 files by artist, year, and album, and moves them into the corresponding subdirectories. One subdirectory per artist is created. Inside each artist subdirectory, one subdirectory is created per year for which there is at least one track by this artist. Inside each "year" subdirectory, one subdirectory is created per album for that year and artist. The MP3 files are moved to the correct subdirectory and renamed as "tracknum - trackname.mp3". The program additionally creates one directory per year for which there is at least one track in the collection. That directory contains symbolic links to all albums released that year for which at least one track exists. Each symbolic link points to the directory containing the album and is named "artist - album". On error, the program writes a suitable error message to stderr and stops. The error checks include: more than one command line argument is given, an unknown switch is provided, the "musiccollectionpath" directory doesn't exist, the "musiccollectionpath" directory can't be accessed, and the contents of the "musiccollectionpath" directory can't be accessed.
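The mapping from the naming convention to the directory layout described above can be sketched as follows. This is an illustration in Python, not the organizer program itself; the function name is hypothetical, and the sketch assumes that the track number, artist, year, and album fields contain no hyphens. It only computes paths and does not touch the file system.

```python
from pathlib import PurePosixPath


def plan_move(filename):
    """Compute the target path and the per-year symlink for one MP3 file.

    Expects "tracknum-artist-year-album-trackname.mp3". Returns a pair:
    (artist/year/album/"tracknum - trackname.mp3", year/"artist - album"),
    both relative to the music collection root.
    """
    stem, dot, ext = filename.rpartition(".")
    if not dot or ext.lower() != "mp3":
        raise ValueError(f"not an MP3 file: {filename!r}")
    # Split on the first four hyphens only, so the track name may contain hyphens.
    parts = stem.split("-", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected file name: {filename!r}")
    tracknum, artist, year, album, trackname = parts
    target = PurePosixPath(artist) / year / album / f"{tracknum} - {trackname}.mp3"
    link = PurePosixPath(year) / f"{artist} - {album}"
    return target, link


# Hypothetical file name used only for illustration.
target, link = plan_move("03-Some Artist-1999-Some Album-Some Track.mp3")
```

Here target is the new location of the file and link is the per-year symbolic link that would point at the album directory.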

SRIC Music: Music Streaming Service

June 1, 2016 – June 1, 2017

SRIC Music is a music streaming service that allows musicians, artists, producers, or remixers to upload their audio files online to be streamed worldwide. SRIC Music has a fully responsive design and allows users to create a profile and to explore and discover music. Users are also able to message the person who uploaded an audio file, create playlists, etc. Technologies used: HTML, CSS, JavaScript, PHP, SQL.

Shamir Secret Sharing and Homomorphism

March 1, 2016 – April 1, 2016

The purpose of this program is to solidify the concepts of Secret Sharing and Homomorphism. Shamir's Secret Sharing is an algorithm in cryptography created by Adi Shamir. It is a form of secret sharing in which a secret is divided into parts, each participant receives its own unique part, and some or all of the parts are needed to reconstruct the secret. Counting on all participants to combine their parts might be impractical, so a threshold scheme is sometimes used in which any k of the parts are sufficient to reconstruct the original secret. Homomorphic secret sharing is a type of secret sharing algorithm in which the secret is encrypted via homomorphic encryption. A homomorphism is a transformation from one algebraic structure into another of the same type so that the structure is preserved. Importantly, this means that for every kind of manipulation of the original data, there is a corresponding manipulation of the transformed data.
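A minimal Python sketch of the threshold scheme follows, assuming arithmetic over a prime field. The prime, the function names, and the share layout are assumptions made for illustration, not the project's code. The add_shares helper shows one simple homomorphic property: adding two sets of Shamir shares point by point yields shares of the sum of the two secrets.

```python
import secrets

PRIME = 2**127 - 1  # assumed field modulus (a Mersenne prime)


def make_shares(secret, k, n):
    """Split secret into n shares so that any k of them reconstruct it."""
    # Random polynomial of degree k - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        inv_den = pow(den, PRIME - 2, PRIME)  # modular inverse (Fermat)
        secret = (secret + yi * num * inv_den) % PRIME
    return secret


def add_shares(a, b):
    """Pointwise sum of two share lists built over the same x values."""
    return [(xa, (ya + yb) % PRIME) for (xa, ya), (_, yb) in zip(a, b)]


shares = make_shares(1234, k=3, n=5)
recovered = reconstruct(shares[:3])  # any 3 of the 5 shares suffice
```

Because polynomial addition is linear, no participant needs to see either secret in order to produce a valid share of their sum.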

Image Encryption Using AES Symmetric Encryption Algorithm

February 1, 2016 – March 1, 2016

Written in Java, this prototype provides a GUI for selecting an image, encrypts and decrypts the image file using the AES cryptographic algorithm, and records the time required for encryption and decryption. The prototype also offers the option of 4-round reduced AES encryption/decryption for images.

Steganography

January 1, 2016 – June 1, 2016

This program hides a text document (text.txt), located in the same folder as the program, inside an image (host_image.jpg). The GUI shows the host image in the left panel. The hiding is performed by replacing the 'n' lower-order bits of the image with the ASCII values of 'm' characters from the text document. The more text you embed into the image, the lower the image quality will be. The GUI shows the values of 'n' and 'm', where 'n' can vary from 0 to 8 and 'm' can be any value between 0 and the maximum number of characters that can be embedded in the image.
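The low-order bit replacement described above can be sketched in Python. The sketch works on a raw sequence of pixel bytes rather than a decoded image file, the function names are hypothetical, and it requires n between 1 and 8 (with n = 0 nothing can be embedded). It is an illustration of the technique, not the original program.

```python
def embed(pixels, text, n):
    """Hide ASCII text in the n low-order bits of successive pixel bytes."""
    if not 1 <= n <= 8:
        raise ValueError("n must be between 1 and 8")
    bits = "".join(f"{b:08b}" for b in text.encode("ascii"))
    bits += "0" * (-len(bits) % n)  # pad to a multiple of n bits
    chunks = [bits[i:i + n] for i in range(0, len(bits), n)]
    if len(chunks) > len(pixels):
        raise ValueError("text does not fit in the image")
    out = bytearray(pixels)
    keep_mask = (0xFF << n) & 0xFF  # keeps the high 8 - n bits of each byte
    for i, chunk in enumerate(chunks):
        out[i] = (out[i] & keep_mask) | int(chunk, 2)
    return bytes(out)


def extract(pixels, m, n):
    """Recover m characters from the n low-order bits of the pixel bytes."""
    needed = -(-m * 8 // n)  # ceil(8 * m / n) pixel bytes carry the text
    low_mask = (1 << n) - 1
    bits = "".join(f"{p & low_mask:0{n}b}" for p in pixels[:needed])
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, m * 8, 8))
    return data.decode("ascii")


# Hypothetical stand-in for decoded pixel data.
host = bytes(range(64))
stego = embed(host, "hi", n=2)
message = extract(stego, m=2, n=2)
```

Larger values of n fit more characters per pixel byte but overwrite more of each byte, which is why image quality drops as more text is embedded.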

Key Strengths

Cultural & Operational Fit
