
Software Engineer
#Machine Learning# #NLP# #Data Science# #Information Extraction# #Recommendation system# #Search# #Ranking#
Carnegie Mellon University
Master of Science (M.S.), Computational Biology
January 1, 2015 – January 1, 2017
Shanghai Jiao Tong University
Bachelor's degree, Biomathematics, Bioinformatics, and Computational Biology
January 1, 2011 – January 1, 2015
Software Engineer
November 1, 2021 – Present
Petuum, Inc.
Software Engineer II (Machine Learning)
February 1, 2018 – October 1, 2021
San Francisco Bay Area
Agile SDE, LLC
Data Scientist
July 1, 2017 – February 1, 2018
San Francisco Bay Area
Carnegie Mellon University
Graduate Research Assistant - SCS - CBD - Langmead Lab
January 1, 2017 – March 1, 2017
Greater Pittsburgh Region
Amyris
Scientific Computing Intern
June 1, 2016 – August 1, 2016
Emeryville, CA
Survival Prediction on Kaggle Titanic
February 1, 2017 – March 1, 2017
· Implemented a Hadoop MapReduce version of a KNN classifier from scratch
· Performed data pre-processing and feature engineering on the Titanic dataset, creating 3 new features
· Improved the accuracy of the KNN classifier from a baseline of 74% to 90.43%, with substantial gains in precision and recall
· Built a gradient-boosted trees classifier with XGBoost and tuned its hyperparameters with grid search, improving accuracy from a baseline of 76.08% to 88.52%
Active Learning on Image Classification
October 1, 2016 – December 1, 2016
· Trained an active learner with Query by Committee (QBC) as the query strategy and an SVM with an RBF kernel as the classifier, and compared its performance with a baseline learner using a random sampling strategy
· Achieved more than 90% accuracy, matching the baseline learner, while using only 40% of the total samples
TSS Recognition Pipeline Construction-"PyCorn"
March 1, 2016 – May 1, 2016
· Participated, as group technical leader, in the design of "PyCorn", a genome-wide transcription start site (TSS) prediction pipeline for Zea mays
· Built a neural network (NN) with the scikit-learn package to predict whether an input genome sequence contains a TSS
· Improved predictive performance by tuning NN hyperparameters such as the activation function and the number of hidden units
· Designed the input module of PyCorn to handle large input sequences
· Achieved 83.9% accuracy on TSS prediction
Neuroscience meets Deep Learning
February 1, 2016 – April 1, 2016
· Performed data pre-processing, transforming functional magnetic resonance imaging (fMRI) data from 9 subjects, associated with different words, into three-dimensional (3D) space
· Built a three-dimensional convolutional neural network (3D CNN) model, composed of convolution layers, max-pooling layers, a dense layer and a logistic regression layer, to predict the neural activation associated with different categories of words
· Achieved three times higher accuracy than a random classifier
· Compared the predictive ability of the 3D CNN with other basic machine learning models, such as a neural network and a random forest
Automated Recognition of Pancreatic Cancer
January 1, 2016 – May 1, 2016
· Built SVM classification models to determine from proteomic data whether a patient has pancreatic cancer
· Raised classification accuracy to above 80% by implementing feature extraction and false negative control
· Implemented unsupervised feature selection with a Gaussian Mixture Model (GMM)
· Applied Affinity Propagation, a clustering method, to find subtypes of pancreatic cancer
· Demonstrated a statistical association between pancreatic cancer clusters and clinical symptoms with a chi-square test
Image Classification of CIFAR-10 dataset
November 1, 2015 – December 1, 2015
· Extracted visual features of CIFAR-10 dataset, which consists of 5000 pictures in 10 classes
· Established classification models based on SVM, softmax regression, k-binary logistic regression, and k-nearest neighbors
· Improved accuracy from 20% to above 50%
Integration and Application of Regulatory-Metabolic Network
December 1, 2014 – May 1, 2015
· Built an integrated metabolic-regulatory network for yeast based on a new automatic metabolic-regulatory integration algorithm, EGRIN-PROM
· Demonstrated the strong phenotype prediction ability of the EGRIN-PROM integrated network by comparing its Matthews correlation coefficient (MCC) with that of the YEASTRACT-PROM network
· Performed optimization and simulation of the flux in the Yeast 7.00 metabolic network, aiming to improve the expression of acetoacetyl CoA
· Proposed gene modification strategies, including gene knockout and gene over-expression, that could improve the expression of acetoacetyl CoA
· Verified the accuracy of the modification strategies based on pathway analysis
Synthetic Biology Software Design-“EASYBBK”
December 1, 2013 – October 1, 2014
· Participated, through a requirements survey, in the functional design of “EASYBBK”, an assistant tool for the evaluation, visualization and simplification of BioBricks
· Designed an assessment model for BioBricks based on their status, reliability, feedback and relevant publications
· Integrated relevant information and data on all extant BioBricks in the official International Genetically Engineered Machine (iGEM) Registry
· Implemented sequence alignment against our own database using NCBI Stand-alone BLAST
· Won a gold medal in the iGEM competition as an EASYBBK software team member
Project Website: http://2014.igem.org/Team:SJTU-Software
Mechanism Exploration of Selectivity of PKB Inhibitor
October 1, 2012 – July 1, 2014
· Explored the selectivity mechanism of protein kinase B (PKB) inhibitors with molecular dynamics (MD) simulation
· Conducted 3D-QSAR modeling on PKB inhibitors and PKA inhibitors, demonstrating the models' high predictive ability
· Identified possible methods to improve the selectivity of inhibitors based on the simulation results
Cultural Fit Analysis
The candidate's background is heavily skewed towards machine learning engineering and computational biology, with a strong research component. While the target role is 'Data Analyst', the candidate's experience is more aligned with advanced data science and ML engineering roles. The projects demonstrate a strong academic and research-driven approach, which might require adaptation to a purely analytical, business-focused data analyst role. The diversity of projects shows intellectual curiosity and a broad technical interest, but the direct relevance to typical data analyst tasks (e.g., SQL, dashboarding, A/B testing, business intelligence) is less apparent.
Soft Skills & Operational Fit
The candidate's project descriptions highlight problem-solving skills, a results-oriented approach (e.g., improving accuracy, reducing reading time), and experience in leading technical aspects of projects (e.g., 'group technical leader' for PyCorn). The experience at Google and Petuum suggests an ability to work in structured, product-focused environments. However, specific soft skills like collaboration, adaptability, or leadership are not explicitly detailed beyond project roles.