
AI Engineer with less than a year in Data Science & Machine Learning
Data Science enthusiast with foundational experience in Statistical Analysis, Machine Learning, and Generative AI. Proven ability to develop and evaluate AI/ML models for diverse applications, including time series forecasting, network intrusion detection, and natural language processing. Skilled in Python, R, and various libraries for data manipulation, visualization, and model implementation. Eager to contribute technical expertise to innovative data-driven projects.
St. Xavier's College, Kolkata
M.Sc. in Data Science
August 1, 2024 – May 1, 2026
Asutosh College, Kolkata
B.Sc. in Statistics
July 1, 2020 – August 1, 2023
Indian Statistical Institute, Kolkata
Data Science Intern
May 1, 2025 – July 1, 2025
Kolkata, West Bengal, India
RAG-based PDF Question Answering System
January 1, 2026 – June 1, 2026
Built a Retrieval-Augmented Generation (RAG) application that lets users upload multiple research papers and ask natural-language questions, with answers generated by Google's Gemini LLM grounded strictly in the retrieved document context to minimize hallucination. Implemented full source traceability by displaying the exact source PDF, page number, and paragraph behind every generated answer, and deployed the app publicly on Streamlit Community Cloud for live demo access.
Toxic Content Detection using Fine-Tuned RoBERTa
January 1, 2026 – June 1, 2026
Fine-tuned RoBERTa-base on the Jigsaw Toxic Comment dataset (159K+ comments) to classify online content as toxic or non-toxic, achieving 96.7% accuracy and an F1-score of 0.84. Developed and evaluated an end-to-end NLP pipeline, improving on a TF-IDF + Logistic Regression baseline (F1: 0.745 → 0.840) through transformer-based transfer learning.
AI-Driven Network Intrusion Detection System (NIDS)
January 1, 2025 – December 31, 2025
Built an end-to-end machine learning pipeline that classifies 15 types of cyberattacks across 2.8M records, achieving 99.95% recall and a 0.99 F1-score, with SMOTE applied to address extreme class imbalance. Compared XGBoost, Random Forest, and deep learning (MLP) models after reducing feature dimensionality by 30% through correlation analysis, and selected XGBoost as the production model for its 10x improvement in training efficiency.
Deloitte job simulation experience involving data analysis and forensic technology
Deloitte
June 1, 2026 – Present
Presented a paper on Snowball Sampling in Statosphere
Statosphere
June 1, 2026 – Present
MS Excel Specialization on Coursera
Coursera
June 1, 2026 – Present
Cultural Fit Analysis
The candidate's project portfolio shows a strong interest in diverse AI applications, from RAG systems and NLP to network intrusion detection and time series forecasting. This breadth of interest aligns well with a dynamic AI engineering environment that values continuous learning and exploration of new domains. The academic background and certifications suggest a proactive approach to skill development. However, the limited professional experience means that adaptability to corporate culture and established development workflows is an area that would need further assessment.
Soft Skills & Operational Fit
The candidate's project descriptions indicate a problem-solving mindset, particularly in addressing challenges like class imbalance (SMOTE) and model efficiency (XGBoost vs. others). The deployment of applications on Streamlit Community Cloud suggests an understanding of making solutions accessible. The academic background in Data Science and Statistics provides a strong analytical foundation. However, the lack of professional experience beyond a single internship and the academic nature of most projects mean that operational fit in a fast-paced industry environment, including collaboration within larger teams and handling production-grade systems, is yet to be fully demonstrated.