Data Scientist | Machine Learning & Deep Learning
My learning began with numbers: structured, logical, and certain. But it was in data, with all its imperfections and hidden patterns, that I found real curiosity. Over the last 2+ years, I have worked extensively on machine learning and deep learning projects, not as academic tasks but as practical explorations of how models learn, fail, improve, and generalize.
I approach data science as a problem-solving discipline, combining mathematical reasoning, algorithmic thinking, and experimentation to extract insight from complex data. I am actively seeking opportunities where I can contribute meaningfully, learn from real-world challenges, and grow as a data scientist in a rigorous environment.
I am a mathematics graduate with a strong inclination toward applied problem-solving. I completed my B.Sc. (Hons) in Mathematics at Ramanujan College, University of Delhi, and am currently pursuing an M.Sc. in Mathematics with Computer Science at Jamia Millia Islamia. This academic background has trained me to think rigorously, reason logically, and approach problems with precision.
While theory built my foundation, working with real data shaped my understanding. Over the past 2+ years, I have independently developed multiple machine learning and deep learning projects, gaining hands-on experience in data preprocessing, exploratory data analysis, feature engineering, model training, evaluation, and deployment.
My work spans supervised learning, deep learning-based image classification, NLP-driven sentiment analysis, and semantic search systems using vector embeddings. Through these projects, I have learned how algorithms behave on imperfect data, how assumptions break, and how iterative experimentation leads to meaningful insight.
I am motivated by environments that value clarity of thought, intellectual honesty, and practical impact. I am looking for opportunities where I can apply my skills responsibly, continue learning at depth, and contribute to data-driven decision-making.
Programming: Python, Java, SQL, C, HTML, CSS
Core CS: Object-Oriented Programming, Data Structures & Algorithms
Data Science: Data Cleaning, Feature Engineering, Encoding, Scaling, EDA
Machine Learning: Regression, Classification, Cross-Validation, Hyperparameter Tuning
Algorithms: Logistic Regression, Decision Trees, Random Forest, KNN, SVM, PCA, XGBoost
Deep Learning: MLP, CNN, Transfer Learning (VGG16)
NLP & RAG: BERT, Sentiment Analysis, FAISS, LangChain
Tools: Pandas, NumPy, Scikit-learn, TensorFlow, Keras, FastAPI
Internboot | Sep 2025 – Dec 2025
Worked on end-to-end data science workflows using real-world business datasets. Performed data cleaning, preprocessing, and exploratory data analysis (EDA) to identify trends, anomalies, and data quality issues. Applied feature engineering techniques to improve model input representation and predictive performance. Built and evaluated regression models using Scikit-learn, focusing on error metrics, residual analysis, and model interpretability. Gained hands-on experience in applying machine learning concepts beyond academic datasets, understanding practical constraints and trade-offs. Collaborated with mentors to translate analytical results into actionable insights for decision-making.
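The error metrics and residual analysis mentioned above can be sketched in plain Python. This is a minimal illustration, not code from the internship: the helper name `regression_report` and the sample values are hypothetical, and in practice Scikit-learn's `mean_absolute_error`, `mean_squared_error` and `r2_score` cover the same ground.

```python
import math


def regression_report(y_true, y_pred):
    """Return residuals plus MAE, RMSE and R^2 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    # Residual = observed minus predicted; inspecting their spread and sign
    # pattern is the core of residual analysis.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    # R^2 is undefined when the target is constant, so return NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"residuals": residuals, "mae": mae, "rmse": rmse, "r2": r2}


# Toy values: MAE 0.25, RMSE about 0.354, R^2 0.975
report = regression_report([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
```

Residuals that fan out or trend with the prediction would point to a misspecified model even when the headline error looks acceptable.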
International Institute of SDGs and Public Policy Research | Jul 2025 – Oct 2025
Executed the complete machine learning lifecycle on structured datasets aligned with sustainability and public policy research. Conducted detailed exploratory data analysis to uncover patterns, correlations, and data-driven insights relevant to policy outcomes. Performed feature engineering, encoding, and normalization to enhance model robustness and generalization. Developed and compared multiple classification models using Scikit-learn, applying cross-validation and evaluation metrics such as accuracy, precision, recall, and F1-score. Created visualizations and analytical summaries to communicate findings effectively to non-technical stakeholders. Strengthened understanding of applying data science techniques in socially impactful and research-oriented domains.
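Accuracy, precision, recall and F1 all derive from the four cells of a binary confusion matrix. The sketch below shows that derivation with a hypothetical `binary_classification_metrics` helper and made-up labels; Scikit-learn's `precision_recall_fscore_support` and `classification_report` compute the same figures.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly flagged positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly rejected negative
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: accuracy 0.625, precision 0.6, recall 0.75, F1 about 0.667
metrics = binary_classification_metrics([1, 1, 1, 0, 0, 0, 1, 0],
                                        [1, 1, 0, 0, 1, 1, 1, 0])
```

On imbalanced data, accuracy alone can look high while recall on the minority class is poor, which is why the per-class metrics are reported alongside it.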
Developed a supervised machine learning pipeline to classify personality type (Introvert/Extrovert) from behavioral and survey-based data. Performed extensive data preprocessing, including missing-value imputation, categorical encoding, feature scaling, and outlier handling, to ensure model robustness. Conducted exploratory data analysis (EDA) to identify the key behavioral indicators influencing personality classification. Implemented and compared multiple classifiers, including Support Vector Machines (SVM) and a Multilayer Perceptron (MLP), using K-Fold cross-validation to detect overfitting and obtain a reliable estimate of generalization performance. Achieved a 92% F1-score and 93% recall, indicating strong performance on an imbalanced class distribution. Deployed the final model with FastAPI, enabling real-time predictions through RESTful APIs and making it accessible to downstream applications.
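With an imbalanced class distribution, stratified folds keep each class's share roughly constant across splits. The following is a dependency-free sketch of stratified K-Fold splitting; the stratified variant is an assumption, since the description above only says K-Fold, and the helper name and toy labels are hypothetical. Scikit-learn's `StratifiedKFold` provides the same behaviour ready-made for use with SVM and MLP estimators.

```python
import random
from collections import defaultdict


def stratified_kfold_indices(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs that preserve class proportions (hypothetical helper)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    # Sorted so the split is deterministic for a given seed (labels must be comparable).
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Deal this class's samples round-robin across folds.
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        # Continue the round-robin where the previous class stopped,
        # so overall fold sizes stay balanced.
        offset += len(idx)
    all_idx = set(range(len(labels)))
    for fold in folds:
        yield sorted(all_idx - set(fold)), sorted(fold)


# Toy imbalanced labels: 6 introverts, 14 extroverts, 5 folds of 4 samples each.
labels = ["Introvert"] * 6 + ["Extrovert"] * 14
splits = list(stratified_kfold_indices(labels, k=5, seed=0))
```

Each model is then trained on `train_idx` and scored on `test_idx` for every fold, and the fold scores are averaged; a large gap between training and validation scores is the overfitting signal.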
Designed and implemented a deep learning-based image classification system to identify apparel brands such as Nike, Adidas, and Converse from product images. Used Convolutional Neural Networks (CNNs) with VGG16 transfer learning, leveraging pretrained ImageNet weights to extract high-level visual features. Froze the lower convolutional blocks and fine-tuned the upper layers to adapt the model to domain-specific patterns while reducing training time and computational cost. Applied data augmentation to improve generalization across varied lighting and image orientations. Used EarlyStopping to halt training once validation loss stopped improving and ModelCheckpoint to retain the best-performing weights. Evaluated performance using validation accuracy and loss curves across multiple epochs.
Built a semantic search system to retrieve contextually relevant information from PDF documents such as academic notes and technical material. Extracted and chunked text from PDFs while preserving semantic coherence across sections. Generated dense vector embeddings using HuggingFace sentence transformers and indexed them using FAISS for efficient similarity search. Enabled semantic retrieval based on contextual similarity rather than keyword matching, improving relevance and search quality. Designed the system to be scalable and fast, achieving low-latency retrieval without relying on large language models. Demonstrated practical understanding of vector databases and semantic information retrieval pipelines.
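The chunking and retrieval steps can be illustrated without external libraries. This sketch assumes word-based chunking with overlap (the sizes are hypothetical) and performs exact cosine-similarity search over precomputed embedding vectors, which mirrors what a FAISS `IndexFlatIP` does after the vectors pass through `faiss.normalize_L2`. The embedding model is left out: the toy two-dimensional vectors stand in for sentence-transformer outputs, and `FlatCosineIndex` is a hypothetical name.

```python
import math


def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows so context spans chunk borders."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class FlatCosineIndex:
    """Exact (brute-force) cosine-similarity search over stored vectors."""

    def __init__(self):
        self.vectors = []

    def add(self, vectors):
        self.vectors.extend(normalize(v) for v in vectors)

    def search(self, query, k=3):
        q = normalize(query)
        scores = [(sum(a * b for a, b in zip(q, v)), i)
                  for i, v in enumerate(self.vectors)]
        # Highest similarity first; ties broken by insertion order.
        scores.sort(key=lambda s: (-s[0], s[1]))
        return scores[:k]


chunks = chunk_words(" ".join(f"w{i}" for i in range(10)), chunk_size=4, overlap=1)
index = FlatCosineIndex()
index.add([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
top = index.search([1.0, 0.1], k=2)
```

A flat index scans every vector, which is exact and fast enough for a collection of lecture notes; approximate FAISS indexes only become worthwhile at much larger scales.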
Developed an end-to-end Natural Language Processing pipeline for sentiment analysis under academic supervision. Performed comprehensive text preprocessing including tokenization, lemmatization, stop-word removal, and normalization. Implemented transformer-based models such as BERT to generate contextual embeddings capturing semantic meaning beyond surface-level text. Compared transformer-based approaches with lexicon-based methods like VADER to evaluate accuracy, interpretability, and performance trade-offs. Applied sentiment classification and aspect-based opinion extraction to analyze user opinions and emotional polarity in text datasets. Strengthened understanding of NLP model evaluation and real-world text variability.
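The text-preprocessing stage can be sketched with the standard library alone. The stop-word list below is a tiny hypothetical one, and lemmatization is omitted because it needs a lexical resource such as NLTK's WordNet or spaCy. Negations like "not" are deliberately kept, because removing them can flip the sentiment of a sentence.

```python
import re
import unicodedata

# Tiny illustrative stop-word list (hypothetical); negations are intentionally absent.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "and", "or",
              "of", "to", "in", "it", "this", "that"}

TOKEN_RE = re.compile(r"[a-z0-9']+")
URL_RE = re.compile(r"https?://\S+")


def preprocess(text):
    """Normalize, tokenize and drop stop words from one piece of text."""
    # Unicode normalization plus lowercasing; curly apostrophes become straight ones.
    text = unicodedata.normalize("NFKC", text).replace("\u2019", "'").lower()
    text = URL_RE.sub(" ", text)  # URLs carry no sentiment signal here
    tokens = TOKEN_RE.findall(text)  # simple regex word tokenizer
    return [t for t in tokens if t not in STOP_WORDS]


example = preprocess("The battery is NOT great, see https://example.com !")
```

Cleanup of this kind mainly suits classical bag-of-words models. BERT's tokenizer and VADER's rules both draw on cues such as punctuation, and VADER also on capitalization, so those models are usually given lightly cleaned text.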