Kushagra Singh

Hi, I am Kushagra Singh

I am a final-year Computer Science Engineering student at MIT World Peace University, Pune, passionate about building impactful tech solutions at the intersection of Artificial Intelligence, Machine Learning, Web Development, and Embedded Systems.

As the Technical Head of the IRIS Tech Club, I lead a team of developers to deliver innovative projects, including the club's official website and autonomous vehicle modules.

I have hands-on experience as a Machine Learning Research Associate at IIMT University, contributing to PhD-level healthcare analytics with advanced ML models and interactive web applications. My internship at Infosys Springboard deepened my expertise in deep learning and computer vision, where I developed high-accuracy neural networks and user-friendly applications.

I thrive in collaborative, fast-paced environments, demonstrated by my achievements as a Top 25 Finalist in the Smart India Hackathon 2024 and a Top 4 Finalist in the Bosch BOROSA Hackathon 2025. My technical toolkit spans Python, Java, C++, PyTorch, TensorFlow, Next.js, Spring Boot, AWS, Docker and more.

Driven by curiosity and a commitment to real-world impact, I enjoy tackling complex challenges, whether it's developing autonomous driving systems, AI-powered healthcare tools, or scalable cloud-native applications. I'm always eager to connect, collaborate and create solutions that make a difference.

Experience

Infosys Springboard

Oct 2024 – Dec 2024

ML Project Intern

Remote

  • Designed and implemented a handwritten digit recognition application using neural networks (MLP, CNN, LeNet5) for MNIST dataset classification.
  • Developed custom PyTorch models with dropout, activation functions, and convolutional layers for efficient feature extraction and classification.
  • Built an interactive web application using Streamlit to predict digits from uploaded images, with support for model selection and real-time predictions. The MLP, CNN, and LeNet5 models achieved test accuracies of 90.04%, 98.93%, and 98.95%, respectively.
  • Utilized image preprocessing (grayscale conversion, normalization, resizing) and dataset augmentation for robust model performance.
  • Created a digit visualization tool to save and display images from the MNIST dataset using Matplotlib.
  • Deployed models with pre-trained weights and integrated custom UI design for a seamless user experience during prediction.
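As an illustration of the preprocessing steps listed above (grayscale conversion, resizing, normalization), here is a minimal standard-library sketch. It is not the internship code: the function names, the BT.601 luma weights, the nearest-neighbour resize, and the 28x28 target size (the MNIST image size) are assumptions chosen for the example.

```python
from typing import List, Tuple

Gray = List[List[float]]
RGB = List[List[Tuple[int, int, int]]]


def to_grayscale(rgb: RGB) -> Gray:
    """Convert RGB pixels (0-255) to grayscale with ITU-R BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img: Gray, out_h: int, out_w: int) -> Gray:
    """Resize by nearest-neighbour sampling (assumed here for simplicity)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def normalize(img: Gray) -> Gray:
    """Scale pixel intensities from [0, 255] to [0, 1]."""
    return [[p / 255.0 for p in row] for row in img]


def preprocess(rgb: RGB, size: int = 28) -> Gray:
    """Grayscale -> resize to size x size -> normalize."""
    return normalize(resize_nearest(to_grayscale(rgb), size, size))
```

Calling `preprocess` on an uploaded image's pixel grid yields a 28x28 grid of floats in [0, 1], the shape an MNIST-style classifier would typically expect.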

IIMT University

Jan 2025 – Mar 2025

Machine Learning Research Associate

Remote

  • Led an AI-driven machine learning research project for Cardiovascular Disease risk prediction, contributing to a PhD study in healthcare analytics.
  • Developed a predictive pipeline utilizing ensemble ML models (Logistic Regression, SVM, KNN, Random Forest, XGBoost, LightGBM, CatBoost) for multi-disease risk assessment.
  • Built and deployed an interactive Streamlit-based web application with SHAP value visualizations and model performance comparisons.
  • Implemented a multi-output Random Forest model to predict multiple disease types simultaneously, optimizing feature engineering for higher accuracy.
  • Delivered a fully functional ML solution tailored to healthcare research objectives.

IRIS, MIT WPU

Aug 2024 – Present

Technical Head

Pune

  • Spearheaded the development of the official I.R.I.S. club website, taking the lead in designing, coding, and deploying it for live hackathon event registrations for 200+ people, integrated with Razorpay payment gateway to enable seamless fee collection.
  • Lead and manage the tech team on website updates, changes, and new feature implementations, providing mentorship and technical guidance while contributing directly to challenging tasks such as backend development for the blogging system, voting-system authentication, and comment-section functionality.
  • Collaborate with faculty and peers to identify and initiate new tech-based projects, fostering innovation within the club.
  • Led the website’s successful deployment during live events, ensuring smooth operation and scalability for real-time registrations and payments.
  • Developing an autonomous vehicle module for non-ADAS cars using YOLOv8 deep learning models and sensor-based simulations for decision-making and steering control.

Projects

IRIS Club RAG Chatbot

Conversational AI & NLP
LLaMA-3 70B, LangChain, FAISS, HuggingFace, Next.js, Groq API, DistilBERT
  • Engineered a RAG chatbot using LLaMA-3 70B, FAISS-based semantic search, and LangChain for club info retrieval.
  • Optimized for 10 ms query latency and 200+ daily queries.
  • Benchmarked against fine-tuned DistilBERT (0.92 BERTScore).
  • Developed a hybrid RAG-finetune architecture for improved contextual accuracy.
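The retrieval step behind a RAG chatbot like this can be sketched in a few lines. The example below is a simplified stand-in, not the deployed pipeline: it replaces FAISS with a brute-force cosine-similarity search over toy vectors, and the function names and prompt wording are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    docs: Sequence[str],
    k: int = 2,
) -> List[Tuple[float, str]]:
    """Return the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), d) for v, d in zip(doc_vecs, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, retrieved: List[Tuple[float, str]]) -> str:
    """Place the retrieved passages ahead of the question (hypothetical template)."""
    context = "\n".join(f"- {doc}" for _, doc in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real system the vectors would come from an embedding model and the search from a vector index; the brute-force loop here only shows the ranking logic.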
DocsVerse - Document Research & Theme Identification Chatbot

Document Intelligence & RAG
React, FastAPI, ChromaDB, SQLAlchemy, TypeScript, Material UI, LLM Integration, OCR
  • Developed interactive web application for document upload, AI-powered chat with citations, and theme identification across document sets.
  • Built FastAPI backend with SQLAlchemy ORM, ChromaDB vector database for semantic search, and integration with external LLM services.
  • Created React.js frontend with Material UI, React Query for state management, and drag-and-drop document upload functionality.
  • Implemented document processing pipeline with OCR support, chunking, embedding generation, and theme analysis capabilities.
LangGraph Researcher

Multi-Agent Systems & Research Automation
LangChain, LangGraph, Tavily API, Ollama, Python, Streamlit, Multi-Agent System
  • Implemented a dual-agent system for deep research using Tavily, LangChain, and LangGraph, with a Research Agent for web crawling and an Answer Drafter Agent for synthesis.
  • Built a modular agent architecture with LangGraph workflow orchestration, supporting multiple LLM backends via Ollama (phi3, llama3, mistral).
  • Developed clean Streamlit web UI for interactive research queries with real-time results display and source citations.
  • Created extensible system architecture allowing easy addition of new agents (fact-checker, summarizer) and integration with different LLM providers.
SynapTrack - Parkinson's Disease Detection using EEG

Healthcare & Biomedical ML
Python, PyTorch, EEG Analysis, Random Forest, SVM, CNN, RNN, SHAP, LIME, Streamlit
  • Developed an advanced machine learning-based EEG analysis platform combining signal processing and AI for brain activity pattern decoding.
  • Implemented ensemble of state-of-the-art classifiers (Random Forest, SVM, KNN, LDA, CNN, RNN, ANN) with optimized hyperparameters.
  • Built comprehensive feature engineering pipeline with SHAP values, LIME explanations, and statistical significance testing.
  • Created interactive Streamlit dashboard for real-time EEG visualization, model performance metrics and feature importance analysis.
Data Orchestrate - Distributed File Sync

Distributed Systems & Cloud Infrastructure
Java, Spring Boot, Kubernetes, Docker, Apache Kafka, MongoDB Atlas, Minikube, CI/CD
  • Built a cloud-native, microservices-based file sync system in Java (Spring Boot) with Docker/Kubernetes deployment.
  • Integrated Kafka for real-time cross-device replication and MongoDB Atlas for metadata storage.
  • Automated infra provisioning (Kubernetes manifests, Docker Compose), health checks and PDF text extraction.
  • Achieved 99.9% uptime in Minikube.
Airfield Wildlife Risk Classification

Computer Vision & Real-time Detection
YOLOv8, OpenCV, Flask, Python, TensorFlow, Real-time Detection, Risk Assessment
  • Built comprehensive real-time bird detection and classification system for airport bird strike prevention using multi-model ensemble detection.
  • Implemented species classification for 207 bird species (200 CUB + 7 Airport Birds) with size, behavior, altitude, and speed estimation.
  • Developed comprehensive risk assessment model (0-10 scale) with collision probability calculation and real-time alert system.
  • Created interactive Flask dashboard with live video streaming, risk trend analysis, and RESTful API endpoints for integration.
ForVis - Formula 1 Analytics

Sports Analytics & Big Data
Python, PyQt5, Apache Spark, HDFS, FastF1 API, Random Forest, Linear Regression, Data Visualization
  • Developed a dynamic PyQt5 GUI for real-time and historical Formula 1 telemetry analysis.
  • Integrated FastF1 APIs, Apache Spark, and HDFS to reduce processing time by 30%.
  • Implemented Random Forest and Linear Regression for pit stop prediction and strategy optimization (86% accuracy).
  • Added multi-driver comparison, lap time analysis, and tyre strategy dashboards.
Cardiovascular Diseases Prediction

Predictive Healthcare & Ensemble Machine Learning
Python, Streamlit, XGBoost, LightGBM, CatBoost, SHAP, Multi-Output ML, ROC Analysis
  • Built comprehensive machine learning pipeline for cardiovascular disease risk prediction using ensemble models (Logistic Regression, SVM, KNN, Random Forest, XGBoost, LightGBM, CatBoost).
  • Developed custom multi-output Random Forest model to predict multiple disease types simultaneously with feature engineering and polynomial features.
  • Created interactive Streamlit web application with risk assessment, SHAP value visualizations, and emergency warning system for critical conditions.
  • Implemented comprehensive model evaluation with ROC curves, confusion matrices, learning curves, and feature correlation analysis.
Tarzan - Autonomous Vehicle Module

Autonomous Systems & Sensor Fusion
YOLOv8, MATLAB, C++, Arduino, LiDAR, Pure-Pursuit Algorithm, Sensor Fusion, Deep Learning
  • Developing an autonomous vehicle portable module for non-ADAS-enabled cars using custom deep learning models (YOLOv8).
  • Implements vision-based real-time obstacle detection and path planning using the pure-pursuit algorithm in MATLAB.
  • Designing multi-modal sensor fusion combining camera, LiDAR, and ultrasonic sensors for robust perception.
  • Uses app-based image input for driving decisions such as steering, acceleration, and braking.
  • Simulates real-world scenarios (cars, potholes, barricades, etc.) for safe navigation.
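For readers unfamiliar with pure pursuit, the standard steering law can be sketched as follows. This is a generic textbook formulation written in Python rather than the project's MATLAB code; it assumes a kinematic bicycle model with the pose measured at the rear axle, and the function and parameter names are illustrative.

```python
import math


def pure_pursuit_steering(
    x: float,
    y: float,
    yaw: float,
    target_x: float,
    target_y: float,
    wheelbase: float,
) -> float:
    """Steering angle (radians) that arcs the vehicle toward a look-ahead point.

    Uses delta = atan(2 * L * sin(alpha) / l_d), where L is the wheelbase,
    l_d the distance to the look-ahead point, and alpha the angle between
    the vehicle heading and the line to that point.
    """
    dx = target_x - x
    dy = target_y - y
    lookahead = math.hypot(dx, dy)
    if lookahead == 0.0:
        return 0.0  # already at the target point; no steering needed
    alpha = math.atan2(dy, dx) - yaw
    return math.atan2(2.0 * wheelbase * math.sin(alpha), lookahead)
```

A target straight ahead gives a zero steering angle, while a target to the left of the heading gives a positive (counter-clockwise) angle under the usual convention that yaw increases counter-clockwise.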
IRIS Club Website

Full-Stack Web Development
Next.js, Supabase, Razorpay, Vercel, TypeScript, Real-time Payments, Blogging System
  • Developed an official website for the club to provide a centralized platform to share IRIS updates, event details, recruitments and resources.
  • Handles multiple concurrent real-time payments and updates event registration entries.
  • Features include event registrations with a payment gateway, dynamic blogging with a voting system and comment section, club project highlights, recruitment and contact forms.
  • Utilized Razorpay SDK, Supabase Database, and continuous deployment on Vercel with GitHub CI/CD integration.
PlantWise - Ayurvedic AI Companion

Healthcare AI & Natural Language Processing
Python, Cohere API, PyQt, RAG Pipeline, NLP, Ayurvedic Medicine, Disease Prediction
  • Built an AI-driven health assistant using LLMs (Cohere API) for disease prediction and Ayurvedic remedy recommendation.
  • Implemented RAG pipeline with PyQt GUI for dynamic responses.
  • Top 25 Finalist at Smart India Hackathon 2024.
  • Achieved 89% user satisfaction across 500+ curated mappings.
Driver Safety Monitoring System

IoT & Real-time Monitoring
Arduino, C++, MQ3 Sensor, GPS Module, GSM Module, Accelerometer, Real-time Monitoring
  • Integrated MQ3 alcohol sensor, accelerometer/gyroscope, GPS module, LED screen, GSM module and buzzer into a vehicle's onboard system using C++ and Arduino.
  • Developed a comprehensive system to monitor driver behavior, detect alcohol presence, and deliver real-time alerts to promote safe driving practices.
  • Enabled timely feedback for enhanced driver safety, utilizing sensors for real-time monitoring and instant notifications.

Publications

Domain-Specific Conversational AI for IRIS MITWPU: From Research Paper to Production

Kushagra Singh, Brandon Cerejo, Samanyu Bhate, Taksh Dhabalia

IEEE International Conference on Information, Communication and Computing Technology (ICoICC) 2025

Developed and compared Retrieval-Augmented Generation (RAG) and fine-tuned Transformer approaches for a domain-specific chatbot. Implemented a RAG pipeline using LangChain, HuggingFace embeddings, FAISS, and the LLaMA-3 70B model via the Groq API. Built a fine-tuned DistilBERT model optimized for question-answering tasks, with comprehensive evaluation metrics. Deployed the hybrid approach on the official IRIS MIT-WPU website, where it handles real queries with a BERTScore of 0.92.

View on IEEE Xplore

Skills

Languages

Python, C, C++, Java, JavaScript

Databases

MySQL, MongoDB, PostgreSQL, MongoDB Atlas

Cloud and DevOps

AWS, Kubernetes, Docker, CI/CD Pipelines, Minikube

Libraries

PyTorch, OpenCV, Pandas, NumPy, TensorFlow, Scikit-Learn, Tkinter, Streamlit, PyQt5, NLTK, spaCy, LangChain, HuggingFace

Big Data Technologies

Cloudera, HDFS, Apache Pig, Hive, HBase, Apache Spark, Kafka

Web Development

Next.js, React, Node.js, Express.js, Spring Boot, FastAPI, HTML, CSS, Bootstrap, WebSockets

Visualization Tools

Tableau, Matplotlib, Seaborn

Software

AutoCAD, TinkerCAD, Jupyter, Linux, Selenium

Electronics

Arduino, Raspberry Pi, STM32, ESP32, Ultrasonic Sensors

Generative AI

LLaMA-3, Retrieval-Augmented Generation

Roles and Achievements

Roles

IRIS [Student Club, MIT-WPU] – Technical Head

Aug 2024 – Present

Infosys Springboard [Remote] – ML Intern

Oct 2024 – Dec 2024

IIMT University [Remote] – ML Research Associate

Jan 2025 – Mar 2025

Achievements

  • Mitsubishi UFJ Financial Group (MUFG) Hackathon 2025 – Winner: Led development of an AI-powered financial assistant for personalized retirement planning using XGBoost, KMeans, and Google Gemini. Enabled voice-first interaction via Azure Speech Services and real-time insights with NewsAPI and Gemini. Deployed the backend with Docker on AWS App Runner and the frontend on Vercel.
  • Adobe India Hackathon 2025 – Top 100 out of 2.6 lakh+ participants: Built a containerized AI pipeline for PDF understanding, including multilingual outline extraction (PyMuPDF, K-Means), persona-driven content ranking (Ollama), and a semantic insight platform with PDF.js integration and Azure TTS-based multi-voice podcasting.
  • Bosch BOROSA Hackathon 2025 – Top 4 Finalist: Built an intelligent traffic safety system using YOLOv8 for real-time signal and crosswalk detection (95–98% accuracy). Integrated ESP32S3 and MQTT for edge automation and decision logic.
  • Smart India Hackathon (SIH) 2024 – Top 25 Finalist: Developed PlantWise, an LLM-powered Ayurvedic health companion for disease prediction and natural remedies.
  • HackMITWPU’24 Ideathon – Finalist: Proposed DermDetect, an AI-powered tool for preliminary dermatological diagnosis using image processing for remote consultations and personalized skincare solutions.