Spring 2025 Projects Mini Conference
Deep Learning for Autism Screening Using Transfer Learning on Video Data
We propose to develop a deep learning-based classifier for early autism screening by analyzing video recordings of children’s behavior and gaze. By leveraging transfer learning from pre-trained visual models and integrating temporal modeling techniques, our approach aims to provide a robust probability estimate of Autism Spectrum Disorder (ASD) from standard video recordings. This project has the potential to improve early screening accuracy while using accessible, low-cost hardware.
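As a minimal sketch of one such architecture, the snippet below pairs a frozen pre-trained image backbone with an LSTM over per-frame features; the backbone choice, hidden size, and single-logit head are illustrative assumptions, not the project's final design.

```python
import torch
import torch.nn as nn
from torchvision import models

class VideoASDClassifier(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.features.parameters():
            p.requires_grad = False          # transfer learning: reuse pretrained weights
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)     # single logit -> screening probability

    def forward(self, clips):                # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        f = self.features(clips.flatten(0, 1)).flatten(1)  # (B*T, 512) frame features
        _, (h, _) = self.lstm(f.view(b, t, -1))            # temporal modeling over frames
        return torch.sigmoid(self.head(h[-1]))             # (B, 1) ASD probability estimate
```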
Cricket Commentary Generation Using LLaVA
This project aims to develop a deep learning model for real-time cricket commentary using video highlights. We leverage Whisper for speech-to-text transcription of existing commentary and LLaVA for generating context-aware commentary. Training on structured cricket event data ensures accuracy and engagement, advancing automated sports broadcasting with vision-language modeling.
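For the transcription stage, the open-source whisper package already exposes the needed call; a minimal sketch, where the highlight file name is a placeholder:

```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("highlight_clip.mp4")   # ffmpeg extracts the audio track
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}")
```

The timestamped segments can then be aligned with the video frames that LLaVA conditions on.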
Legal Contract Dataset
We aim to create a labeled dataset of contract clauses that will serve as a foundation for training and evaluating AI models in legal contract NLP. By curating and annotating contracts with a focus on clause categorization and summarization, our dataset will contribute to the development of AI-driven tools for contract analysis and verification.
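To make the annotation target concrete, one record of the dataset might look like the following; the field names and label set are assumptions, not the final schema:

```python
clause_record = {
    "contract_id": "C-0042",
    "clause_text": ("Either party may terminate this Agreement upon "
                    "thirty (30) days' written notice."),
    "category": "termination",   # e.g. termination, indemnification, confidentiality
    "summary": "Either party can end the agreement with 30 days' notice.",
}
```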
Bone Fracture Graphical Chatbot
Bone fractures are a significant healthcare challenge, requiring timely and accurate diagnosis for effective treatment. However, radiographic interpretation remains prone to variability and depends heavily on specialized expertise, limiting access to quality care in resource-constrained settings. Recent advances in LLMs and computer vision offer promising solutions for automating fracture detection and classification. In this study, we develop an LLM-powered chatbot integrated with a graph segmentation model for fracture diagnosis, leveraging computer vision to analyze medical images.
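One way to wire the two halves together is to serialize the segmentation model's findings into the LLM prompt; the helper below is a hypothetical sketch, and the region names, scores, and prompt template are all placeholders:

```python
def build_prompt(findings: list[dict], question: str) -> str:
    # Summarize segmentation output (hypothetical format) as LLM context.
    summary = "; ".join(f"{f['region']} (confidence {f['score']:.2f})" for f in findings)
    return (f"A fracture segmentation model flagged: {summary}.\n"
            f"Patient question: {question}\n"
            f"Answer cautiously and recommend clinical follow-up.")

print(build_prompt([{"region": "distal radius", "score": 0.91}], "Is my wrist broken?"))
```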
Efficient Monocular Depth Estimation
This project aims to develop a lightweight, efficient model for monocular depth estimation that runs smoothly on mobile systems. By using techniques like quantization, pruning, and knowledge distillation, along with efficient architectures such as MobileNetV3 and depth-wise separable convolutions, we strive to achieve fast inference without compromising depth accuracy.
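The depth-wise separable block at the heart of MobileNet-style backbones factorizes a standard convolution into a per-channel spatial convolution and a 1x1 pointwise convolution; a minimal PyTorch sketch:

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes the 3x3 conv operate on each channel independently
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # mixes channels
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))
```

For 3x3 kernels and typical channel counts, this factorization costs roughly 8-9x fewer multiply-adds than a dense convolution, which is what makes fast mobile inference plausible.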
Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning
Deep learning for radiologic image analysis is a rapidly growing field in biomedical research and is likely to become standard practice in modern medicine. The authors will use a publicly available Kaggle dataset of chest X-ray images labeled for the presence or absence of pneumonia to train a neural network that diagnoses pneumonia as accurately as trained radiologists. Success will be measured by the model’s F1 score, a standard metric for evaluating classification models.
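Since F1 is the headline metric, it helps to fix its definition up front: the harmonic mean of precision and recall. A toy check with scikit-learn (the labels are illustrative):

```python
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 1]      # 1 = pneumonia present
y_pred = [1, 0, 1, 0, 0, 1]
print(f1_score(y_true, y_pred))  # 2PR/(P+R); here P = 1.0, R = 0.75 -> 0.857
```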
Efficient Computer Vision: Pushing the Frontier of Edge Computing
The Low Power Computer Vision Challenge (LPCVC) advances three critical directions in computer vision by focusing on model optimization for edge devices. Leveraging the Qualcomm AI-Hub ecosystem, this competition enables developers to deploy efficient vision models on mobile phones and AI PCs. Participants can submit models in various formats (PyTorch, TensorFlow, TFLite, ONNX), making the challenge globally accessible.
Unlike traditional competitions that rely on cloud computing, LPCVC emphasizes practical applications that run directly on edge devices with minimal hardware requirements. This approach democratizes participation for developers, researchers, and students worldwide. The challenge provides open-source sample solutions as qualification benchmarks, with winning solutions and test datasets being publicly released to foster continued innovation in efficient computer vision.
In this paper, we propose a solution to the LPCVC challenge that combines model architecture improvements such as quantization, pruning, and token-efficient methods. In addition, we enhance the training pipeline with 8-bit General Matrix Multiplication (GEMM) and Multi-head Latent Attention (MLA). Finally, we propose techniques for edge device optimization. Our solution is designed to be lightweight, efficient, and to run directly on edge devices with minimal hardware requirements.
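As one concrete ingredient, post-training dynamic quantization stores weights in 8 bits; a minimal PyTorch sketch on a toy model (the real submission would export to the formats Qualcomm AI-Hub accepts rather than remain in eager PyTorch):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # int8 weights; activations quantized on the fly
)
print(quantized)
```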
ChainThought: Enhancing Reasoning Capabilities in Language Models Through Step-by-Step Problem Solving
This project aims to develop a fine-tuned language model that demonstrates explicit chain-of-thought reasoning capabilities for solving mathematical problems. By training a DeepSeek-7B model on the GSM8K dataset with step-by-step reasoning annotations, we seek to create an AI system that not only produces correct answers but also transparently explains its problem-solving process, similar to how students show their work in educational settings.
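One plausible way to turn a GSM8K item into a supervised chain-of-thought target is shown below; the prompt template is our assumption, not a fixed standard:

```python
def format_cot(question: str, steps: list[str], answer: str) -> str:
    reasoning = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(steps))
    return (f"Question: {question}\n"
            f"Let's think step by step.\n{reasoning}\nAnswer: {answer}")

print(format_cot(
    "Tom has 3 boxes of 12 pencils and gives away 10. How many remain?",
    ["3 boxes * 12 pencils = 36 pencils", "36 - 10 = 26 pencils"],
    "26",
))
```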
Efficient Open-Vocabulary Models for Low-Power Computer Vision
This project aims to improve open-vocabulary grounding segmentation using the LPCVC 2025 dataset and model framework. We will start by reproducing the provided sample code and then enhance it using Focal-T/ViT-B models trained on COCO and evaluated on RefCOCOg. Our goal is to refine segmentation performance by experimenting with model improvements such as fine-tuning strategies and data augmentation, as sketched below.
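One candidate augmentation recipe for the fine-tuning runs; because grounding segmentation pairs each image with a mask and a referring expression, geometric transforms must be applied to image and mask together (and flips can invalidate "left"/"right" in the text), so only photometric operations are shown. Parameters are placeholders.

```python
import torchvision.transforms as T

photometric = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.RandomGrayscale(p=0.05),  # mask and referring text stay valid under photometric ops
])
```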
AI-Based Car Crash Detection Using Deep Learning
Car crash detection is a critical application of AI-driven traffic surveillance, enabling real-time accident identification and response. This project aims to develop a deep learning-based system that detects car crashes from video and audio data using multimodal approaches. We will integrate computer vision techniques for video analysis, optical flow for motion tracking, autoencoders for anomaly detection, and audio classification for crash sound recognition. Our approach will leverage multiple datasets, including DoTA, UCF-Crime (road accidents), and BDD100K (normal driving footage). The final system will be evaluated based on precision, recall, and the Spatial-Temporal Area Under Curve (STAUC) metric.
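For the motion-tracking component, dense optical flow already yields a usable anomaly signal; a sketch with OpenCV's Farneback implementation (the parameters and the mean-magnitude statistic are illustrative):

```python
import cv2

def flow_magnitude(prev_bgr, curr_bgr):
    prev_g = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_g = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_g, curr_g, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return mag.mean()   # a sudden spike in mean flow can flag abnormal motion
```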
Enhanced Relationformer for Linear Detection
Detection of linear structures is crucial for many computer vision tasks. While Relationformer has advanced inter-object relationship modeling, it struggles with endpoint prediction and matching stability. To address these issues, we will first reproduce and validate Relationformer, and then propose a distance-weighted approach that replaces the Dirac delta distribution to enhance endpoint localization. Moreover, inspired by Generalized Focal Loss, we will mitigate errors arising from negative samples.
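The distance-weighted idea can be made concrete in the style of Generalized Focal Loss's distribution term: instead of a one-hot (Dirac) target bin, the continuous endpoint offset is spread over its two nearest bins. A minimal sketch, where the bin count and discretization are assumptions:

```python
import torch
import torch.nn.functional as F

def distribution_endpoint_loss(logits, target):
    """logits: (N, bins) scores over discretized offsets; target: (N,) in [0, bins-1]."""
    left = target.floor().long()
    right = (left + 1).clamp(max=logits.size(1) - 1)
    w_right = target - left.float()          # linear weights on the two nearest bins
    w_left = 1.0 - w_right
    logp = F.log_softmax(logits, dim=1)
    return -(w_left * logp.gather(1, left[:, None]).squeeze(1)
             + w_right * logp.gather(1, right[:, None]).squeeze(1)).mean()
```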
Machine Learning and Physics
In this project, we will explore research at the interface of machine learning and physics. More specifically, we will attempt to recreate the plots of the papers published by Sharma et al. and Nguyen et al., and then test their models on unseen astrophysical data. To verify that our recreations are accurate, we will compare them qualitatively with the plots from the papers and show consistency with theory when testing on unseen experimental data. The end goal of our project is a report on the success of our model recreation as well as a pedagogical presentation to shed light on this emerging field of study.
Kilter Board Beta and Problem Generation
Rock climbing is a sport of problem-solving, especially in bouldering. The Kilter board is a standardized training board for bouldering, and many climbers initially struggle to read where to put their hands and feet. This tool will read a route and suggest an optimal path for the climber to attempt. In addition, the tool will generate similar routes at a grade the climber selects.
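A hypothetical encoding of a board problem, just to fix the data the tool would consume; the grid coordinates and role names are placeholders:

```python
problem = {
    "grade": "V4",
    "holds": [
        {"x": 4, "y": 1,  "role": "start"},
        {"x": 6, "y": 5,  "role": "hand"},
        {"x": 3, "y": 8,  "role": "foot"},
        {"x": 5, "y": 12, "role": "finish"},
    ],
}
# A generated "beta" would then be a sequence of (limb, hold) moves over these holds.
```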
Deep Learning Approach on Music Recommendation System
This project aims to develop a deep learning-based music recommendation system using content-based filtering to suggest relevant music to users according to distinct characteristics, such as danceability, tempo, and energy. By training neural networks on these features, the model will map songs to a latent space where similar music is grouped together. Essentially, this project demonstrates the feasibility of personalizing music discovery using AI and contributes to the improvement of overall recommendation systems.
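A minimal sketch of the latent-space idea: a small encoder maps the listed audio features to embeddings, and recommendations come from cosine similarity. The network size and three-feature input are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 16))

songs = torch.tensor([[0.8, 0.6, 0.7],   # [danceability, tempo (scaled), energy]
                      [0.2, 0.3, 0.1],
                      [0.7, 0.5, 0.9]])
z = F.normalize(encoder(songs), dim=1)   # unit-length latent embeddings
sims = z @ z.T                           # pairwise cosine similarities
print(sims[0].topk(2).indices)           # nearest songs to song 0 (includes itself)
```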
Predicting Alzheimer’s Disease using Structural MRIs: Activation Map Analysis of Memory Related Brain Regions
Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder and the primary cause of dementia, characterized by significant memory loss and cognitive decline. This project aims to develop a deep learning model, trained on 3D T1-weighted structural MRI scans, that analyzes structural patterns in memory-related regions of the brain in order to predict AD onset. The model will be refined using insights from activation maps of established models to enhance accuracy and consistency, with the end goal of improving early detection and providing a more interpretable and reliable approach to predicting Alzheimer’s.
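Activation maps of the kind the project would inspect can be captured with a forward hook on a 3D CNN; the tiny model and random volume below are stand-ins for the real architecture and T1-weighted scans:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, 2),  # AD vs. control logits
)

activations = {}
model[0].register_forward_hook(lambda m, i, o: activations.update(conv1=o.detach()))

scan = torch.randn(1, 1, 64, 64, 64)   # toy stand-in for a T1-weighted volume
logits = model(scan)
print(activations["conv1"].shape)      # (1, 8, 64, 64, 64): per-voxel activation maps
```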
Smart Multimodal Classroom Video Recorder
This project proposes a smart multimodal classroom video recording system that automatically composes multiple content streams—camera feeds, slides, and whiteboard—based on real-time cues like gestures and spoken references. By leveraging computer vision, automatic speech recognition (ASR), and content analysis, it can dynamically pan, zoom, and switch between sources to create a more engaging, context-aware lecture recording. The goal is to overcome the limitations of static cameras and provide a richer, more immersive experience for both live and recorded viewers.
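At its simplest, the composition logic reduces to a cue-to-source policy like the toy rule below; the cue names are assumptions about what the gesture and ASR modules would emit:

```python
def pick_stream(cues: set[str]) -> str:
    if "points_at_board" in cues or "writes_on_board" in cues:
        return "whiteboard"
    if "says_next_slide" in cues or "slide_changed" in cues:
        return "slides"
    return "camera"   # default: keep the lecturer in frame

print(pick_stream({"slide_changed"}))   # -> "slides"
```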
AI-Powered Storyboard Generator Using Prompt-Based Diffusion Models
Storyboarding is an essential process in films, animations, and comics, providing a structured visual representation of narratives before production begins. However, traditional storyboard creation is time-consuming and requires artistic expertise. This project introduces an AI-powered storyboard generator that leverages prompt-based diffusion models and large language models (LLMs). The system will automate the creation of frame-by-frame storyboards by collecting and processing comic and storyboard datasets, generating scene descriptions using image captioning models, and fine-tuning a diffusion model for sequential image generation. The approach follows an agent-based architecture, where frames are generated iteratively to ensure narrative coherence. This project builds upon research in script-to-storyboard automation, generative storytelling, and AI-assisted animation, enhancing efficiency, accessibility, and creative possibilities in storyboard creation.
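The frame-generation step maps naturally onto the Hugging Face diffusers API; a sketch, where the base model id is a placeholder for whichever checkpoint gets fine-tuned and the prompts come from the LLM stage:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # placeholder base checkpoint
    torch_dtype=torch.float16,
).to("cuda")

scene_descriptions = [
    "A detective enters a dim office, noir storyboard style",
    "Close-up of the detective reading a letter, noir storyboard style",
]
frames = [pipe(prompt).images[0] for prompt in scene_descriptions]
```

In the agent loop, each new prompt would also carry a summary of the previously generated frames to keep the panels narratively coherent.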
Reproduction of DeepSeek R1 Zero: A Case Study on Math Tasks
This project aims to reproduce the improvement of reasoning ability in DeepSeek R1 Zero. Specifically, we will fine-tune the Qwen-2.5 1.5B model using PPO and GRPO. Our main focus is on exploring the impact of hyperparameters on LLM fine-tuning performance.
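The part of GRPO that differs from PPO fits in a few lines: rewards for a group of sampled completions per prompt are normalized within the group, replacing a learned value baseline. A minimal sketch:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (groups, samples_per_prompt), e.g. 1.0 if the answer is correct."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)   # group-relative advantages, no value network

print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0]])))
```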
Real-Time Localized Image Enhancement via YOLOv8 and Lightweight Super Resolution Model
This project proposes a real-time framework for enhancing image clarity in localized regions by integrating YOLOv8 for object detection and a lightweight super-resolution (SR) model. Targeting applications such as video conferencing and live streaming, the system dynamically identifies regions of interest (e.g., human subjects) and applies adaptive SR enhancement while maintaining a processing speed of 30+ FPS on consumer-grade GPUs. Our approach balances computational efficiency and perceptual quality through model optimization and/or hardware acceleration.
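The detect-then-enhance loop is sketched below with the ultralytics API; bicubic upsampling stands in for the lightweight SR network, which would slot into the marked line:

```python
import cv2
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")

def enhance_rois(frame):
    results = detector(frame, classes=[0])   # class 0 = "person" in COCO
    patches = []
    for box in results[0].boxes.xyxy:
        x1, y1, x2, y2 = map(int, box.tolist())
        roi = frame[y1:y2, x1:x2]
        # placeholder for the lightweight SR model
        sr = cv2.resize(roi, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
        patches.append(((x1, y1, x2, y2), sr))
    return patches
```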
Fine-Grained Visual Classification of Bird Species
This project explores Fine-Grained Visual Classification (FGVC) using deep learning, focusing on the CUB-200-2011 dataset. We aim to develop a model leveraging ResNet50 as a baseline, with improvements from attention mechanisms and Supervised Contrastive Learning to enhance classification accuracy and interpretability.
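A compact sketch of the supervised contrastive objective (Khosla et al., 2020): embeddings sharing a species label are pulled together and all others pushed apart. The temperature value is illustrative:

```python
import torch

def supcon_loss(features, labels, temperature=0.1):
    """features: (N, D) L2-normalized embeddings; labels: (N,) species ids."""
    sim = features @ features.T / temperature
    self_mask = torch.eye(len(labels), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, -1e9)                 # drop self-similarities
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    return -(log_prob * pos).sum(1).div(pos.sum(1).clamp(min=1)).mean()
```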
ProtMotifGen: Protein Motif Generation
This project aims to develop a conditional protein generative model that can generate novel proteins, with both amino acid sequences and 3D structures, conditioned on a specific target motif, i.e., a functional subunit. It will combine existing protein generative models, such as EvoDiff or ProteinMPNN, with protein structure prediction models, such as AlphaFold or OmegaFold, to validate the structural feasibility of the novel proteins containing the motif.
Explaining DeepSeek with Manim
New AI models such as DeepSeek sit at the cutting edge of the field and thus often lack easily digestible resources explaining how they work. This project aims to create a video using the open-source library Manim to explain the math behind DeepSeek, as well as how the model works, to a viewer without any prior knowledge of the subject.
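A minimal Manim scene of the kind the video would be assembled from; the formula shown (softmax) is only a stand-in for the DeepSeek math to be covered:

```python
from manim import Scene, MathTex, Write

class SoftmaxIntro(Scene):
    def construct(self):
        eq = MathTex(r"\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}")
        self.play(Write(eq))   # animate the formula being written out
        self.wait(2)

# render with: manim -pql explain.py SoftmaxIntro
```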
Enhancing AI-Generated Image Detection: A Comparative Study of CNNs, Transformers, and Contrastive Learning
AI has reached a point where it can generate highly realistic faces, scenes, and objects. This study addresses the problem of distinguishing AI-generated visuals from authentic photographs using a unique dataset, “AI vs. Human-Generated Images,” from a Kaggle competition. Unlike conventional datasets, this dataset provides paired images where each real image has a corresponding AI-generated counterpart, allowing for direct comparative analysis. We leverage this structured pairing within a deep learning framework, incorporating convolutional neural networks (CNNs) and transformer-based architectures to develop robust classifiers. In addition, we explore contrastive learning to enhance feature discrimination, hypothesizing that it improves generalization by enforcing a more distinct separation between real and AI-generated images.
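The paired structure suggests a simple contrastive term on top of the classifiers: push each real image's embedding away from its AI-generated counterpart. A hedged sketch, where the margin and embedding shapes are assumptions:

```python
import torch
import torch.nn.functional as F

def pair_margin_loss(z_real, z_fake, margin=1.0):
    """z_real, z_fake: (N, D) embeddings of each real image and its AI twin."""
    d = F.pairwise_distance(z_real, z_fake)
    return F.relu(margin - d).mean()   # penalize pairs whose embeddings stay too close
```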