Spring 2024 Project Mini Conference

Students Group 1 Students Group 2

BUilding Insights

Any Yang, Weining Mai

In today’s digital era, we often struggle to find a desired location given only an image from the internet. Leveraging the open-source multimodal LLaVA model, our project turns street-view images into a vision chatbot that answers questions regarding BU’s campus, unlocking new potential for location-based services and virtual tour guides.

Tags: LMM

A Deep Learning Solution for Precise Subtitle Segmentation

Anush Veeranala, Lilin Jin, Xinyu Zhang

Despite the increased adoption of subtitles, effective subtitle segmentation remains a challenge, particularly in libre-software contexts. Our research addresses this gap by proposing a reliable subtitle segmentation solution that leverages deep learning techniques to advance comprehension, streamline academic processes, and reduce dependence on proprietary solutions and volunteer efforts.

Tags: NLP, subtitling

Sudan Agricultural Advising Consultant

Jessica Cannon

This project report describes the development of a question-and-answer (Q&A) chatbot, powered by a deep-learning-based large language model (LLM), that addresses the challenges faced by Sudanese farmers. I adapted and fine-tuned the existing GPT-4 LLM to the Sudanese agricultural context via a RAG implementation, leveraging four large publicly available datasets from reputable sources. The chatbot’s performance was evaluated using the TruLens evaluation functions for Groundedness, Context Relevance, and Answer Relevance [3], as well as a qualitative comparison of the RAG model’s responses against the original model’s responses to relevant queries. The results showed that the RAG model did well in both Context Relevance and Answer Relevance, and that it was domain-specific and versatile within the domain. It was also more accurate than the baseline when both were compared against ground-truth data. However, the RAG model struggled with Groundedness and gave less detailed advice than the baseline model, most likely because the source datasets did not include enough specific contextual detail. Future work will ideally focus on further refining the LLM’s capabilities and evaluating its long-term impact on agricultural productivity and livelihoods in Sudan.

Tags: LLM, RAG, RAG Evaluation
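
The retrieval step behind a RAG chatbot can be illustrated with a minimal sketch. This is not the author's pipeline: the bag-of-words cosine similarity, the function names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), and the placeholder passages are all assumptions made for illustration, and a real system would more likely use learned embeddings and send the resulting prompt to an LLM.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=2):
    """Return the k passages most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Prepend the retrieved passages to the question for the LLM."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical placeholder passages (not taken from the project's datasets).
PASSAGES = [
    "sorghum planting calendar notes",
    "irrigation scheduling notes",
    "pest management notes",
]

if __name__ == "__main__":
    print(build_prompt("sorghum planting", PASSAGES, k=1))
```

Grounding the answer in retrieved passages is what metrics such as Groundedness and Context Relevance are meant to check: the first asks whether the answer is supported by the retrieved context, the second whether the retrieved context matches the question.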

Local Business Demand Forecasting Using State-of-the-Art Deep Learning Methods

Carmen Pelayo Fernandez


Tags: Time Series Analysis

A Multi-scale Fusion Deep Learning Approach on Brain Tumor Segmentation

Cassie Lee, Xinyi Hu, Yuke Zhang

The objective of this research is to propose a novel multimodal and multiscale model for the classification and segmentation of brain tumors in Magnetic Resonance Imaging (MRI) images. We carefully analyze the Brain Tumor Segmentation (BraTS) 2021 dataset with the intention of leveraging these images for diagnostic purposes. We found that the multi-modality model performed better than the single-modality model on the whole brain tumor segmentation task in BraTS. Here is the link to our project repository: https://github.com/cassielee04/smart brains.

Tags: Segmentation

Chess Anti-Cheat Detection Using an Adversarial Network Framework

Osama Dabbousi

The creation of powerful chess programs in recent years has made cheating in the game easy and prevalent. This has created a need for high-quality chess anti-cheat systems capable of detecting AI play. While such algorithms exist, they often focus on better-than-human play as opposed to non-human play. To address this issue, in this paper we introduce ChessGAN. ChessGAN is a generative adversarial network whose generator learns to play human-like chess, while its discriminator is trained to distinguish human moves from AI moves. We find that while the discriminator was capable of identifying moves from its own generator, this training did not generalize to all human or AI play. However, we did find that the discriminator aided in training a generator that played high-quality chess with a move distribution closely resembling that of human players.

Tags: GAN, RL
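
For reference, the standard GAN minimax objective is written out below. This is a sketch under the assumption that ChessGAN follows the usual formulation; the abstract does not state its exact loss. The notation is introduced here for illustration: $s$ is a board position, $a$ is a human move, $p_{\text{human}}$ is the distribution of human play, $G$ is the generator, and $D$ is the discriminator.

```latex
\min_G \max_D \;
\mathbb{E}_{(s,a) \sim p_{\text{human}}} \left[ \log D(s, a) \right]
+ \mathbb{E}_{s \sim p_{\text{human}}} \left[ \log \left( 1 - D(s, G(s)) \right) \right]
```

Under this objective the discriminator only ever sees AI moves produced by its own generator, which is one plausible reading of why it might not generalize to other engines.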

VizWiz Visual Question Answering (VQA)

Ishan Ranjan, Jack Campbell, Rani Shah

Visual Question Answering (VQA) is a crucial emerging technology with many real-world use cases, from assisting people who are visually impaired to providing accurate information about image features that humans might not be able to discern. To promote the development and testing of VQA models, VizWiz has formulated a set of challenges for building models that perform VQA tasks [3]. We have implemented a multimodal transformer model that predicts answers to questions about a given set of images. Our model achieves an accuracy of 0.751 and an answerability score of 0.798.

Tags: Transformer, LLM, Vision

A Two-Pronged Approach to ASL Recognition

Jasmine Pham, Farid Karimli

We propose VideoASL, an innovative approach to isolated sign language recognition (ISLR) from video. Using a two-pronged approach, our model combines video features with hand landmark locations to recognize large-scale American Sign Language (ASL). We aim to address challenges in dictionary retrieval, since dictionaries are essential tools for language learners and users. Potential applications include live sign transcription, as well as creating a reliable ASL-to-English dictionary.

Tags: Vision, CNN

Predicting Machine Failures in Toothbrush Tufting Machines based on Temperature, Shock, and Vibrations Data

Bill Jiao

This project proposes a machine-learning method to optimize dental industry manufacturing systems, specifically toothbrush tufting machines. The project optimized a Long Short-Term Memory (LSTM) model to predict machine states from vibration, shock, and temperature data. To improve performance, the project adjusted learning rates, training strategies, and training loss functions (and their weights). The model achieved a loss value of 1.1462 with the regular cross-entropy loss function and a loss value of 0.1629 with the binary cross-entropy loss function. On the testing set, the model trained with cross-entropy loss achieved a macro F1 score of 0.3342, a micro F1 score of 0.9081, and a weighted F1 score of 0.9383. This means that the model makes good overall predictions and correctly identifies normal operating states. However, it does not do as well at predicting failure states.

Tags: Time Series Analysis, RNN, LSTM
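
The gap between a low macro F1 and high micro and weighted F1 scores is typical of imbalanced classes. The sketch below shows how the three averages are computed and why they diverge. The labels are hypothetical and are not the project's data, and the helper names `per_class_f1` and `f1_summary` are introduced here for illustration.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_summary(y_true, y_pred):
    """Return (macro, micro, weighted) F1 for single-label multi-class data."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    scores = {c: per_class_f1(y_true, y_pred, c) for c in labels}
    # Macro: every class counts equally, however rare it is.
    macro = sum(scores.values()) / len(labels)
    # Micro: with exactly one label per sample this equals plain accuracy.
    micro = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    # Weighted: per-class F1 weighted by how often the class occurs.
    weighted = sum(scores[c] * support[c] for c in labels) / len(y_true)
    return macro, micro, weighted


# Hypothetical labels: 18 "normal" samples and 2 "failure" samples,
# scored against a classifier that predicts "normal" every time.
y_true = ["normal"] * 18 + ["failure"] * 2
y_pred = ["normal"] * 20

macro, micro, weighted = f1_summary(y_true, y_pred)
print(f"macro={macro:.3f} micro={micro:.3f} weighted={weighted:.3f}")
# prints: macro=0.474 micro=0.900 weighted=0.853
```

Because the macro average gives the rare failure class the same weight as the common normal class, it is the score that exposes weak failure-state prediction.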

Suggested Academic Reading Sequence Recommendation Using Graph Neural Networks

Bowen Li

The growing volume of academic publications over the years makes it increasingly difficult for users to pick out the papers that are relevant or useful to them, with citations being the only structure they can rely on. Meanwhile, graph neural networks have seen a recent surge in interest and development, presenting new ways to apply deep learning techniques to data that can be represented as a graph. This project explores graph neural networks and the task of link prediction to create a recommendation system that, given a paper or topic, provides users with a suggested sequence of papers to read in order to better understand that paper or topic, ultimately helping users navigate the increasingly complex landscape of academic publications. The code can be found at https://github.com/lib250/publication-recommender.

Tags: GNN
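
Link prediction on a citation graph can be sketched in a few lines. The example below is illustrative only and is not taken from the project's repository: it uses a single round of untrained mean aggregation in place of a learned GNN layer, plus a dot-product decoder to score candidate links. The toy graph, the feature vectors, and the names `aggregate` and `link_score` are assumptions.

```python
def aggregate(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [sum(col) / len(group) for col in zip(*group)]
    return updated


def link_score(embeddings, u, v):
    """Dot-product decoder: a higher score means a more likely u-v link."""
    return sum(a * b for a, b in zip(embeddings[u], embeddings[v]))


# Hypothetical toy citation graph: a chain A - B - C - D with
# two-dimensional topic features for each paper.
features = {
    "A": [1.0, 0.0],
    "B": [1.0, 0.0],
    "C": [0.0, 1.0],
    "D": [0.0, 1.0],
}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

emb = aggregate(features, neighbors)

# Rank candidate (currently missing) links from paper A.
candidates = [("A", "D"), ("A", "C")]
ranked = sorted(candidates, key=lambda e: link_score(emb, *e), reverse=True)
print(ranked)
```

After aggregation, paper C has absorbed some of B's features, so the A-C link scores higher than A-D. A trained GNN would learn the aggregation weights from known citations instead of using a plain mean.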

Deep Learning to Predict Neurodegenerative Diseases

Nikhita Mantravadi

Neurodegenerative diseases, such as Alzheimer’s and Parkinson’s, are characterized by the progressive loss of structure or function of neurons, including the death of neurons. These diseases are among the most challenging conditions to diagnose and treat, primarily because they often develop silently and become apparent only when significant neurological damage has occurred. Early diagnosis is crucial, as it can lead to more effective management of the disease, potentially slowing its progression and significantly improving the quality of life for patients. Early detection could also allow for timely intervention, potentially altering the disease’s trajectory.


Predicting Global Agricultural Crop Yield

Sungjoon Park

This research proposes a deep neural network model that predicts global wheat crop yield using time-series data of global weather, aiming to address the inefficiencies of current region-specific models. To this end, a combination of CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) networks is implemented, where the CNN block encodes the input images and the LSTM block predicts wheat crop yield in an auto-regressive manner. The model is notable in that it predicts the global crop yield and uses a spatiotemporal approach.

Tags: CNN, LSTM

An Exploration of the effects of Spurious Correlations in Deep Learning

Kevin Quinn

How do deep learning models select important features and when can they be wrong? Spurious correlations arise when models learn to rely upon features which are correlated with the target but ultimately don’t have any general predictive value. In this project I explore this effect in synthetic datasets with the goal of creating simple settings where models can fail, and understanding current methods for fixing them.

Tags: MNIST-1D
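
A spurious correlation can be reproduced in a very small synthetic setting. The sketch below is illustrative and is not the author's experiment: the data is hypothetical, and the "model" is simply the single feature that best fits the training set. The names `accuracy` and `make_rows` are introduced here.

```python
def accuracy(rows, feature):
    """Accuracy of predicting the label directly from one binary feature."""
    return sum(1 for r in rows if r[feature] == r["label"]) / len(rows)


def make_rows(n, core_hits, spurious_hits):
    """Build n rows; each feature matches the label on its first k rows."""
    rows = []
    for i in range(n):
        label = i % 2
        rows.append({
            "label": label,
            "core": label if i < core_hits else 1 - label,
            "spurious": label if i < spurious_hits else 1 - label,
        })
    return rows


# Hypothetical data: the core feature is right 80% of the time everywhere,
# while the spurious feature is perfect in training but only 50% in testing.
train = make_rows(10, core_hits=8, spurious_hits=10)
test = make_rows(10, core_hits=8, spurious_hits=5)

# A one-feature "model": keep whichever feature fits the training set best.
chosen = max(["core", "spurious"], key=lambda f: accuracy(train, f))

print(chosen, accuracy(train, chosen), accuracy(test, chosen))
# prints: spurious 1.0 0.5
print(accuracy(test, "core"))
# prints: 0.8
```

The selection rule prefers the spurious feature because it looks perfect during training, yet the less impressive core feature is the one that generalizes.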

Predictive Modeling of Critical Care Patient Mortality Rates Using Neural Networks: A Study on US Healthcare Data

Yuta Tsukumo

This study aims to develop predictive models for mortality rates among critical care patients using neural networks, utilizing large-scale healthcare data from the United States. Additionally, we conduct a comparative analysis with classical machine learning techniques. While acknowledging the limitations of generalizing findings to other healthcare systems, we emphasize the significance of accurately predicting mortality rates for informed resource allocation and quality assessment in intensive care settings.


Fake news detection using augmented LLM

Savannah Wang, Yi Liu, Zhuoyan Ma

The proliferation of fake news in modern media compromises information integrity. By incorporating professionally verified fact-checking data, our project seeks to leverage Large Language Models’ extensive knowledge, reasoning abilities, and access to external sources to identify and combat fake news effectively.

Tags: LLM

Human Fall Detection System

Hang Yu, Yinzhou Lu

Deep learning has significantly advanced how we solve complex problems in areas such as image recognition, language processing, and healthcare. In particular, convolutional neural networks and recurrent neural networks can analyze large amounts of data to recognize patterns and make decisions, improving tasks like diagnosing diseases or translating languages. Our project applies these technologies to help elderly people stay safe in their homes. Elderly people are at risk of falling, and falls that are not detected in time can lead to serious health problems or even be fatal. By using deep learning to monitor and detect when an elderly person falls, our system can alert family members or emergency services right away, potentially saving lives.

Tags: Vision, CNN

Automatic Capture and Collection of Poker Players’ Micro-Expressions

Zhengxiong Zouxu

This report presents an in-depth exploration of advanced Optical Character Recognition (OCR) technologies tailored for text detection and recognition in poker videos, a challenging domain due to dynamic visual elements and diverse textual representations. We address significant hurdles such as varying font styles, background complexity, and multilingual text present in real-time video streams. The research utilizes a dual-phase approach where text is first detected using DBNet, a robust detector capable of identifying text within highly cluttered images, and subsequently recognized through a Convolutional Recurrent Neural Network (CRNN), which excels in decoding text from irregular patterns. Our findings demonstrate that integrating these technologies can significantly enhance the accuracy and efficiency of text recognition in poker videos. By refining detection and recognition phases, the system achieves superior performance compared to traditional OCR systems. The practical implications of this study extend beyond gaming, offering potential applications in various multimedia and real-time video processing tasks where accurate text recognition is critical.

Tags: Vision, CNN