Presentation: UND, NDSU, & ND-ACES bio and biomedical computation networking seminar
November 20, 2024, Alerus Center, Grand Forks, North Dakota
Recognizing Cancer Vaccine Adjuvant Names from Clinical Trial Data with Large Language Models
Hasin Rehana
Doctoral Student
University of North Dakota
Co-authors: Brett McGregor, UND; Yongqun He, University of Michigan; Junguk Hur, UND
Session: Poster Presentation
An adjuvant is a substance added to a vaccine to boost its effect by enhancing the immune response. Identifying adjuvant names in cancer vaccine clinical trials is vital for advancing research and improving treatments, but manual curation from the rapidly growing biomedical literature is challenging. This study explores the automatic identification of vaccine adjuvant names using two families of Large Language Models (LLMs): Generative Pre-trained Transformer (GPT) and Large Language Model Meta AI (Llama). We used two distinct subsets of cancer vaccine trials from https://clinicaltrials.gov/. The first subset comprised 97 clinical trial records annotated by the researchers of the AdjuvareDB website. GPT-4 achieved an F1-score of 77.5% on this dataset. The second subset included 367 cancer vaccine clinical trials annotated by our team to encompass a diverse range of cancer vaccine adjuvant information and their contextual applications. GPT-4 achieved an F1-score of approximately 85.4% on this dataset. Llama-3-8B-Instruct performed similarly on AdjuvareDB (77.5% F1-score) but improved with fine-tuning, reaching 100.0% recall and a 91.9% F1-score. Our findings show that LLMs excel at accurately recognizing adjuvant names, including rare and novel ones, outperforming traditional methods. They also effectively reduce false positives by distinguishing adjuvants from other biomedical terms. This study highlights LLMs' potential to advance cancer vaccine research by efficiently extracting insights from clinical trial data. In future work, we aim to cover a broader range of clinical trials and refine the model to improve its generalizability across different types of vaccines and adjuvants.
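For readers less familiar with the metrics quoted above, the following is a minimal sketch of how precision, recall, and F1-score could be computed for adjuvant-name extraction. It is not the authors' evaluation code: the abstract does not state the matching rule or the averaging scheme, so case-insensitive exact matching and micro-averaging over trials are assumptions, and the function names, trial identifiers, and toy annotations are hypothetical. The second function rearranges F1 = 2PR / (P + R) to show what precision a given F1 and recall would imply; applied to the fine-tuned Llama figures (100.0% recall, 91.9% F1-score), it gives roughly 85%, a derived value rather than one reported in the abstract.

```python
def precision_recall_f1(gold, predicted):
    """Micro-averaged precision, recall, and F1 over per-trial name sets.

    Assumption: a predicted name counts as correct only if it matches a
    gold name exactly after lowercasing.
    """
    tp = fp = fn = 0
    for trial_id in gold.keys() | predicted.keys():
        gold_names = {name.lower() for name in gold.get(trial_id, ())}
        pred_names = {name.lower() for name in predicted.get(trial_id, ())}
        tp += len(gold_names & pred_names)   # correctly extracted names
        fp += len(pred_names - gold_names)   # extracted but not annotated
        fn += len(gold_names - pred_names)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and recall."""
    return f1 * recall / (2 * recall - f1)


# Toy annotations (hypothetical; not taken from the study's datasets).
gold = {
    "trial-1": ["Montanide ISA-51", "GM-CSF"],
    "trial-2": ["poly-ICLC"],
}
predicted = {
    "trial-1": ["montanide isa-51", "GM-CSF", "cyclophosphamide"],
    "trial-2": ["poly-ICLC"],
}

p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

# Precision implied by the reported 100.0% recall and 91.9% F1-score.
print(f"implied precision={implied_precision(0.919, 1.0):.3f}")
```

In the toy data, every annotated name is recovered but one extra term is extracted, so recall is perfect while precision and F1 drop. This is the same pattern the fine-tuned model's figures suggest.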