Presentation: UND, NDSU, & ND-ACES bio and biomedical computation networking seminar
November 20, 2024, Alerus Center, Grand Forks, North Dakota
Advancing Protein Function Prediction for Flavobacterium covae Using Graph Neural Networks and Sequence Embeddings
M Mishkatur Rahman
Doctoral Student
North Dakota State University
Co-authors: Harun Pirim, Assistant Professor, NDSU; Yusuf Akbulut, Graduate Student, NDSU; Zaidur Rahman, Graduate Student, University of Arkansas; Hasan Tekedar, Assistant Research Professor, Mississippi State University; Larry Hanson, Interim Department Head and Professor, Mississippi State University; Matt Griffin, Research Professor, Mississippi State University
Session: Presentation Session 2
Accurately identifying protein function is crucial for downstream applications such as drug discovery, vaccine development, and biotechnology, and computational models that reliably predict protein function can accelerate this process. Flavobacterium covae is the causative agent of columnaris disease in channel catfish, a key species in U.S. aquaculture. Its proteome is available, but many of its protein sequences lack experimentally determined functions. Because each protein may be associated with multiple functions, represented by Gene Ontology (GO) terms, function prediction is a multilabel classification problem, which we address with a Graph Neural Network (GNN) model. GNNs operate directly on graph-structured data, which makes them well suited to capturing complex relationships in biological networks. We constructed a graph in which nodes represent protein sequences and edges connect pairs of proteins whose sequence similarity meets a predefined threshold. Node features combined ESM-2 sequence embeddings with protein localization, essential-gene properties, and physicochemical characteristics. Model performance was evaluated using macro- and micro-averaged accuracy, precision, and F1 score. This approach offers new insights for improving disease management in aquaculture.
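The abstract does not specify the GNN architecture, the similarity measure, the threshold value, or the training procedure. The NumPy sketch below is therefore only an illustration under stated assumptions. It builds a thresholded similarity graph, concatenates embedding vectors with extra per-protein descriptors as node features, and runs a two-layer graph convolutional network (GCN) with symmetric normalization and a per-term sigmoid, so each GO term is predicted independently. It then computes micro- and macro-averaged F1. The GCN variant is an assumed stand-in, not necessarily the authors' model. All function names are hypothetical, and random arrays replace real ESM-2 embeddings, similarity scores, and GO labels, so the printed scores are not results.

```python
import numpy as np

def build_adjacency(similarity: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical helper: connect proteins i != j whose similarity meets the threshold."""
    adj = (similarity >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # self-loops are added during normalization instead
    return adj

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 thanks to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, x, w1, w2):
    """Assumed two-layer GCN; sigmoid gives one independent probability per GO term."""
    h = np.maximum(a_norm @ x @ w1, 0.0)  # ReLU after first propagation step
    logits = a_norm @ h @ w2
    return 1.0 / (1.0 + np.exp(-logits))

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for proteins x GO-terms indicator matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)
    # Micro: pool counts over all GO terms before computing F1.
    total = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / total if total > 0 else 0.0
    # Macro: F1 per GO term, then unweighted mean (terms with no positives score 0).
    denom = 2 * tp + fp + fn
    per_term = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return micro, per_term.mean()

rng = np.random.default_rng(0)
n_proteins, emb_dim, extra_dim, hidden, n_terms = 6, 8, 3, 16, 4

# Placeholders: random vectors stand in for ESM-2 embeddings and for
# localization / essential-gene / physicochemical descriptors.
esm_emb = rng.normal(size=(n_proteins, emb_dim))
extra = rng.normal(size=(n_proteins, extra_dim))
x = np.concatenate([esm_emb, extra], axis=1)

sim = rng.random((n_proteins, n_proteins))
sim = (sim + sim.T) / 2  # symmetric placeholder similarity scores
a_norm = normalize_adjacency(build_adjacency(sim, threshold=0.6))

# Untrained random weights; in practice they would be fit, e.g. by
# minimizing binary cross-entropy over proteins with known GO terms.
w1 = rng.normal(scale=0.1, size=(emb_dim + extra_dim, hidden))
w2 = rng.normal(scale=0.1, size=(hidden, n_terms))
probs = gcn_forward(a_norm, x, w1, w2)
y_pred = probs >= 0.5

y_true = rng.random((n_proteins, n_terms)) < 0.3  # placeholder labels
print(probs.shape)  # (6, 4)
print(f1_scores(y_true, y_pred))
```

Micro-averaging pools counts across all GO terms, so frequent terms dominate; macro-averaging weights every term equally, so rare terms count as much as common ones. Reporting both, as the abstract does, shows whether performance holds up on rare functions. Micro- and macro-averaged precision follow the same pattern, using tp / (tp + fp).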