
Presentation: UND, NDSU, & ND-ACES bio and biomedical computation networking seminar 

November 20, 2024, Alerus Center, Grand Forks, North Dakota

Advancing Protein Function Prediction for Flavobacterium covae Using Graph Neural Networks and Sequence Embeddings

M Mishkatur Rahman

Doctoral Student
North Dakota State University

Co-authors: Harun Pirim, Assistant Professor, NDSU; Yusuf Akbulut, Graduate Student, NDSU; Zaidur Rahman, Graduate Student, University of Arkansas; Hasan Tekedar, Assistant Research Professor, Mississippi State University; Larry Hanson, Interim Department Head and Professor, Mississippi State University; Matt Griffin, Research Professor, Mississippi State University

Session

Presentation Session 2

Accurately identifying protein function is crucial for downstream applications such as drug discovery, vaccine development, and biotechnology, and computational models that reliably predict protein function can accelerate this work. Flavobacterium covae is the causative agent of columnaris disease in channel catfish, a key species in U.S. aquaculture. Its proteome is available, but many of its protein sequences lack experimentally identified functions. Each protein may be linked to multiple functions, represented by Gene Ontology (GO) terms, which makes function prediction a multilabel classification problem. We propose a Graph Neural Network (GNN) model to address it. GNNs are designed to handle graph-structured data, which makes them well suited to capturing complex relationships in biological networks. We constructed a graph in which nodes represent protein sequences and edges were defined by a predefined sequence similarity threshold. Node features were built on ESM-2 embeddings and integrated sequence data, protein localization, essential gene properties, and physicochemical characteristics. Model performance was evaluated using macro- and micro-averaged accuracy, precision, and F1 score. The results offer new insights for improving disease management in aquaculture.
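
As an illustration only, the following minimal Python sketch shows two of the steps described in the abstract: thresholding pairwise similarities into graph edges, and computing micro- and macro-averaged F1 for multilabel predictions. It is not the authors' implementation. The function names `build_edges` and `f1_scores`, the toy similarity matrix, the threshold of 0.5, and the two-term label matrices are all hypothetical and were chosen only to make the example self-contained.

```python
from itertools import combinations


def build_edges(similarity, threshold):
    """Return undirected edges (i, j), i < j, whose similarity meets the threshold.

    `similarity` is assumed to be a precomputed symmetric matrix of pairwise
    sequence similarity scores (hypothetical input, not from the abstract).
    """
    n = len(similarity)
    return [(i, j) for i, j in combinations(range(n), 2)
            if similarity[i][j] >= threshold]


def f1_scores(y_true, y_pred):
    """Micro and macro F1 for multilabel 0/1 rows (one column per GO term)."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fp = [0] * n_labels
    fn = [0] * n_labels
    for truth, pred in zip(y_true, y_pred):
        for k in range(n_labels):
            if pred[k] and truth[k]:
                tp[k] += 1
            elif pred[k]:
                fp[k] += 1
            elif truth[k]:
                fn[k] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[k], fp[k], fn[k]) for k in range(n_labels)) / n_labels
    return micro, macro


# Toy data: three proteins, two GO terms (all values are made up).
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]
edges = build_edges(similarity, threshold=0.5)

y_true = [[1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [1, 0], [0, 1]]
micro, macro = f1_scores(y_true, y_pred)
```

Micro averaging weights every protein-term decision equally, so frequent GO terms dominate the score, whereas macro averaging weights every GO term equally, so rare terms count as much as common ones. Reporting both, as the abstract does, shows how a model behaves across that imbalance.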
