Edge-Aware Graph Neural Network Baselines for Protein Function Prediction on OGBN-Proteins

Authors

  • Aleksandar Stankovic, Faculty of Technical Sciences, University of Novi Sad
  • Dejan Lisica, Faculty of Technical Sciences, University of Novi Sad

DOI:

https://doi.org/10.7251/

Keywords:

graph neural networks, protein function prediction, OGBN-Proteins, edge features, calibration, PyTorch Geometric

Abstract

This paper presents an engineering-oriented study of edge-aware graph neural network baselines for protein function prediction on the OGBN-Proteins benchmark. The benchmark is a large protein-protein interaction graph in which each edge carries eight-dimensional association evidence and each protein has 112 binary functional labels. The study focuses on practical design choices that strongly affect reproducibility and deployment cost: the construction of node features from edge evidence, the use of scalar edge weights inside message passing, normalization under species-level distribution shift, and post-hoc decision calibration. We compare multilayer perceptron, GraphSAGE, and GIN baselines in PyTorch Geometric, combining mean, sum, and max edge-to-node aggregation with Batch Normalization, Layer Normalization, and a species-conditioned Layer Normalization variant. Results are reported over three seeds with ROC-AUC, micro-F1, calibrated micro-F1, expected calibration error, training time, memory use, and parameter count. Sum aggregation is consistently the strongest edge-to-node construction. GraphSAGE with sum-aggregated features offers the best accuracy-cost trade-off: Batch Normalization reaches the highest ROC-AUC, while species-conditioned Layer Normalization retains stronger fixed-threshold behavior. Per-label temperature scaling and per-label thresholds substantially improve multi-label decision quality with negligible change in ROC-AUC, and light label-correlation smoothing yields small additional gains. The resulting protocol provides a compact, reproducible baseline for large, edge-attributed biological graphs.
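The mean, sum, and max edge-to-node feature constructions compared in the study can be illustrated with a minimal pure-Python sketch. It assumes edges are given as a list of undirected (u, v) pairs, each listed once with one evidence vector, and that every edge contributes its vector to both endpoints. The function name `edge_to_node_features` and the toy data are hypothetical; a PyTorch Geometric implementation would typically use a scatter-style reduction over `edge_index` rather than explicit loops.

```python
from typing import List, Sequence, Tuple


def edge_to_node_features(
    num_nodes: int,
    edges: Sequence[Tuple[int, int]],
    edge_attr: Sequence[Sequence[float]],
    reduce: str = "sum",
) -> List[List[float]]:
    """Pool per-edge evidence vectors into one feature vector per node.

    Hypothetical helper: assumes each undirected edge (u, v) is listed once
    and contributes its attribute vector to both endpoints. Isolated nodes
    receive an all-zero vector.
    """
    if reduce not in ("sum", "mean", "max"):
        raise ValueError(f"unsupported reduce: {reduce!r}")
    dim = len(edge_attr[0]) if edge_attr else 0
    start = float("-inf") if reduce == "max" else 0.0
    out = [[start] * dim for _ in range(num_nodes)]
    degree = [0] * num_nodes
    for (u, v), attr in zip(edges, edge_attr):
        for node in (u, v):
            degree[node] += 1
            row = out[node]
            for k, value in enumerate(attr):
                # max keeps the strongest evidence per channel; sum/mean accumulate
                row[k] = max(row[k], value) if reduce == "max" else row[k] + value
    for node in range(num_nodes):
        if degree[node] == 0:
            out[node] = [0.0] * dim
        elif reduce == "mean":
            out[node] = [value / degree[node] for value in out[node]]
    return out


if __name__ == "__main__":
    # Toy graph: 4 nodes, 2 edges, 2-dimensional evidence (OGBN-Proteins edges have 8).
    toy_edges = [(0, 1), (0, 2)]
    toy_attrs = [[1.0, 0.5], [3.0, 0.25]]
    for mode in ("sum", "mean", "max"):
        print(mode, edge_to_node_features(4, toy_edges, toy_attrs, mode)[0])
```

Of the three reductions, only sum produces features that grow with node degree; mean and max are invariant to how many interactions a protein has.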
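The abstract does not specify how the species-conditioned Layer Normalization is parameterized. One common form of conditional normalization, assumed in this sketch, normalizes each node's hidden vector exactly as standard Layer Normalization does and then applies a scale and shift selected by the node's species. The class name `SpeciesConditionedLayerNorm` and the integer species ids are hypothetical; in a trained model, `gamma` and `beta` would be learnable parameters (for example, rows of an embedding table indexed by species) rather than plain lists.

```python
import math
from typing import Dict, Iterable, List, Sequence


class SpeciesConditionedLayerNorm:
    """Layer Normalization whose affine scale and shift are looked up per species.

    Hypothetical sketch: mean and variance are computed per node over the
    feature dimension (as in standard LayerNorm); only gamma/beta depend on
    the species id.
    """

    def __init__(self, dim: int, species_ids: Iterable[int], eps: float = 1e-5) -> None:
        self.eps = eps
        # Identity initialization: gamma = 1, beta = 0 for every species.
        self.gamma: Dict[int, List[float]] = {s: [1.0] * dim for s in species_ids}
        self.beta: Dict[int, List[float]] = {s: [0.0] * dim for s in self.gamma}

    def __call__(self, x: Sequence[float], species: int) -> List[float]:
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        inv_std = 1.0 / math.sqrt(var + self.eps)
        gamma, beta = self.gamma[species], self.beta[species]
        return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


if __name__ == "__main__":
    norm = SpeciesConditionedLayerNorm(dim=3, species_ids=[0, 1])
    norm.gamma[1] = [2.0, 2.0, 2.0]  # pretend species 1 learned a larger scale
    norm.beta[1] = [1.0, 1.0, 1.0]   # ...and a positive shift
    print(norm([1.0, 2.0, 3.0], species=0))
    print(norm([1.0, 2.0, 3.0], species=1))
```

The same input vector is normalized identically for every species; only the final affine step differs, which lets the model re-center and re-scale activations per species without changing the normalization statistics.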
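Per-label temperature scaling followed by per-label threshold selection can be sketched in pure Python as follows. The sketch assumes validation logits and binary labels are available as nodes × labels lists. The helper names, the grid-search ranges, and the choice to tune each threshold for that label's own F1 (a common heuristic, not a direct optimization of micro-F1) are assumptions rather than details taken from the study.

```python
import math
from typing import List, Optional, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits: Sequence[float], labels: Sequence[int], temperature: float) -> float:
    """Mean binary cross-entropy of sigmoid(logit / temperature)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits: Sequence[float], labels: Sequence[int],
                    grid: Optional[Sequence[float]] = None) -> float:
    """Grid-search the temperature that minimizes validation NLL for one label."""
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # 0.25, 0.5, ..., 10.0 (assumed range)
    return min(grid, key=lambda t: nll(logits, labels, t))


def fit_threshold(probs: Sequence[float], labels: Sequence[int],
                  grid: Optional[Sequence[float]] = None) -> float:
    """Pick the decision threshold that maximizes F1 for one label."""
    if grid is None:
        grid = [k / 100 for k in range(1, 100)]

    def f1(th: float) -> float:
        tp = sum(1 for p, y in zip(probs, labels) if p >= th and y == 1)
        fp = sum(1 for p, y in zip(probs, labels) if p >= th and y == 0)
        fn = sum(1 for p, y in zip(probs, labels) if p < th and y == 1)
        return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

    return max(grid, key=f1)


def calibrate_per_label(val_logits: List[List[float]],
                        val_labels: List[List[int]]) -> List[Tuple[float, float]]:
    """Return a (temperature, threshold) pair for each label column."""
    params = []
    for j in range(len(val_logits[0])):
        z = [row[j] for row in val_logits]
        y = [row[j] for row in val_labels]
        t = fit_temperature(z, y)
        th = fit_threshold([sigmoid(v / t) for v in z], y)
        params.append((t, th))
    return params


if __name__ == "__main__":
    logits = [[4.0, -1.0], [4.0, 2.0], [-4.0, 0.5], [-4.0, -3.0]]
    labels = [[1, 0], [0, 1], [0, 1], [1, 0]]
    for j, (t, th) in enumerate(calibrate_per_label(logits, labels)):
        print(f"label {j}: temperature={t:.2f}, threshold={th:.2f}")
```

Because a positive temperature is a strictly increasing transform of a label's logits, it cannot change how that label ranks nodes, so per-label ROC-AUC is unaffected by it; what changes are the probabilities and, through the thresholds, the binary decisions.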
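The abstract does not state which expected calibration error (ECE) variant is reported. A common choice for multi-label probabilities, assumed in this sketch, flattens all (node, label) entries, places them in equal-width probability bins, and averages the gap between the mean predicted probability and the observed positive rate in each bin, weighted by the bin's share of entries. The function name and the default of 10 bins are assumptions.

```python
from typing import Sequence


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Equal-width binned ECE over flattened (node, label) probabilities.

    Each non-empty bin contributes |mean predicted probability - positive rate|,
    weighted by the fraction of all entries that fall into that bin.
    """
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    pos_sums = [0] * n_bins
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        counts[b] += 1
        prob_sums[b] += p
        pos_sums[b] += y
    total = sum(counts)
    return sum(
        (c / total) * abs(prob_sums[b] / c - pos_sums[b] / c)
        for b, c in enumerate(counts)
        if c > 0
    )


if __name__ == "__main__":
    # Over-confident toy predictions: predicted 0.95, but only half are positive.
    print(expected_calibration_error([0.95] * 4, [1, 1, 0, 0]))
```

For a nodes × labels prediction matrix, the rows would be concatenated into one flat list of probabilities and one flat list of 0/1 labels before calling the function.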

Author Biographies

  • Aleksandar Stankovic, Faculty of Technical Sciences, University of Novi Sad

    Aleksandar Stanković is a fourth-year student of Software Engineering and Information Technologies at the Faculty of Technical Sciences, University of Novi Sad. His research interests include graph machine learning, machine learning systems, GPU-accelerated graph analytics, temporal graphs, and social computing. In 2025, he was a research intern at the DataNet Lab at Fudan University in Shanghai, where he continues to collaborate as an external associate. He is the author of several research papers in the areas of graph analytics, temporal motifs, and efficient systems for graph machine learning. He is the recipient of the Annual Award of the Mathematical Institute of the Serbian Academy of Sciences and Arts in computer science for the best original student paper.

  • Dejan Lisica, Faculty of Technical Sciences, University of Novi Sad

    Dejan Lisica is a software engineering and information technology student at the Faculty of Technical Sciences, University of Novi Sad. His interests lie at the intersection of computer science and the life sciences, with a focus on machine learning, graph neural networks, and AI applications in drug discovery and protein design. He has developed projects involving molecular machine learning, protein function prediction, and graph-based modeling. Notably, his team achieved a high-ranking finish in the global DREAM Target 2035 Challenge, and their work was featured in a poster presentation at the Mainframe Symposium. A recipient of a scholarship from the Fund for Young Talents of the Republic of Serbia, he currently focuses his research on applying modern ML methods to complex biological and chemical data.

Published

2026-06-28

Section

Original Research Papers