Edge-Aware Graph Neural Network Baselines for Protein Function Prediction on OGBN-Proteins
DOI:
https://doi.org/10.7251/
Keywords:
graph neural networks, protein function prediction, OGBN-Proteins, edge features, calibration, PyTorch Geometric
Abstract
This paper presents an engineering-oriented study of edge-aware graph neural network baselines for protein function prediction on the OGBN-Proteins benchmark. The benchmark is a large protein-protein interaction graph whose edges carry eight-dimensional association evidence and whose proteins have 112 functional labels. The study focuses on practical design choices that strongly affect reproducibility and deployment cost: construction of node features from edge evidence, use of scalar edge weights inside message passing, normalization under species-level distribution shift, and post-hoc decision calibration. We compare multilayer perceptron, GraphSAGE, and GIN baselines in PyTorch Geometric, using mean, sum, and max edge-to-node aggregation together with Batch Normalization, Layer Normalization, and a species-conditioned Layer Normalization variant. Results are reported over three seeds in terms of ROC-AUC, micro-F1, calibrated micro-F1, expected calibration error, training time, memory use, and parameter count. Sum aggregation is consistently the strongest edge-to-node construction. GraphSAGE with sum-based features offers the best accuracy-cost trade-off, with Batch Normalization reaching the highest ROC-AUC and conditional Layer Normalization retaining stronger fixed-threshold behavior. Per-label temperature scaling and per-label thresholds substantially improve multi-label decision quality with negligible change in ROC-AUC, while light label-correlation smoothing yields small additional gains. The resulting protocol provides a compact, reproducible baseline for large edge-attributed biological graph settings.
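The edge-to-node construction named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `edge_to_node_features`, its arguments, and the toy graph are hypothetical, and NumPy is used here instead of PyTorch Geometric to keep the example self-contained. It assumes each undirected edge is listed in both directions and that nodes without incoming edges receive zero features.

```python
import numpy as np


def edge_to_node_features(edge_index, edge_attr, num_nodes, reduce="sum"):
    """Aggregate per-edge features onto the target node of each edge.

    edge_index: int array of shape (2, E); column j is (source, target) of edge j.
    edge_attr:  float array of shape (E, D); D = 8 on OGBN-Proteins.
    reduce:     "sum", "mean", or "max".
    """
    target = edge_index[1]
    dim = edge_attr.shape[1]
    degree = np.bincount(target, minlength=num_nodes)

    if reduce in ("sum", "mean"):
        out = np.zeros((num_nodes, dim), dtype=np.float64)
        # Unbuffered scatter-add: repeated target indices accumulate correctly.
        np.add.at(out, target, edge_attr)
        if reduce == "mean":
            # Guard against division by zero for isolated nodes.
            out /= np.maximum(degree, 1)[:, None]
        return out

    if reduce == "max":
        out = np.full((num_nodes, dim), -np.inf)
        np.maximum.at(out, target, edge_attr)
        # Assumption: isolated nodes get zero features rather than -inf.
        out[degree == 0] = 0.0
        return out

    raise ValueError(f"unknown reduce mode: {reduce!r}")


# Toy graph: 3 nodes, 3 directed edges, 2-dimensional edge features.
toy_edge_index = np.array([[0, 2, 1],
                           [1, 1, 0]])
toy_edge_attr = np.array([[1.0, 2.0],
                          [3.0, 4.0],
                          [5.0, 6.0]])

x_sum = edge_to_node_features(toy_edge_index, toy_edge_attr, 3, "sum")
x_mean = edge_to_node_features(toy_edge_index, toy_edge_attr, 3, "mean")
x_max = edge_to_node_features(toy_edge_index, toy_edge_attr, 3, "max")
```

Sum aggregation keeps degree information in the magnitude of the node features, whereas mean and max discard it. That difference may be one reason the three constructions behave differently, though the abstract itself does not give an explanation.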
License
Copyright (c) 2026 International Journal of Electrical Engineering and Computing

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.