This is the official implementation of the following paper:
IceBerg: Debiased Self-Training for Class-Imbalanced Node Classification (WWW'25) [Paper]
Zhixun Li, Dingshuo Chen, Tong Zhao, Daixin Wang, Hongrui Liu, Zhiqiang Zhang, Jun Zhou*, Jeffrey Xu Yu*
In this work, we propose IceBerg, a debiased self-training framework that addresses the class-imbalance and few-shot challenges for GNNs simultaneously. We find that leveraging unlabeled nodes can significantly enhance the performance of GNNs in class-imbalanced and few-shot scenarios, and that even small, surgical modifications can lead to substantial performance improvements.
- Plug-and-play: Substantially improves the performance of existing baselines as a plug-and-play module.
- Simplicity: Requires only a few additional lines of code.
- Versatility: Achieves state-of-the-art performance in both class-imbalanced and few-shot node classification tasks.
- Lightweight: Matches or exceeds the efficiency of BASE balancing methods.
The code requires the following dependencies:
python>=3.9
torch==2.4.0
torch-geometric==2.6.1
ogb==1.3.6
scikit-learn==1.5.2
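For reference, a minimal setup assuming a pip-based environment (the exact installation commands are not part of the repository's instructions, and torch-geometric may require platform-specific wheels):

```bash
pip install torch==2.4.0 torch-geometric==2.6.1 ogb==1.3.6 scikit-learn==1.5.2
```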
If you want to use our proposed Double Balancing, you only need to add the following lines of code:
```python
# Double Balancing
if not self.args.no_pseudo and epoch >= self.args.warmup:
    # Estimate the class distribution of the pseudo-labeled nodes
    self.class_num_list_u = torch.tensor(
        [(self.pred_label[self.pseudo_mask] == i).sum().item() for i in range(self.num_cls)]
    )
    # Unsupervised loss, balanced by the estimated pseudo class distribution
    loss += criterion_u(
        output[self.pseudo_mask],
        self.pred_label[self.pseudo_mask],
        self.class_num_list_u,
    ) * self.args.lamda
```
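Here, `criterion_u` is the class-balanced loss applied to the pseudo-labeled nodes, `self.pred_label` holds the model's pseudo labels, and `self.pseudo_mask` selects the nodes they are assigned to. As a minimal sketch of what such a criterion can look like, the following balanced-softmax-style loss offsets the logits by the log of the estimated pseudo class priors; the function name and signature follow the snippet above, but this particular implementation is an illustrative assumption, not necessarily the one used in the repository:

```python
import torch
import torch.nn.functional as F

def criterion_u(logits, labels, class_num_list, tau=1.0):
    """Balanced-softmax-style unsupervised loss (illustrative sketch).

    logits:         [N, C] model outputs on pseudo-labeled nodes
    labels:         [N]    pseudo labels
    class_num_list: [C]    estimated pseudo class counts
    """
    # Turn the counts into class priors; clamp guards against empty classes.
    prior = class_num_list.float().to(logits.device)
    prior = prior / prior.sum().clamp(min=1.0)
    # Shift the logits by the log-priors so that majority pseudo classes
    # contribute smaller gradients than minority ones.
    adjusted = logits + tau * torch.log(prior + 1e-12)
    return F.cross_entropy(adjusted, labels)
```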
If you want to try reproducing the baseline methods, simply run:
```bash
bash run_baseline.sh
```
If you want to try reproducing the performance of IceBerg, simply run:
```bash
bash run_iceberg.sh
```
We have incorporated several baseline methods and benchmark datasets. Statistics of the benchmark datasets are as follows:
| Dataset | Type | #nodes | #edges | #features | #classes |
|---|---|---|---|---|---|
| Cora | Homophily | 2,708 | 10,556 | 1,433 | 7 |
| CiteSeer | Homophily | 3,327 | 9,104 | 3,703 | 6 |
| PubMed | Homophily | 19,717 | 88,648 | 500 | 3 |
| CS | Homophily | 18,333 | 163,788 | 6,805 | 15 |
| Physics | Homophily | 34,493 | 495,924 | 8,415 | 5 |
| ogbn-arxiv | Homophily | 169,343 | 1,116,243 | 128 | 40 |
| CoraFull | Homophily | 19,793 | 126,842 | 8,710 | 70 |
| Penn94 | Heterophily | 41,554 | 1,362,229 | 5 | 2 |
| Roman-Empire | Heterophily | 22,662 | 32,927 | 300 | 18 |
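Cora, CiteSeer, and PubMed ship with torch-geometric, so one quick way to inspect these benchmarks is sketched below; this is a generic torch-geometric loading example, and the repository's own loaders may apply different splits or imbalance ratios:

```python
from torch_geometric.datasets import Planetoid

# Download/load Cora and check that the statistics match the table above.
dataset = Planetoid(root='data/Planetoid', name='Cora')
data = dataset[0]
print(data.num_nodes, data.num_edges, dataset.num_features, dataset.num_classes)
# -> 2708 10556 1433 7
```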
Our proposed DB and IceBerg achieve significant improvements when combined with several BASE balancing methods.
Owing to IceBerg's outstanding ability to leverage unsupervised signals, it also achieves state-of-the-art results in few-shot node classification scenarios.
We acknowledge these excellent works for providing open-source code: GraphENS, GraphSHA, TAM, BAT, D2PT.
Please consider citing our work if you find it helpful:
```bibtex
@article{li2025iceberg,
  title={IceBerg: Debiased Self-Training for Class-Imbalanced Node Classification},
  author={Li, Zhixun and Chen, Dingshuo and Zhao, Tong and Wang, Daixin and Liu, Hongrui and Zhang, Zhiqiang and Zhou, Jun and Yu, Jeffrey Xu},
  journal={arXiv preprint arXiv:2502.06280},
  year={2025}
}
```