This depends on how you arrange the data. For example, if the data matrix D is arranged as [# of data, data dimension], then D^T is [data dimension, # of data]. The same per-sample normalization is a row operation in the first layout and a column operation in the second, which is why some codebases do row normalization and others column normalization.
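A minimal NumPy sketch of this point (the shapes and values are made up for illustration): normalizing each sample is done along rows in one layout and along columns in the transposed layout.

```python
import numpy as np

# Toy data: 100 samples with 16-dimensional features (shapes are arbitrary).
rng = np.random.default_rng(0)
D = rng.random((100, 16))   # arranged as [# of data, data dimension]
Dt = D.T                    # arranged as [data dimension, # of data]

# Normalizing each sample to unit sum is a row operation in the first
# layout and a column operation in the second.
D_row = D / D.sum(axis=1, keepdims=True)
Dt_col = Dt / Dt.sum(axis=0, keepdims=True)

assert np.allclose(D_row, Dt_col.T)  # same operation, different axis
```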
I was wondering the same thing when I first studied graph neural networks. I believe this is mainly due to the dataset's inputs: Cora uses a one-hot bag-of-words as its input features, so the features are very sparse and contain only 0s and 1s. The first linear layer therefore effectively acts like an embedding layer (nn.Embedding()), learning an embedding vector for each word in the vocabulary. Without row normalization, the first layer's output is effectively a sum of embeddings, so its magnitude can vary widely across samples. With row normalization, it is effectively an average of embeddings, which keeps the magnitudes comparable across samples (see the sketch after this reply). I have tried training a GCN on Cora without row normalization, and it performs poorly.
Hope it helps. I do realize this kind of thing is rarely mentioned anywhere, so I hope this is a logical explanation.
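Here is a small PyTorch sketch of the sum-vs-average point above (the toy documents, vocabulary size, and layer width are made up; PyTorch is used only because the reply mentions nn.Embedding()):

```python
import torch

# Hypothetical multi-hot bag-of-words input: 3 documents over a
# 5-word vocabulary, with very different numbers of words per document.
X = torch.tensor([[1., 1., 1., 1., 1.],
                  [1., 0., 0., 0., 0.],
                  [0., 1., 1., 0., 0.]])

# Weight of the first linear layer; each row plays the role of a word embedding.
W = torch.randn(5, 8)

# Without row normalization: X @ W sums the embeddings of the words present
# in each document, so the output norm grows with document length.
print((X @ W).norm(dim=1))        # varies widely across the 3 documents

# With row normalization: each row of X sums to 1, so the product averages
# the same embeddings and the output magnitudes stay comparable.
X_norm = X / X.sum(dim=1, keepdim=True)
print((X_norm @ W).norm(dim=1))   # much more uniform
```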
When we normalize the features, it seems that the columns are the features and the rows are the samples, so why do we normalize the rows rather than the columns?
Could someone give me some suggestions?