
I am training a GAT using a custom loss function (PU loss) on the Cora and Citeseer datasets. My training file looks like this:

import random

import numpy as np
import torch

f1_scores = []
N_ITER = 10
seeds = np.random.randint(1000, size=N_ITER)

for i in range(N_ITER):
    seed_value = seeds[i]
    np.random.seed(seed_value)
    random.seed(None)  # seeds Python's RNG from system entropy, not seed_value
    torch.manual_seed(seed_value)
    model = GAT().to(device)
    # train it
    # find f1 score
    f1_scores.append(f1)

print(np.mean(f1_scores))

When I run this file multiple times by doing

 for i in `seq 1 10`; do python train.py; done

I am getting high variance in the values (e.g., 0.43 and 0.76). I don't understand why this happens even after taking the mean.

  1. Is this the right way to take the mean of the model's F1 scores?
  2. How can I reduce this variance?

I have followed the steps mentioned here. I must use a neural network. I increased the weight decay (L2) values without any success.

1 Answer

  1. The seeds themselves were randomly sampled, so each run of the script averaged over a different set of seeds and therefore produced a different average F1 score. I am no longer sampling the seeds on each run; I have fixed them to a constant list (see the sketch at the end of this answer).

  2. The high variance comes from the model's sensitivity to the training data. In PU learning, only a sparse fraction (~1%) of the positive data is labelled for training. Especially on graph datasets, model performance is sensitive to exactly which nodes are labelled; the toy sketch below this list illustrates the effect.
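To make point 2 concrete, here is a toy sketch. The names pos_idx and n_labelled are hypothetical stand-ins, not the actual Cora split: the point is only that with ~1% of positives labelled, two different seeds select entirely different labelled sets, so each run effectively trains on different supervision.

import numpy as np

pos_idx = np.arange(1000)              # stand-in for the positive-class node indices
n_labelled = int(0.01 * len(pos_idx))  # ~1% of positives get a label

# Two different seeds -> two (very likely disjoint) labelled subsets.
labelled_a = np.random.default_rng(0).choice(pos_idx, size=n_labelled, replace=False)
labelled_b = np.random.default_rng(1).choice(pos_idx, size=n_labelled, replace=False)

print(sorted(labelled_a))  # one set of 10 training nodes ...
print(sorted(labelled_b))  # ... and a completely different one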

So the variance is at least fixed for the given set of seeds: I now get identical values each time I run the script.
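For reference, a minimal sketch of the fixed-seed version of the loop. It keeps the question's placeholders (GAT, device, and the elided training/evaluation code that produces f1) and the seed list [0, ..., 9] is just an illustrative choice:

import random

import numpy as np
import torch

# Fixed list of seeds instead of sampling them, so every run of the
# script averages over the same N_ITER training configurations.
SEEDS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Optional: trade speed for reproducible cuDNN kernels on GPU.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

f1_scores = []
for seed_value in SEEDS:
    random.seed(seed_value)                 # was random.seed(None) before
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)  # no-op on CPU-only machines

    model = GAT().to(device)  # GAT and device as defined in the question
    # train it
    # find f1 score
    f1_scores.append(f1)

print(np.mean(f1_scores), np.std(f1_scores))

Reporting the standard deviation alongside the mean also makes it easier to see how much of the remaining spread comes from the seed-to-seed sensitivity described in point 2.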