I'm just starting out in the world of machine learning and I really like Rust. I've been testing and learning more. I'd like to thank you for your support in advance.

I took the transfer learning example and ran some tests, but I can't understand why I get high accuracy during training and low accuracy in testing when both use the same validation set. Can anyone explain why? I read about overfitting, but it doesn't seem to be the case, because I'm using the same validation set without any new data.

use std::env;
use std::error::Error;
use std::path::PathBuf;

use anyhow::{ bail, Result };
use tch::nn::{ self, ModuleT, OptimizerConfig, VarStore };
use tch::vision::{ imagenet, resnet };
use tch::{ Device, Kind, Tensor };

pub fn bee_test() -> Result<(), Box<dyn Error>> {
    tch::manual_seed(123);

    let manifest_dir = env::var("CARGO_MANIFEST_DIR")?;
    let project_dir = PathBuf::from(manifest_dir);

    let dataset_path = project_dir.join("data/hymenoptera_data");
    let dataset = imagenet::load_from_dir(dataset_path)?;
    println!("{dataset:?}");

    let model_path = project_dir.join("data/bee.ot");
    println!("Model path: {:?}", model_path);

    let device = Device::cuda_if_available();
    let mut vs = VarStore::new(device);
    vs.load(model_path.as_path()).map_err(|op| {
        format!("Error loading the model: {:?}", op);
        op
    })?;

    let net = resnet::resnet34_no_final_layer(&vs.root());
    let linear = nn::linear(vs.root(), 512, 2, Default::default());

    let net2: nn::Sequential = nn::seq()
        .add_fn(move |xs| net.forward_t(xs, false))
        .add(linear);

    let predicted = net2.forward_t(&dataset.test_images, false);
    let probabilities = predicted.softmax(-1, tch::Kind::Float);
    probabilities.print();

    let class = predicted.argmax(-1, false);
    class.print();

    let test_accuracy = predicted.accuracy_for_logits(&dataset.test_labels);

    println!("Test Accuracy: {:.2}%", 100.0 * f64::try_from(test_accuracy)?);

    Ok(())
}

pub fn bee_train() -> Result<()> {
    tch::manual_seed(123);

    let manifest_dir = env::var("CARGO_MANIFEST_DIR")?;
    let project_dir = PathBuf::from(manifest_dir);

    let dataset_path = project_dir.join("data/hymenoptera_data");
    let model_path = project_dir.join("data/resnet34.ot");

    // Load the dataset and resize it to the usual imagenet dimension of 224x224.
    let dataset = imagenet::load_from_dir(dataset_path)?;
    println!("{dataset:?}");

    // Create the model and load the weights from the file.
    let mut vs = tch::nn::VarStore::new(tch::Device::Cpu);
    let net = resnet::resnet34_no_final_layer(&vs.root());
    vs.load(model_path)?;

    // Pre-compute the final activations.
    let train_images = tch::no_grad(|| dataset.train_images.apply_t(&net, false));
    let test_images = tch::no_grad(|| dataset.test_images.apply_t(&net, false));
    println!("Train images shape: {:?}", train_images.size());
    println!("Test images shape: {:?}", test_images.size());

    // Initialize the linear layer and optimizer
    let linear = nn::linear(vs.root(), 512, dataset.labels, Default::default());
    let mut sgd = nn::Sgd::default().build(&vs, 1e-3)?;

    for epoch_idx in 1..6000 {
        let predicted = train_images.apply(&linear);
        let loss = predicted.cross_entropy_for_logits(&dataset.train_labels);
        sgd.backward_step(&loss);

        let test_accuracy = test_images.apply(&linear).accuracy_for_logits(&dataset.test_labels);
        println!(
            "Epoch {}: Train Loss = {:.4}, Test Accuracy = {:.2}%",
            epoch_idx,
            f64::try_from(loss)?,
            100.0 * f64::try_from(test_accuracy)?
        );
    }

    let save_model_path = project_dir.join("data/bee.ot");
    vs.save(save_model_path)?;
    Ok(())
}

Using tch-rs

Results obtained in training: Epoch 5999: Train Loss = 0.0148, Test Accuracy = 96.75%

Results obtained in test: Test Accuracy: 46.10%

I would like to understand the reasons for the low accuracy. How can I improve it, and what paths could I take?

1 Answer

The significant gap between the accuracy reported during training (96.75%, measured on the test images) and the accuracy from the separate test run (46.10%) may indicate overfitting, even though you are using the same validation set. Here's why this might be happening and some steps you can take to improve:

1. Reasons for Low Test Accuracy:

Overfitting Despite Same Dataset: Even when using the same dataset for training and validation, overfitting can occur if the model "memorizes" specific features in the training set rather than generalizing. Transfer learning models, such as ResNet34, are very powerful, and if you train them for many epochs on a small or imbalanced dataset, the model can overfit to small nuances in the training data, leading to poor generalization.

Imbalanced Dataset: If the dataset you're working with (e.g., bees vs. ants) is imbalanced (i.e., more examples of one class than the other), the model might favor the dominant class, resulting in poor test accuracy for the minority class.

Excessive Epochs: Training for 6000 epochs could lead to severe overfitting. If the test accuracy starts to drop after some epochs, that’s a strong indicator of overfitting.

Too Few Parameters in Final Layer: The final linear layer may not have enough capacity or may be under-regularized, which can lead to poor performance during inference. Alternatively, the weights might not be getting updated sufficiently during training due to a suboptimal learning rate or gradient-related issues.
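
One more cause worth ruling out, since both numbers were measured on the same test images: in `bee_test`, `vs.load(...)` runs before `resnet34_no_final_layer` and `nn::linear` have registered any variables, whereas `bee_train` creates the network first and loads afterwards. As far as I understand tch-rs, `VarStore::load` only copies values into variables that already exist in the store, so loading into an empty store would leave the layers created afterwards at their random initialization, which for two classes would give roughly chance-level accuracy. The toy model below is not tch-rs code: `ToyStore` and its methods are hypothetical stand-ins that only mimic that assumed behaviour to show why the order matters.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a variable store; NOT the tch-rs API.
struct ToyStore {
    vars: HashMap<String, f64>,
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { vars: HashMap::new() }
    }

    /// Registers a variable with an initial value, like creating a layer.
    fn register(&mut self, name: &str, init: f64) {
        self.vars.insert(name.to_string(), init);
    }

    /// Copies saved values into variables that are ALREADY registered.
    /// Saved names with no registered variable are skipped (assumed behaviour).
    fn load(&mut self, saved: &HashMap<String, f64>) {
        for (name, value) in self.vars.iter_mut() {
            if let Some(src) = saved.get(name) {
                *value = *src;
            }
        }
    }

    fn get(&self, name: &str) -> f64 {
        self.vars[name]
    }
}

/// Load first, create the layer afterwards: the layer keeps its initial value.
fn load_then_register(saved: &HashMap<String, f64>) -> f64 {
    let mut vs = ToyStore::new();
    vs.load(saved);
    vs.register("weight", 0.0);
    vs.get("weight")
}

/// Create the layer first, then load: the saved value is restored.
fn register_then_load(saved: &HashMap<String, f64>) -> f64 {
    let mut vs = ToyStore::new();
    vs.register("weight", 0.0);
    vs.load(saved);
    vs.get("weight")
}
```

If this is what is happening, the equivalent change in `bee_test` would be to build `net` and `linear` first and only then call `vs.load(...)`. Printing a few weights before and after loading is a quick way to check.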

2. How to Improve Test Accuracy:

Early Stopping: Implement early stopping to stop training when the validation accuracy starts to decrease, thus preventing overfitting.
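
A minimal sketch of early stopping, independent of tch-rs: it only looks at the validation accuracy you already print each epoch. `EarlyStopping`, `stopping_epoch`, and the `patience` parameter are illustrative names, not part of any library.

```rust
/// Tracks the best validation accuracy seen so far and signals when to stop.
struct EarlyStopping {
    best: f64,
    patience: usize,   // epochs to tolerate without improvement (use >= 1)
    bad_epochs: usize, // consecutive epochs without improvement
}

impl EarlyStopping {
    fn new(patience: usize) -> Self {
        EarlyStopping { best: f64::NEG_INFINITY, patience, bad_epochs: 0 }
    }

    /// Feeds one epoch's validation accuracy; returns true when training should stop.
    fn update(&mut self, val_accuracy: f64) -> bool {
        if val_accuracy > self.best {
            self.best = val_accuracy;
            self.bad_epochs = 0;
        } else {
            self.bad_epochs += 1;
        }
        self.bad_epochs >= self.patience
    }
}

/// Returns the index of the epoch at which training would stop, if any.
fn stopping_epoch(history: &[f64], patience: usize) -> Option<usize> {
    let mut stopper = EarlyStopping::new(patience);
    history.iter().position(|&acc| stopper.update(acc))
}
```

Inside the training loop you would call `update` with the test accuracy of the current epoch and `break` when it returns true, ideally saving the weights each time a new best value is reached.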

Regularization:

Dropout: Add a dropout layer before the final linear layer to reduce overfitting. This prevents the model from becoming too reliant on specific neurons during training.

Weight Decay: Add weight decay (L2 regularization) in the optimizer to penalize large weights, forcing the model to generalize better.

Reduce Epochs: Try reducing the number of training epochs. A typical rule of thumb for transfer learning is fewer epochs (e.g., 10-50), especially since you’re fine-tuning on top of a pre-trained ResNet model.

Learning Rate Tuning: Try adjusting the learning rate. The rate of 1e-3 you are using may be too high for fine-tuning. Reduce the learning rate (e.g., 1e-4 or 1e-5) to fine-tune the pre-trained model more delicately.
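
Besides picking a smaller fixed rate, the rate can be lowered as training progresses. The sketch below is a plain step-decay schedule; `step_decay_lr`, `step`, and `gamma` are illustrative names, and the values in the usage note are arbitrary assumptions rather than tuned settings.

```rust
/// Step decay: multiplies the base rate by `gamma` once every `step` epochs.
fn step_decay_lr(base_lr: f64, epoch: usize, step: usize, gamma: f64) -> f64 {
    assert!(step > 0, "step must be at least 1");
    // Number of decays applied so far (integer division).
    let drops = (epoch / step) as i32;
    base_lr * gamma.powi(drops)
}
```

With `base_lr = 1e-3`, `step = 10`, and `gamma = 0.1`, epochs 0 to 9 use 1e-3, epochs 10 to 19 use 1e-4, and so on. The value could be passed to the optimizer at the start of each epoch; as far as I know, the tch-rs `Optimizer` exposes `set_lr` for that.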

Data Augmentation: Even though you're using the same dataset for both training and validation, augmenting the dataset (e.g., random crops, flips, rotations) can help the model generalize better by exposing it to different variations of the same images. In Rust, you can use libraries such as image or opencv for this purpose.
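
As a minimal sketch of one such augmentation, here is a horizontal flip written against a raw pixel buffer, with no external crates. It assumes a row-major height x width x channels layout (the function name and layout are assumptions for illustration; tensors loaded through tch-rs are usually channels-first, so the indexing would need adapting there).

```rust
/// Flips a row-major image buffer (height x width x channels) left to right.
fn flip_horizontal(pixels: &[u8], width: usize, height: usize, channels: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height * channels, "buffer size mismatch");
    let mut out = vec![0u8; pixels.len()];
    for y in 0..height {
        for x in 0..width {
            // Source pixel at column x goes to the mirrored column in the same row.
            let src = (y * width + x) * channels;
            let dst = (y * width + (width - 1 - x)) * channels;
            out[dst..dst + channels].copy_from_slice(&pixels[src..src + channels]);
        }
    }
    out
}
```

Applying the flip to a random half of the training images each epoch is a common choice; the test images should be left untouched.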

Balanced Sampling: If your dataset is imbalanced, you can either oversample the minority class (in this case, possibly ants) or undersample the majority class (bees) during training.

3. Path Forward:

Monitor Validation Loss/Accuracy: Print both training and validation accuracy/loss during each epoch and observe when the validation accuracy starts to plateau or drop while training accuracy keeps rising.

Start with a Smaller Number of Epochs: Test with smaller epoch ranges (e.g., 50-100) and observe test accuracy patterns.

Transfer Learning Best Practices:

Freeze earlier layers (those closer to the input) of ResNet initially, and only fine-tune the last few layers (those closer to the output). Gradually unfreeze more layers as needed to avoid overfitting early on.

A few suggestions for improvement

=> Add Dropout Layer:

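let net2: nn::SequentialT = nn::seq_t()
    .add_fn(move |xs| net.forward_t(xs, false))
    // tch-rs has no nn::dropout layer; Tensor::dropout is only active when train == true
    .add_fn_t(|xs, train| xs.dropout(0.5, train))  // Dropout added
    .add(linear);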

=> Lower the Learning Rate and Add Weight Decay:

let mut optimizer = nn::Sgd {
    wd: 1e-5,  // Add weight decay to prevent large weights
    ..Default::default()
}
.build(&vs, 1e-4)?;  // Lower learning rate for fine-tuning
Keval Pandya