For questions related to validation datasets, which are used for hyper-parameter optimization, early stopping, or cross-validation. They are sometimes referred to as held-out datasets, although that term can also refer to test datasets (i.e. the datasets used to assess the generalisation of the model).
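A minimal sketch of the split this description refers to (assuming scikit-learn; the data and split sizes are placeholders, not a prescription):

import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(1000, 20), np.random.randint(0, 2, 1000)   # placeholder data

# Split off a 15% test set, then carve an equally sized validation set out of the rest.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=len(y_test), random_state=0)

# Weights are fit on the training set, hyper-parameter tuning and early stopping use the
# validation set, and the test set is only used once to estimate generalisation.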
Questions tagged [validation-datasets]
13 questions
4
votes
1 answer
What are "development test sets" used for?
This is a theoretical question. I am a newbie to artificial intelligence and machine learning, and the more I read, the more I like it. So far, I have been reading about the evaluation of language models (I am focused on ASR), but I still don't get…
little_mice
- 143
- 2
3
votes
4 answers
Why does MNIST only provide the training and test sets? What about the validation set?
I was taught that, usually, a dataset has to be divided into three parts:
Training set - for learning purposes
Validation set - for picking the model which minimizes the loss on this set
Test set - for testing the performance of the model picked…
tail
- 167
- 7
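One common answer, sketched below (assuming the Keras copy of MNIST; the 10,000-example cut-off is an arbitrary choice), is to carve a validation set out of the provided training set:

from tensorflow.keras.datasets import mnist

(x_train_full, y_train_full), (x_test, y_test) = mnist.load_data()

# Hold out the last 10,000 training examples as a validation set.
x_val, y_val = x_train_full[-10000:], y_train_full[-10000:]
x_train, y_train = x_train_full[:-10000], y_train_full[:-10000]

print(x_train.shape, x_val.shape, x_test.shape)   # (50000, 28, 28) (10000, 28, 28) (10000, 28, 28)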
2
votes
2 answers
Why not make the training set and validation set one if their roles are similar?
If the validation set is used to tune the hyperparameters and the training set is used to adjust the weights, why not make them a single set, since they play a similar role in improving the model?
Omar Zayed
- 43
- 5
2
votes
1 answer
What is the difference between validation percentage and batch size?
I'm doing transfer learning using Inception on Tensorflow. The code that I used for training is https://raw.githubusercontent.com/tensorflow/hub/master/examples/image_retraining/retrain.py
If you take a look at the Argument Parser section at the…
iv67
- 215
- 3
- 12
1
vote
1 answer
How to perform PCA on the validation/test set?
I was using PCA on my whole dataset (and, after that, I would split it into training, validation, and test datasets). However, after a little bit of research, I found out that this is the wrong way to do it.
I have a few questions:
Are there some…
LVoltz
- 131
- 1
- 6
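The usual remedy, sketched below with scikit-learn and placeholder data, is to fit PCA on the training split only and then apply the fitted transformation to the validation and test splits:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 50)                      # placeholder feature matrix

X_train, X_rest = train_test_split(X, test_size=0.3, random_state=0)
X_val, X_test = train_test_split(X_rest, test_size=0.5, random_state=0)

pca = PCA(n_components=10)
X_train_pca = pca.fit_transform(X_train)          # fit on the training split only
X_val_pca = pca.transform(X_val)                  # reuse the fitted components
X_test_pca = pca.transform(X_test)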
1
vote
2 answers
How to choose validation data?
To train a deep learning model, I work with a dataset that is divided into train and test parts by its creators.
I'm stuck on how to select data for validation: from the train part or from the test part?
It seems that dividing the test part…
user153245
- 195
- 9
1
vote
1 answer
Is there validation data in K-fold cross-validation?
We know that in machine learning the dataset is divided into 3 parts: training data, validation data and test data.
On the other hand, K-fold cross-validation is defined as follows:
the dataset is divided into K different sections. One…
DSPinfinity
- 1,223
- 4
- 10
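In K-fold cross-validation the held-out fold plays the role of the validation data; a minimal scikit-learn sketch (the iris data and logistic regression are only stand-ins):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[val_idx], y[val_idx]))   # the held-out fold acts as validation data

print(sum(scores) / len(scores))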
1
vote
2 answers
Are the held-out datasets used for testing, validation or both?
I came across the new term "held-out corpora" and I am confused about its usage in the NLP domain.
Consider the following three paragraphs from N-gram Language Models
#1: held-out corpora as a non-train data
For an intrinsic evaluation of a language…
hanugm
- 4,102
- 3
- 29
- 63
1
vote
1 answer
What is the theoretical basis for the use of a validation set?
Let's say we use an MLE estimator (implementation doesn't matter) and we have a training set. We assume that we have sampled the training set from a Gaussian distribution $\mathcal N(\mu, \sigma^2)$.
Now, we split the dataset into training,…
user9947
0
votes
0 answers
I don't understand this way of having a stable train/test split even after updating the dataset
from zlib import crc32

def is_id_in_test_set(identifier, test_ratio):
    return crc32(np.int64(identifier)) < test_ratio * 2**32

def split_data_with_id_hash(data, test_ratio, id_column):
    ids = data[id_column]
    in_test_set =…
samsamradas
- 101
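For context, a completed sketch of the hash-based split the excerpt appears to show (the pandas apply-based completion and the toy DataFrame are assumptions, not the asker's code): hashing a stable identifier keeps each row on the same side of the split even after new rows are appended to the dataset.

import numpy as np
import pandas as pd
from zlib import crc32

def is_id_in_test_set(identifier, test_ratio):
    # crc32 maps the id to a deterministic pseudo-random 32-bit value, so the
    # decision for a given row never changes when the dataset grows.
    return crc32(np.int64(identifier)) < test_ratio * 2**32

def split_data_with_id_hash(data, test_ratio, id_column):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: is_id_in_test_set(id_, test_ratio))
    return data.loc[~in_test_set], data.loc[in_test_set]

data = pd.DataFrame({"id": range(100), "x": np.random.rand(100)})   # toy data
train_set, test_set = split_data_with_id_hash(data, 0.2, "id")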
0
votes
2 answers
Is it legitimate to train a model on a benchmark dataset and use this model only for labeling other datasets?
To assess our deep learning models (CNN), we have labeled a big benchmark dataset (it was labeled by specialists, so it is close to ideal).
I know, of course, that we do not want to train new models using the benchmark dataset, since it would be "cheating…
Igor
- 303
- 1
- 11
0
votes
1 answer
Dataset inputs to model.fit produce unexpected results for training loss vs validation loss
I'm trying to train a neural network (VAE) using TensorFlow, and I'm getting different results depending on the type of input passed to model.fit.
When I input arrays, I get a normal difference between the validation loss and the total loss.
When I input a…
user56546
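A minimal sketch of the two input styles the question contrasts (a toy dense model stands in for the VAE, and all data here is random placeholder data):

import numpy as np
import tensorflow as tf

x = np.random.rand(256, 8).astype("float32")
y = np.random.rand(256, 1).astype("float32")
x_val = np.random.rand(64, 8).astype("float32")
y_val = np.random.rand(64, 1).astype("float32")

model = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu"),
                             tf.keras.layers.Dense(1)])
model.compile(optimizer="adam", loss="mse")

# Style 1: NumPy arrays; Keras handles batching itself.
model.fit(x, y, validation_data=(x_val, y_val), batch_size=32, epochs=2)

# Style 2: tf.data.Dataset objects; these must already be batched.
train_ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(32)
val_ds = tf.data.Dataset.from_tensor_slices((x_val, y_val)).batch(32)
model.fit(train_ds, validation_data=val_ds, epochs=2)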
-1
votes
1 answer
How to decide on the optimal model?
I have split the available dataset into 70% training, 15% validation, and 15% test, using holdout validation. I trained the model and got the following results: training accuracy 100%, validation accuracy 97.83%, test accuracy 96.74%
In…
user50778
- 1
- 1