Questions tagged [data-preprocessing]

For questions related to the concept of data pre-processing, which includes, for example, cleaning, instance selection, normalization, transformation, feature extraction or selection.

For more info, see e.g. https://en.wikipedia.org/wiki/Data_pre-processing.
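As a rough illustration of three of the steps named in the tag description (cleaning, normalization, feature selection), here is a minimal sketch in plain Python. The function names `clean`, `min_max_normalize` and `select_features` and the toy data are hypothetical, chosen only for this example; real projects would typically rely on a library and may need different strategies (e.g. imputation instead of dropping rows).

```python
import math


def clean(rows):
    """Cleaning: drop rows that contain a missing value (None or NaN)."""
    return [
        r for r in rows
        if all(v is not None and not math.isnan(v) for v in r)
    ]


def min_max_normalize(rows):
    """Normalization: rescale each column to [0, 1].

    Constant columns (max == min) are mapped to 0.0 to avoid division by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


def select_features(rows, keep):
    """Feature selection: keep only the columns whose indices are in `keep`."""
    return [[r[i] for i in keep] for r in rows]


# Hypothetical toy data: 4 samples, 3 features, two rows with missing values.
raw = [
    [1.0, 200.0, 5.0],
    [2.0, None, 5.0],
    [3.0, 400.0, 5.0],
    [float("nan"), 300.0, 5.0],
]

cleaned = clean(raw)                 # rows with None/NaN removed
scaled = min_max_normalize(cleaned)  # each column rescaled to [0, 1]
selected = select_features(scaled, keep=[0, 1])  # drop the constant third column
print(selected)
```

The order shown here (clean, then scale, then select) is one common choice, not the only valid one.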

165 questions
10
votes
2 answers

How can I encode angle data to train neural networks?

I am training a neural network where the target data is a vector of angles in radians (between $0$ and $2\pi$). I am looking for study material on how to encode this data. Can you supply me with a book or research paper that covers this topic…
7
votes
1 answer

How to solve the problem of too big activations when using genetic algorithms to train neural networks?

I am trying to create a fixed-topology MLP from scratch (with C#), which can solve some simple problems, such as the XOR problem and MNIST classification. The network will be trained purely with genetic algorithms instead of back-propagation. Here…
7
votes
2 answers

Does data skew matter in a classification problem?

I'm working on an image classification problem using a neural network. In the training data set, 90% of the samples fall into 10% of all categories, while 10% of the samples fall into the other 90% of the categories. So the examples are not evenly distributed…
6
votes
1 answer

How should I deal with variable-length inputs for neural networks?

I am a complete beginner in the field of AI. I am a pharma professional without much coding experience, and I use GUI-based tools for neural networks. I am trying to develop an ANN that receives as input a protein sequence and produces as…
6
votes
2 answers

How to deal with images of different sizes, which need to be passed to a model of fixed input size, without losing details and spatial information?

I have the following problem while using convolutional neural networks to detect forgeries: Resizing the image to fit the required input size may not be a good way because the forgery detection largely relies on the details of images, for example,…
5
votes
1 answer

Does the term "data augmentation" imply increasing the training dataset?

I have a manuscript that has been reviewed, and one of the reviewers commented on my use of the term "data augmentation", saying that it might not be the appropriate term in my case (explained below). I collected a large dataset of short audio files…
5
votes
1 answer

In OCR, how should I deal with the warped text on the sides of oval objects?

Consider an image that contains one can (or bottle, or any similar oval object), which has text all over it. In the image below, I have many bottles, but you can assume that each image only contains one such object. As we can see, in each can, the…
5
votes
1 answer

What is "conditioning" on a feature?

On page 98 of "Jet Substructure at the Large Hadron Collider: A Review of Recent Advances in Theory and Machine Learning", the author writes: Redacted phase space: Studying the distribution of inputs and the network performance after conditioning on…
4
votes
3 answers

Would this relatively small dataset be enough to train a CNN?

Scenario: I am trying to create a dataset with images of my choice for different animal classes. I am going to train a CNN on those images for classification. Problem: Let's assume I somehow don't have the privilege to collect too many images and was…
4
votes
3 answers

Is pre-processing used in deep learning?

I'm new to deep learning. I wanted to know: do we use pre-processing in deep learning, or is it only used in machine learning? I searched for it and its methods on the internet, but I didn't find a suitable answer.
4
votes
1 answer

How to fill missing values in a dataset where some properties can be inputs and outputs?

I have a dataset with missing values that I would like to fill using machine learning methods. In more detail, there are $n$ individuals, for which up to 10 properties are provided, all numerical. The fact is, there are no individuals for which all…
4
votes
1 answer

How should I deal with variable input sizes for a neural network classifier?

I am currently working on a project where I have a sensor in a shoe that records the $X, Y, Z$ axes from an acceleration sensor and a gyroscope sensor. Every millisecond, I get 6 data points. Now, the goal is, if I do an action, such as jumping or kicking,…
3
votes
1 answer

What is the impact of scaling the features on the performance of the model?

I am trying to generate a model that uses several physicochemical properties of a molecule (including number of atoms, number of rings, volume, etc.) to predict a numeric value $Y$. I would like to use PLS Regression, and I understand that…
3
votes
1 answer

Why is the short-time Fourier transform used for preprocessing audio samples?

I've been told this is how I should be preprocessing audio samples, but what information does this method actually give me? What are the alternatives, and why shouldn't I use them?
3
votes
1 answer

How are sentences numerically encoded before passing them to neural networks?

I'm trying to understand NLP and how sentences can be used as input and output in a neural network architecture. As we know, an ANN is only compatible with numeric data. That means the sentences must be converted to numbers, right? Suppose I have this…