
I'd like to ask whether feature engineering is an important step for a deep learning approach.

By feature engineering I mean some advanced preprocessing steps, such as looking at histogram distributions and trying to make them look like a normal distribution or, in the case of time series, making the series stationary first (not just filling missing values or normalizing the data).
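To make this concrete, here is a minimal sketch of the kind of preprocessing I have in mind (with made-up data): a log transform to make a skewed histogram look more normal, and first differencing to make a time series stationary.

    import numpy as np

    rng = np.random.default_rng(0)

    # Skewed feature (e.g. incomes): a log transform makes its
    # histogram look much closer to a normal distribution.
    incomes = rng.lognormal(mean=10, sigma=1, size=1000)
    incomes_log = np.log(incomes)

    # Non-stationary series (random walk with drift): first
    # differencing removes the trend and makes it roughly stationary.
    series = np.cumsum(rng.normal(loc=0.1, scale=1.0, size=1000))
    series_diff = np.diff(series)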

I feel like, with enough regularization, deep learning models don't need feature engineering, unlike some other machine learning models (SVMs, random forests, etc.), but I'm not sure.

Daviiid

2 Answers


In my view, feature engineering is important; it's part of the job of an ML network designer.

Network designing involves:

  • Feature engineering: deciding what should go into the network's input, as processed from similar or totally different data
  • Deciding the network shape, layer shapes, types of neurons in each layer, etc.
  • Feature engineering again, but for the labels: deciding what the output should be, either regression values or classes

And possibly also rather simple tasks, as mentioned in the question: filling missing values, normalising the data, creating pre-feeding normalisation steps in code, etc.
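As a rough sketch of those simpler pre-feeding steps (assuming tabular data in a NumPy array, with NaN marking missing values, and using scikit-learn):

    import numpy as np
    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Toy tabular data with one missing value.
    X = np.array([[1.0, 200.0],
                  [2.0, np.nan],
                  [3.0, 220.0]])

    # Fill missing values with the column mean, then normalise each
    # column to zero mean and unit variance before feeding the network.
    preprocess = make_pipeline(SimpleImputer(strategy="mean"),
                               StandardScaler())
    X_ready = preprocess.fit_transform(X)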

Dan D

No, feature engineering is not an important step for deep learning (EDIT: compared to other techniques), provided that you have enough data. If your dataset is big enough (how big varies from task to task), you can perform what is called end-to-end learning.

To further clarify, according to this article, deep neural nets trained with the backpropagation algorithm are essentially performing automated feature engineering.
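As a rough sketch of what end-to-end means here (using PyTorch, with made-up shapes and data): the raw inputs go straight into the network, and the hidden layers are left to learn their own intermediate features during backpropagation.

    import torch
    import torch.nn as nn

    # Raw, flattened inputs go straight in; no hand-crafted features.
    model = nn.Sequential(
        nn.Linear(28 * 28, 128),   # e.g. raw pixel values in
        nn.ReLU(),
        nn.Linear(128, 10),        # class scores out
    )

    x = torch.randn(32, 28 * 28)      # dummy batch of raw inputs
    y = torch.randint(0, 10, (32,))   # dummy labels
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()   # backpropagation trains the whole pipeline end to end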

I feel like with enough regularization, the deep learning models don't need feature engineering compared to some machine learning models (SVMs, random forests, etc.)

That is basically correct, but beware: you need a large dataset. When a large dataset is not available, you will have to do some manual work (feature engineering).

Nevertheless, it is always a good idea to look at your data first!
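For instance (a minimal sketch, assuming your data sits in a pandas DataFrame called df; the plotting call needs matplotlib):

    import pandas as pd

    # Hypothetical data; replace with your own.
    df = pd.DataFrame({"age": [23, 35, 41, None],
                       "income": [30e3, 52e3, 61e3, 45e3]})

    print(df.describe())    # basic statistics per column
    print(df.isna().sum())  # how many values are missing
    df.hist()               # quick look at the distributions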

EDIT

I would also like to quote Rich Sutton here:

We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

Perhaps this statement is truer of deep learning than of previous techniques, but we are not quite there yet. As user nbro rightfully pointed out in the comments below, you may still need to normalise your data, pre-process it, remove outliers, etc. So, in practice, you may still need to transform your data to some degree, depending on many factors.

penkovsky