Questions tagged [backpropagation]

For questions about the back-propagation (also "backprop", often abbreviated "BP") algorithm, which computes the gradient of the objective function (e.g. the mean squared error) with respect to the parameters (or weights) of a neural network trained with gradient descent.

272 questions
45 votes · 4 answers

What is the time complexity for training a neural network using back-propagation?

Suppose that an NN contains $n$ hidden layers, $m$ training examples, $x$ features, and $n_i$ nodes in each layer. What is the time complexity to train this NN using back-propagation? I have a basic idea about how they find the time complexity of…
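A rough sketch of the usual accounting, under the assumption of fully connected layers (an illustrative aside, not part of the question): write $n_0 = x$ for the input width and treat the output layer as layer $n+1$. A forward plus a backward pass on one example is dominated by the weight-matrix products, so it costs on the order of
$$ O\!\left(\sum_{i=1}^{n+1} n_{i-1}\, n_i\right) $$
operations; one epoch over the $m$ training examples therefore costs roughly $O\!\left(m \sum_{i=1}^{n+1} n_{i-1}\, n_i\right)$, and the whole training run multiplies this by the number of epochs.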
19 votes · 1 answer

Are these two versions of back-propagation equivalent?

Just for fun, I am trying to develop a neural network. For backpropagation, I have seen two techniques. The first one is used here and in many other places too. What it does is: it computes the error for each output neuron. It backpropagates it into…
14 votes · 2 answers

Is the mean-squared error always convex in the context of neural networks?

Multiple resources I referred to mention that MSE is great because it's convex. But I don't get how, especially in the context of neural networks. Let's say we have the following: $X$: training dataset $Y$: targets $\Theta$: the set of parameters…
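A note on how this is usually resolved (an illustrative aside, using the question's notation): the squared error is convex as a function of the network's predictions $\hat{Y}$,
$$ J(\hat{Y}) = \lVert \hat{Y} - Y \rVert^2, $$
but the quantity actually minimised during training is $J(\Theta) = \lVert f_\Theta(X) - Y \rVert^2$, and because the network output $f_\Theta(X)$ is a nonlinear function of $\Theta$, this composition is in general not convex in $\Theta$: a convex function composed with a nonlinear map need not be convex.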
13 votes · 1 answer

Can a non-differentiable layer be used in a neural network if it's not learned?

For example, AFAIK, the pooling layer in a CNN is not differentiable, but it can be used because it has no parameters to learn. Is that always true?
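For context, a minimal NumPy sketch (function names are mine) of how the backward pass through a 2×2 max-pooling layer is typically handled: the layer has no parameters to learn, and the upstream gradient is simply routed to whichever element was the maximum in the forward pass, so the operation is differentiable almost everywhere.

    import numpy as np

    def maxpool_forward(x):
        """2x2 max pooling on an (H, W) array with even H, W; also returns the argmax mask."""
        H, W = x.shape
        patches = x.reshape(H // 2, 2, W // 2, 2).transpose(0, 2, 1, 3).reshape(H // 2, W // 2, 4)
        out = patches.max(axis=-1)
        mask = patches == out[..., None]   # marks the max in each 2x2 block (ties are all marked)
        return out, mask

    def maxpool_backward(grad_out, mask):
        """Route the upstream gradient back to the positions that achieved the max."""
        Hp, Wp, _ = mask.shape
        grad_patches = mask * grad_out[..., None]    # zero everywhere except at the argmax
        return grad_patches.reshape(Hp, Wp, 2, 2).transpose(0, 2, 1, 3).reshape(Hp * 2, Wp * 2)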
12 votes · 1 answer

Why use ReLU over Leaky ReLU?

From my understanding, a leaky ReLU attempts to address the issues of vanishing gradients and non-zero-centeredness by keeping neurons that fire with a negative value alive. With just this information to go on, it would seem that the leaky ReLU is just an…
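For reference, a small sketch of the two activations and their gradients (NumPy; the 0.01 slope is the conventional default, not a fixed rule). The only difference is that the leaky variant passes a small gradient for negative inputs, which is what keeps otherwise "dead" units trainable.

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    def relu_grad(x):
        return (x > 0).astype(float)         # gradient is exactly zero for x <= 0

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)

    def leaky_relu_grad(x, alpha=0.01):
        return np.where(x > 0, 1.0, alpha)   # small but nonzero gradient for x <= 0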
12 votes · 2 answers

What exactly is averaged when doing batch gradient descent?

I have a question about how the averaging works when doing mini-batch gradient descent. I think I now understand the general gradient descent algorithm, but only for online learning. When doing mini-batch gradient descent, do I have to: forward…
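A minimal sketch of the usual procedure (NumPy; grad_loss is a placeholder for whatever backpropagation routine computes the per-example gradient): run forward and backward for every example in the batch, average the per-example gradients, and take one update step with that average.

    import numpy as np

    def minibatch_step(params, batch_x, batch_y, grad_loss, lr=0.01):
        """One mini-batch gradient-descent step.

        grad_loss(params, x, y) is assumed to return the gradient of the
        per-example loss with respect to params (same shape as params).
        """
        grads = [grad_loss(params, x, y) for x, y in zip(batch_x, batch_y)]
        avg_grad = np.mean(grads, axis=0)    # average over the batch
        return params - lr * avg_grad        # single update with the averaged gradient

In practice the whole batch is usually pushed through one vectorised forward/backward pass, which yields the same averaged gradient in one go.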
12 votes · 5 answers

What is "backprop"?

What does "backprop" mean? Is the "backprop" term basically the same as "backpropagation" or does it have a different meaning?
kenorb
10 votes · 2 answers

What are the learning limitations of neural networks trained with backpropagation?

In 1969, Seymour Papert and Marvin Minsky showed that perceptrons could not learn the XOR function. This was solved by backpropagation-trained networks with at least one hidden layer, which can learn the XOR function. I believe I was once…
10 votes · 2 answers

What advantages do evolutionary algorithms have over conventional backpropagation methods?

What advantages does employing evolutionary algorithms to design and train artificial neural networks have over using conventional backpropagation algorithms?
10 votes · 1 answer

Is back-propagation applied for each data point or for a batch of data points?

I am new to deep learning and trying to understand the concept of back-propagation. I have a question about when back-propagation is applied. Assume that I have a training data set of 1000 images of handwritten letters. Is back-propagation…
8 votes · 3 answers

How do I know if my backpropagation is implemented correctly?

I'm working on an implementation of the backpropagation algorithm for a simple neural network, which predicts the probability of survival (a binary label, 1 or 0). However, I can't get it above 80%, no matter how I tune the hyperparameters. I suspect…
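The standard sanity check is finite-difference gradient checking. A sketch, assuming your implementation exposes loss(params) and its backpropagated gradient analytic_grad(params) on a contiguous NumPy array (both names are placeholders): perturb each parameter slightly and compare the numerical slope with the backpropagated one.

    import numpy as np

    def gradient_check(loss, analytic_grad, params, eps=1e-5):
        """Compare backprop gradients against central finite differences."""
        numeric = np.zeros_like(params)
        flat = params.ravel()                 # view into params (assumes contiguity)
        for i in range(flat.size):
            old = flat[i]
            flat[i] = old + eps
            plus = loss(params)
            flat[i] = old - eps
            minus = loss(params)
            flat[i] = old                     # restore the parameter
            numeric.ravel()[i] = (plus - minus) / (2 * eps)
        g = analytic_grad(params)
        # a relative error well below ~1e-5 usually indicates a correct implementation
        return np.abs(numeric - g).max() / (np.abs(numeric).max() + np.abs(g).max() + 1e-12)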
8 votes · 1 answer

What do symmetric weights mean, and how do they make backpropagation biologically implausible?

I was reading a paper on alternatives to backpropagation as a learning algorithm in neural networks. In this paper, the author talks about the disadvantages of backpropagation, and one of the disadvantages stated is that backpropagation requires…
0jas
8 votes · 3 answers

How does backprop work through the random sampling layer in a variational autoencoder?

Implementations of variational autoencoders that I've looked at all include a sampling layer as the last layer of the encoder block. The encoder learns to generate a mean and standard deviation for each input, and samples from the resulting distribution to get the input's…
Luke Wolcott
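For context, the mechanism usually cited here is the reparameterization trick: the sample is rewritten as a deterministic, differentiable function of the encoder outputs plus external noise, so gradients flow into the mean and (log-)standard deviation while the noise itself needs no gradient. A minimal NumPy sketch (variable names are mine):

    import numpy as np

    def sample_latent(mu, log_sigma, rng=np.random):
        """Reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I).

        Gradients with respect to mu and log_sigma flow through this
        expression by ordinary backprop; eps is treated as a constant input.
        """
        eps = rng.standard_normal(mu.shape)
        return mu + np.exp(log_sigma) * eps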
8 votes · 1 answer

Which loss function should I use in REINFORCE, and what are the labels?

I understand that this is the update for the parameters of a policy in REINFORCE: $$ \Delta \theta_{t}=\alpha \nabla_{\theta} \log \pi_{\theta}\left(a_{t} \mid s_{t}\right) v_{t}, $$ where $v_t$ is usually the discounted future reward and …
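A common way to phrase this update as a loss for an automatic-differentiation framework (a sketch, not tied to any particular library; log_probs and returns are my names): minimise the negative of the return-weighted log-probability of the actions actually taken, so that a gradient step on this surrogate reproduces the update above, with $v_t$ entering as a constant weight rather than a label.

    import numpy as np

    def reinforce_loss(log_probs, returns):
        """Surrogate loss whose negative gradient is the REINFORCE update direction.

        log_probs: log pi_theta(a_t | s_t) for the sampled actions
        returns:   v_t (e.g. discounted future rewards), treated as constants
        """
        return -np.mean(log_probs * returns)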
8 votes · 1 answer

Why do we update all layers simultaneously while training a neural network?

Very deep models involve the composition of several functions or layers. The gradient tells how to update each parameter, under the assumption that the other layers do not change. In practice, we update all of the layers simultaneously. The above…