Most Popular

1500 questions
5 votes, 1 answer

How does backpropagation work on a custom loss function whose components have magnitudes of different orders?

I want to use a custom loss function that is a weighted combination of l1 and DSSIM losses. The DSSIM loss is bounded between 0 and 0.5, whereas the l1 loss can be orders of magnitude greater, and is in my case. How does backpropagation work in…
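For context, the chain rule handles a weighted sum of losses term by term. The gradient of $a L_{l1} + b L_{DSSIM}$ is $a \nabla L_{l1} + b \nabla L_{DSSIM}$, so training is driven by the relative size of the two gradients, not by the relative size of the two loss values. The NumPy sketch below checks this numerically under stated assumptions. `bounded_loss` is a hypothetical stand-in for DSSIM: it is bounded below 0.5 like DSSIM, but it is much simpler. The weights `a` and `b` and the random data are arbitrary.

```python
import numpy as np

def l1_loss(p, t):
    return np.abs(p - t).sum()

def l1_grad(p, t):
    # d/dp |p - t| = sign(p - t) (undefined exactly at p == t)
    return np.sign(p - t)

def bounded_loss(p, t):
    # Hypothetical stand-in for DSSIM: a smooth loss that stays in [0, 0.5)
    return 0.5 * (1.0 - np.exp(-np.mean((p - t) ** 2)))

def bounded_grad(p, t):
    mse = np.mean((p - t) ** 2)
    return 0.5 * np.exp(-mse) * 2.0 * (p - t) / p.size

def combined_loss(p, t, a, b):
    return a * l1_loss(p, t) + b * bounded_loss(p, t)

def combined_grad(p, t, a, b):
    # Backprop through a weighted sum: each term's gradient is scaled by its weight
    return a * l1_grad(p, t) + b * bounded_grad(p, t)

def numerical_grad(f, p, eps=1e-6):
    # Central finite differences, used only to check the analytic gradient
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
t = rng.normal(size=8)
p = t + rng.normal(scale=3.0, size=8)
a, b = 1.0, 1.0

g = combined_grad(p, t, a, b)
g_num = numerical_grad(lambda q: combined_loss(q, t, a, b), p)
print("gradient check:", np.allclose(g, g_num, atol=1e-4))
print("mean |grad| from l1 term:     ", np.abs(a * l1_grad(p, t)).mean())
print("mean |grad| from bounded term:", np.abs(b * bounded_grad(p, t)).mean())
```

If the l1 gradients dominate, you can rebalance the terms by raising `b`, or by averaging the l1 term instead of summing it.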
5 votes, 1 answer

Can we use recursive least squares as the learning algorithm for an ADALINE?

I'm new to neural networks (I study electrical engineering) and I just started working with ADALINEs. I use Matlab, and its documentation states: "However, here the LMS (least mean squares) learning rule, which is much more powerful than…
Carter Nolan
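RLS can in principle train the linear combiner of an ADALINE, because ADALINE training is a linear least-squares problem on the pre-threshold output. Below is a minimal NumPy sketch of the textbook exponentially weighted RLS recursion, shown next to one pass of LMS for comparison. The function names, the forgetting factor `lam`, the initialization `p0` and the synthetic data are illustrative assumptions. None of this is Matlab's toolbox API.

```python
import numpy as np

def rls_train(X, d, lam=1.0, p0=1e3):
    """Recursive least squares for one linear neuron (an ADALINE's linear part).

    lam is the forgetting factor (1.0 = no forgetting); p0 scales the initial
    inverse-correlation estimate P = p0 * I.
    """
    n = X.shape[1]
    w = np.zeros(n)
    P = p0 * np.eye(n)
    for x, target in zip(X, d):
        Px = P @ x
        k = Px / (lam + x @ Px)          # gain vector
        e = target - w @ x               # a priori error
        w = w + k * e                    # weight update
        P = (P - np.outer(k, Px)) / lam  # P is symmetric, so k x^T P == outer(k, P x)
    return w

def lms_train(X, d, mu=0.05):
    """One pass of the Widrow-Hoff LMS rule, for comparison."""
    w = np.zeros(X.shape[1])
    for x, target in zip(X, d):
        e = target - w @ x
        w = w + mu * e * x
    return w

rng = np.random.default_rng(1)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])  # last column = bias input
w_true = np.array([0.7, -1.2, 0.3])
d = X @ w_true                                                # noiseless targets

w_rls = rls_train(X, d)
w_lms = lms_train(X, d)
print("RLS:", w_rls)
print("LMS:", w_lms)
```

RLS costs O(n²) per sample, compared with O(n) for LMS. In exchange it usually converges in far fewer samples and has no learning rate to tune, although `lam` and `p0` still matter.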
5 votes, 2 answers

Why do we need 10 bits to represent the 1000 classes in AlexNet?

I'm reading the AlexNet paper. In section 4, where the authors explain how they prevent overfitting, they mention: "Although the 1000 classes of ILSVRC make each training example impose 10 bits of constraint on the mapping from image to label".…
harupy
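The "10 bits" figure is the information content of one label. Picking one of 1000 classes, treated as equally likely for simplicity, takes $\log_2 1000 \approx 9.97$ bits, which rounds to 10. Ten is also the smallest number of binary digits that can index 1000 distinct classes. A quick check in Python:

```python
import math

n_classes = 1000
bits = math.log2(n_classes)  # information in one label if all classes are equally likely
print(f"log2({n_classes}) = {bits:.3f} bits")              # log2(1000) = 9.966 bits
print("bits needed for a class index:", math.ceil(bits))   # 10, since 2**9 = 512 < 1000 <= 1024 = 2**10
```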
5 votes, 1 answer

Why are Common Lisp, Python and Prolog used in artificial intelligence?

What are the advantages/strengths and disadvantages/weaknesses of programming languages like Common Lisp, Python and Prolog? Why are these languages used in the domain of artificial intelligence? What types of problems related to AI are solved using…
Sanket Alurkar
5 votes, 3 answers

Is it possible to write an adaptive parser?

I am working on a JavaScript library that focuses on error handling. Part of the library is a stack parser that I'd like to work in most environments. The hard part is that there is no standard way to represent the stack, so every environment has its…
5 votes, 1 answer

Proof of uniqueness of value function for MDPs with undiscounted rewards

How does one prove the uniqueness of the value function obtained from value iteration in the case of bounded and undiscounted rewards? I know that this can be proven for the discounted case pretty easily using the Banach fixed point theorem.
5 votes, 1 answer

How to deal with padded inputs in a fully connected feed-forward network?

I have a fully connected network that takes in a variable-length input padded with zeros. However, the network doesn't seem to be learning, and I am guessing that the large number of zeros in the input might have something to do with that. Are there…
silkAdmin
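One common remedy is to give the network an explicit validity mask alongside the padded values, so that a padding zero can be told apart from a genuine 0.0 in the data. Below is a minimal NumPy sketch of that idea. `pad_with_mask` is a hypothetical helper. Other options, such as masked pooling or sequence models built for variable-length input, may suit some data better.

```python
import numpy as np

def pad_with_mask(sequences, max_len, pad_value=0.0):
    """Pad variable-length sequences and append a 0/1 validity mask to each row."""
    values = np.full((len(sequences), max_len), pad_value, dtype=float)
    mask = np.zeros((len(sequences), max_len), dtype=float)
    for i, seq in enumerate(sequences):
        n = min(len(seq), max_len)   # truncate anything longer than max_len
        values[i, :n] = seq[:n]
        mask[i, :n] = 1.0            # 1 = real value, 0 = padding
    # The fully connected network sees [values | mask] as its input vector
    return np.concatenate([values, mask], axis=1)

batch = pad_with_mask([[0.5, 0.0, 2.0], [1.5]], max_len=4)
print(batch)
```

In the first row the real 0.0 at position 1 has mask value 1, and the padding at position 3 has mask value 0. This lets the network distinguish the two.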
5 votes, 1 answer

In YOLO, when is $\mathbb{1}_{i j}^{\mathrm{obj}} = 1$, and what are the ground-truth labels for $x_i$ and $y_i$?

I'm trying to implement a custom version of the YOLO neural network. Originally, it was described in the paper You Only Look Once: Unified, Real-Time Object Detection (2016). I have some problems understanding the loss function they used. Basic…
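As we read the YOLO paper, the grid cell that contains an object's center is responsible for that object. Within that cell, $\mathbb{1}_{ij}^{\mathrm{obj}} = 1$ only for the one of the $B$ predictors whose current box has the highest IoU with the ground truth. The targets for $x_i, y_i$ are the box center's offsets relative to that cell's bounds. The targets for $w_i, h_i$ are relative to the whole image, and the loss then compares their square roots. The plain-Python sketch below encodes that reading. The helper names, the corner-format boxes, the 448 by 448 image and the dummy predictions are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def assign_target(gt, preds, img_w, img_h, S=7):
    """Return ((row, col, j), (x, y, w, h)) for one ground-truth box.

    gt is (x_min, y_min, x_max, y_max) in pixels; preds[row][col] holds the
    B boxes predicted by that cell, in the same pixel format.
    """
    cx, cy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    col = min(int(cx / img_w * S), S - 1)  # cell containing the box center
    row = min(int(cy / img_h * S), S - 1)
    ious = [iou(p, gt) for p in preds[row][col]]
    j = max(range(len(ious)), key=lambda k: ious[k])  # responsible predictor: 1_ij^obj = 1
    x = cx / img_w * S - col        # center offset relative to the cell, in [0, 1)
    y = cy / img_h * S - row
    w = (gt[2] - gt[0]) / img_w     # width/height relative to the whole image
    h = (gt[3] - gt[1]) / img_h
    return (row, col, j), (x, y, w, h)

S, B = 7, 2
preds = [[[(0, 0, 1, 1)] * B for _ in range(S)] for _ in range(S)]  # dummy predictions
preds[3][2] = [(90, 140, 180, 230), (120, 170, 220, 270)]
cell, target = assign_target((100, 150, 200, 250), preds, img_w=448, img_h=448)
print(cell, target)
```

Every other predictor, in this cell and in all other cells, has $\mathbb{1}_{ij}^{\mathrm{obj}} = 0$ for this object. Those predictors only enter the confidence term that is weighted by $\lambda_{\text{noobj}}$.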
5 votes, 2 answers

Does NEAT require only connection genes to be marked with a global innovation number?

Does NEAT require only connection genes to be marked with a global innovation number? From the NEAT paper: "Whenever a new gene appears (through structural mutation), a global innovation number is incremented and assigned to that gene." It seems…
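The bookkeeping in the quoted passage can be sketched as a global counter keyed by the connection that a mutation creates. The NEAT paper also keeps a list of the innovations from the current generation. That way, identical structural mutations that arise independently in the same generation receive the same number, and the sketch below follows that description. The class and method names are hypothetical, and the sketch tracks connection genes only, keyed by `(in_node, out_node)`.

```python
class InnovationTracker:
    """Hands out global innovation numbers for new connection genes."""

    def __init__(self):
        self.counter = 0
        self.current_generation = {}  # (in_node, out_node) -> innovation number

    def innovation_for(self, in_node, out_node):
        key = (in_node, out_node)
        if key not in self.current_generation:
            # a structural mutation not yet seen this generation gets a fresh number
            self.counter += 1
            self.current_generation[key] = self.counter
        return self.current_generation[key]

    def next_generation(self):
        # the per-generation list is reset; the global counter keeps increasing
        self.current_generation.clear()

tracker = InnovationTracker()
a = tracker.innovation_for(1, 4)
b = tracker.innovation_for(2, 4)
c = tracker.innovation_for(1, 4)   # same mutation in the same generation -> same number
tracker.next_generation()
d = tracker.innovation_for(1, 4)   # same mutation in a later generation -> new number
print(a, b, c, d)  # 1 2 1 3
```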
5 votes, 2 answers

What is the intuition behind how word embeddings bring information to a neural network?

How is it that a word embedding layer (say word2vec) brings more insights to the neural network compared to a simple one-hot encoded layer? I understand how the word embedding carries some semantic meaning, but it seems that this information would…
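Mechanically, an embedding layer is a dense layer applied to a one-hot vector, implemented as a row lookup. The extra information that word2vec brings lives in the values of that matrix. Those values are learned on a large corpus, so related words end up with similar rows. One-hot vectors, by contrast, treat every pair of distinct words as equally unrelated. Below is a small NumPy check of both points, with a random matrix standing in for pretrained vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3
E = rng.normal(size=(vocab_size, dim))  # embedding matrix: one row per word (random stand-in)

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

word_id = 2
via_matmul = one_hot(word_id, vocab_size) @ E  # dense layer on a one-hot input
via_lookup = E[word_id]                        # embedding-layer lookup
print(np.allclose(via_matmul, via_lookup))     # True

# Any two distinct one-hot vectors are orthogonal, so the raw encoding
# carries no notion of which words are related.
print(one_hot(1, vocab_size) @ one_hot(2, vocab_size))  # 0.0
```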
5 votes, 1 answer

Methods to tell if a question can be answered from a paragraph

I'm working on a project related to machine Q&A, using the SQuAD dataset. I've implemented a neural-net solution for finding answers in the provided context paragraph, but the system (obviously) struggles when given questions that are unanswerable…
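One widely used approach, described for BERT on SQuAD 2.0, treats the [CLS] position as a "no answer" span. The model predicts an answer only if the best non-null span score beats the null score by a threshold tuned on the dev set. Below is a plain-Python sketch of that decision rule. The logits are made up, and the function names, `max_len` and `threshold` are illustrative.

```python
def best_span(start_logits, end_logits, max_len=30):
    """Highest-scoring span (i, j) with i <= j, skipping index 0 ([CLS])."""
    best, best_ij = float("-inf"), None
    for i in range(1, len(start_logits)):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best:
                best, best_ij = score, (i, j)
    return best, best_ij

def predict(start_logits, end_logits, threshold=0.0):
    """Return a (start, end) span, or None if the question looks unanswerable."""
    null_score = start_logits[0] + end_logits[0]  # score of the [CLS] "no answer" span
    span_score, span = best_span(start_logits, end_logits)
    return span if span_score > null_score + threshold else None

# Strong [CLS] logits: the null score (10.0) beats the best span (5.0)
print(predict([5.0, 1.0, 2.0, 0.5], [5.0, 0.2, 1.5, 3.0]))  # None
# Weak [CLS] logits: the best span (2, 3) wins with score 7.0
print(predict([0.0, 1.0, 4.0, 0.5], [0.0, 0.2, 1.5, 3.0]))  # (2, 3)
```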
5 votes, 3 answers

Algorithms can be greedy. What are some other algorithmic vices?

Greedy algorithms are well known, and although they are useful in a local context for certain problems, and can potentially even find globally optimal solutions, they nonetheless trade optimality for shorter-term payoffs. This seems to me a good analogue…
DukeZhou
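Coin change shows the trade-off in the question clearly. With denominations {1, 3, 4} and an amount of 6, always taking the largest coin that fits gives 4 + 1 + 1, while the optimum is 3 + 3. The sketch below compares the greedy choice with a dynamic-programming solution. The denominations are only an illustrative choice.

```python
def greedy_change(amount, coins):
    """Always take the largest coin that still fits."""
    used = []
    for c in sorted(coins, reverse=True):
        while amount >= c:
            amount -= c
            used.append(c)
    return used if amount == 0 else None

def optimal_change(amount, coins):
    """Dynamic programming: best[a] is a fewest-coin list summing to a."""
    best = [[]] + [None] * amount
    for a in range(1, amount + 1):
        for c in coins:
            if c <= a and best[a - c] is not None:
                candidate = best[a - c] + [c]
                if best[a] is None or len(candidate) < len(best[a]):
                    best[a] = candidate
    return best[amount]

print(greedy_change(6, [1, 3, 4]))   # [4, 1, 1]  (3 coins)
print(optimal_change(6, [1, 3, 4]))  # [3, 3]     (2 coins)
```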
5 votes, 2 answers

Can variations in microphones used in training set and test set impact the accuracy of speech recognition models?

If I train a speech recognition model using data collected from N different microphones, but deploy it on an unseen (test) microphone, does it impact the accuracy of the model? While I understand that theoretically an accuracy loss is likely, does…
baiduguy1
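Part of the mismatch can be removed at the feature level. A fixed, linear microphone response multiplies the spectrum, which becomes an approximately additive offset in the log-spectral or cepstral domain. Subtracting each utterance's mean feature vector (cepstral mean normalization) cancels that offset. The NumPy sketch below assumes random numbers as stand-ins for real features and a hypothetical per-feature offset for the microphone. CMN does not address noise, nonlinear distortion or level-dependent effects, which is one reason an unseen microphone can still cost accuracy.

```python
import numpy as np

def cmn(feats):
    """Cepstral mean normalization: subtract the per-utterance mean of each feature."""
    return feats - feats.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
frames, n_feats = 200, 13
clean = rng.normal(size=(frames, n_feats))        # stand-in for log-spectral/cepstral features
mic_offset = rng.normal(scale=2.0, size=n_feats)  # hypothetical microphone response (time-invariant)
recorded = clean + mic_offset                     # channel acts as an additive offset in this domain

print(np.allclose(recorded, clean))            # False: the raw features differ
print(np.allclose(cmn(recorded), cmn(clean)))  # True: the offset cancels after CMN
```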
5 votes, 1 answer

What does the argmax of the expectation of the log likelihood mean?

What does the following equation mean? What does each part of the formula represent or mean? $$\theta^* = \underset {\theta}{\arg \max} \Bbb E_{x \sim p_{data}} \log {p_{model}(x|\theta) }$$
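In words, the equation picks the parameters $\theta$ under which data drawn from the true distribution $p_{data}$ gets the highest average log-probability under the model. In practice the expectation is replaced by an average over the training set, $\frac{1}{N}\sum_{n=1}^{N} \log p_{model}(x_n \mid \theta)$, which makes $\theta^*$ the maximum-likelihood estimate. The NumPy sketch below assumes a unit-variance Gaussian model whose only parameter is its mean, fitted to synthetic data. For this model the maximizer is the sample mean, and a grid search recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=1000)  # samples x ~ p_data (synthetic)

def avg_log_likelihood(theta, x):
    # Average of log N(x | theta, 1) over the samples:
    # a Monte Carlo estimate of E_{x ~ p_data}[log p_model(x | theta)]
    return np.mean(-0.5 * np.log(2 * np.pi) - 0.5 * (x - theta) ** 2)

thetas = np.linspace(0.0, 6.0, 6001)              # candidate theta values, step 0.001
scores = [avg_log_likelihood(th, data) for th in thetas]
theta_star = thetas[int(np.argmax(scores))]       # the arg max over theta
print(theta_star, data.mean())
```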
5 votes, 3 answers

Can I do deep learning with the 1060 or the 1070 Ti?

Before I start, I want to let you know that I am completely new to the field of deep learning! Since I need a new graphics card either way (for gaming, you know), I am thinking about buying the GTX 1060 with 6 GB or the 1070 Ti with 8 GB. Because I am not…
S.Matthias