23

I was able to find the original paper on LSTM, but I was not able to find the paper that introduced "vanilla" RNNs. Where can I find it?

nbro
Ahsan Tarique

4 Answers

13

Both tech reports below explicitly refer to RNNs as "recurrent net(work)s".

  1. Rumelhart, David E.; Hinton, Geoffrey E.; and Williams, Ronald J. (Sept. 1985). Learning internal representations by error propagation. Tech. rep. ICS 8504. San Diego, California: Institute for Cognitive Science, University of California.

  2. Jordan, Michael I. (May 1986). Serial order: a parallel distributed processing approach. Tech. rep. ICS 8604. San Diego, California: Institute for Cognitive Science, University of California.

Jordan was a student of Rumelhart, so I would lean toward identifying report 1 as the paper introducing RNNs, with the caveat that the first sentence of its section "Recurrent Nets" reads:

We have thus far restricted ourselves to feedforward nets. This may seem like a substantial restriction, but as Minsky and Papert point out, there is, for every recurrent network, a feedforward network with identical behavior (over a finite period of time).

This is interesting for two reasons:

  1. After this sentence, the authors go on to show how RNNs can be unrolled and the error propagated back through the unrolled network. This is not yet full-fledged BPTT, though.
  2. The sentence shows that the idea of recurrence (and unrolling) has been around since at least 1969.
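To make the unrolling claim concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from either report: the single-unit network, the tanh activation, and all names and weight values are assumptions chosen for brevity. It runs a recurrent unit over a short sequence and then evaluates a feedforward net with one layer per time step, each layer holding a copy of the same weights, and checks that both give the same output.

```python
import math


def rnn_step(w_x, w_h, b, x, h):
    """One step of a single-unit recurrent net: h_t = tanh(w_x*x_t + w_h*h_{t-1} + b)."""
    return math.tanh(w_x * x + w_h * h + b)


def run_recurrent(w_x, w_h, b, xs, h0=0.0):
    """Loop the same unit over the sequence, feeding its output back in."""
    h = h0
    for x in xs:
        h = rnn_step(w_x, w_h, b, x, h)
    return h


def build_unrolled(w_x, w_h, b, num_steps):
    """Build a feedforward net with one layer per time step (weights copied per layer)."""
    return [(w_x, w_h, b) for _ in range(num_steps)]


def run_feedforward(layers, xs, h0=0.0):
    """Layer t receives input x_t plus the output of layer t-1; there are no cycles."""
    h = h0
    for (w_x, w_h, b), x in zip(layers, xs):
        h = math.tanh(w_x * x + w_h * h + b)
    return h


# Hypothetical weights and inputs, chosen only for illustration.
xs = [0.5, -1.0, 0.25]
layers = build_unrolled(0.8, -0.3, 0.1, len(xs))
assert run_recurrent(0.8, -0.3, 0.1, xs) == run_feedforward(layers, xs)
```

The equivalence only holds for a fixed number of steps, which is the "over a finite period of time" qualifier in the quote: a longer input sequence needs a deeper unrolled net.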

Unfortunately, I don't have access to Minsky and Papert (1969), so I cannot follow this line any further.

nbro
David Nemeskey
7

Hopfield networks, a special case of RNNs, were first proposed in 1982: https://www.pnas.org/content/79/8/2554

Otherwise (shameless plug: I am the author), a non-technical timeline for NLP can be found here: https://blog.exxcellent.de/ki-machine-learning

AlDante
4

Warren McCulloch and Walter Pitts talk about recurrent neural nets in their paper: McCulloch, W. S., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133. https://doi.org/10.1007/BF02478259

They finish their introduction with the paragraph:

The nervous system contains many circular paths, whose activity so regenerates the excitation of any participant neuron that reference to time past becomes indefinite, although it still implies that afferent activity has realized one of a certain class of configurations over time. Precise specification of these implications by means of recursive functions, and determination of those that can be embodied in the activity of nervous nets, completes the theory.

Their paper contains a section titled "The Theory: Nets Without Circles", in which they introduce feed-forward networks (nets without cycles) and recurrent networks (nets with cycles). The next section, titled "The Theory: Nets with Circles", proves a few theorems about recurrent neural networks.

Marvin Minsky quotes them and discusses recurrent neural networks extensively throughout his book, Computation: Finite and Infinite Machines (1967). Prentice Hall. ISBN: 0131655639, 9780131655638.

I am not sure whether there are earlier references.

3

According to this meta paper, today's "vanilla" RNNs are based on Elman's work on networks with dynamic memory: "Finding Structure in Time".