
I was following some examples to get familiar with TensorFlow's LSTM API, but noticed that all LSTM initialization functions require only the num_units parameter, which denotes the number of hidden units in a cell.

According to what I have learned from colah's famous blog, the cell state is separate from the hidden state, so (I think) they could be represented with different dimensions, in which case we should pass at least two parameters denoting both #hidden and #cell_state.

So, this confuses me a lot when trying to figure out what TensorFlow's cells do. Under the hood, are they implemented like this just for the sake of convenience, or did I misunderstand something in the blog?

dimensions illustration
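For example (assuming the TF 2.x Keras API here; I believe the older tf.nn.rnn_cell cells report the same thing), the cell only advertises a single size for both of its states:

    import tensorflow as tf

    cell = tf.keras.layers.LSTMCell(256)
    # state_size lists the sizes of the hidden state and the cell state
    print(cell.state_size)  # [256, 256]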

nbro
kuixiong

3 Answers


I had a very similar issue to yours with the dimensions. Here's the rundown:

Every node you see inside the LSTM cell has exactly the same output dimensions, including the cell state. Otherwise, consider the forget gate and the output gate: how could you possibly do an element-wise multiplication with the cell state? They have to have the same dimensions for that to work.

Using an example where n_hiddenunits = 256:

Output of forget gate: 256
Input gate: 256
Activation gate: 256
Output gate: 256
Cell state: 256
Hidden state: 256
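
If you want to verify this in code (a rough sketch using the Keras LSTM layer, which wraps the same cell; not taken from the original post):

    import tensorflow as tf

    layer = tf.keras.layers.LSTM(256, return_state=True)
    x = tf.random.normal((1, 10, 5))  # batch of 1, 10 time steps, input size 5

    output, hidden_state, cell_state = layer(x)
    print(hidden_state.shape)  # (1, 256)
    print(cell_state.shape)    # (1, 256) -- same size as the hidden state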

Now this can obviously be problematic if you want the LSTM to output, say, a one-hot vector of size 5. So to do this, a softmax layer is slapped onto the end of the hidden state to convert it to the correct dimension. So it's just a standard FFNN with normal weights (no biases, because softmax). Now, also imagine that we input a one-hot vector of size 5:

input size: 5
total input size to all gates: 256+5 = 261 (the hidden state and input are appended)
Output of forget gate: 256
Input gate: 256
Activation gate: 256
Output gate: 256
Cell state: 256
Hidden state: 256
Final output size: 5

Those are the final dimensions of the cell.
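
A rough sketch of that setup (my own illustration, not from the original answer): a 256-unit LSTM whose final hidden state is projected down to a size-5 softmax output.

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(256, input_shape=(None, 5)),                 # hidden/cell state size 256
        tf.keras.layers.Dense(5, activation="softmax", use_bias=False),   # project 256 -> 5 (no bias, as above)
    ])

    one_hot_sequence = tf.one_hot([[0, 3, 1, 4]], depth=5)  # shape (1, 4, 5)
    print(model(one_hot_sequence).shape)  # (1, 5)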

Recessive

My understanding of an LSTM layer composed of 4 cells is depicted in the following picture: LSTM layer with 4 cells

This would explain why the hidden state of the whole layer has exactly the same dimension as the hidden states (or cell states) of the individual cells.

However, what I still don't fully understand is the 'return sequences' option between LSTM layers, which changes the output shape from [hidden_states] to [x_dimension, hidden_states]. The usual explanation is that we normally only care about the state of the last cell, but when connecting multiple layers, the states of all the cells are passed into the next layer. Nevertheless, I still cannot make sense of it graphically.

e.g.:

    from tensorflow import keras

    model = keras.models.Sequential([
        keras.layers.LSTM(20, return_sequences=True, input_shape=[None, 1]),
        keras.layers.LSTM(20, return_sequences=True),
        keras.layers.TimeDistributed(keras.layers.Dense(10)),
    ])
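
To see the shape change concretely (a toy sketch, separate from the model above):

    import numpy as np
    from tensorflow import keras

    x = np.random.rand(2, 7, 1).astype("float32")  # 2 sequences, 7 time steps, 1 feature

    print(keras.layers.LSTM(20, return_sequences=True)(x).shape)   # (2, 7, 20): one hidden state per time step
    print(keras.layers.LSTM(20, return_sequences=False)(x).shape)  # (2, 20): only the last hidden state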


Look at the equation for computing the hidden state as a function of the cell state and output gate: $$ h_t = \tanh(C_t)\circ o_t $$ This equation implies that the hidden state and cell state have the same dimensionality.
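
A quick way to convince yourself of the shape constraint (a toy sketch with random tensors, not TensorFlow's internal code):

    import tensorflow as tf

    cell_state = tf.random.normal((1, 256))
    output_gate = tf.sigmoid(tf.random.normal((1, 256)))

    hidden_state = tf.tanh(cell_state) * output_gate  # element-wise product, so shapes must match
    print(hidden_state.shape)  # (1, 256)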