DAY 49-100 DAYS MLCODE: LSTM

December 30, 2018 · 100-Days-Of-ML-Code

In the last two blogs, we discussed RNNs and an RNN example; in this blog we’ll discuss the LSTM.

As per Wikipedia, an LSTM is:

Long short-term memory (LSTM) units are units of a recurrent neural network (RNN). An RNN composed of LSTM units is often called an LSTM network (or just LSTM). A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals and the three gates regulate the flow of information into and out of the cell.

— Wikipedia

As we discussed in our previous blog on RNNs, an LSTM is also a cell and can be used just like a basic RNN cell, but it performs better: training converges faster, and it is able to detect long-term dependencies in the data.

LSTM cell architecture (image from the book Hands-On Machine Learning with Scikit-Learn & TensorFlow)

As the architecture above shows, the state of the cell is split into two vectors: h(t) and c(t). h(t) is the short-term state and c(t) is the long-term state of the cell.

At each step, the LSTM drops some memory from the long-term state by passing c(t-1) through the forget gate, then adds new memories via the addition operation shown in the image. The resulting c(t) is passed straight through to the next step; at the same time, the output of the addition operation is passed through a tanh function and filtered by the output gate, producing the cell’s short-term state h(t).
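To make this flow concrete, here is a tiny numeric sketch of one state update; the gate values below are made-up illustrative numbers, not outputs of a trained network:

import numpy as np

# Illustrative numbers only (not from a trained network): one update of
# the long-term state. f_t and i_t are gate activations, g_t is the main
# layer's output, o_t is the output gate.
c_prev = 0.8                     # previous long-term state c(t-1)
f_t, i_t, g_t, o_t = 0.5, 0.9, 0.3, 1.0

c_t = f_t * c_prev + i_t * g_t   # forget gate drops memory, addition adds new
h_t = o_t * np.tanh(c_t)         # tanh, then output gate -> short-term state
print(c_t, h_t)                  # 0.67 0.585...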

Layers of the LSTM architecture:

In the above image, we can see that the LSTM structure has four fully connected layers.

  • Main Layer: a fully connected layer that takes the input X(t) and the previous short-term state h(t-1) and produces an output g(t). If you remember, the basic RNN structure has only this one layer, whereas the LSTM adds three more.
  • Other Layers: the other three layers are gate controllers. They use the logistic (sigmoid) function, so their outputs range from 0 (gate fully closed) to 1 (gate fully open).
    • The Forget Gate (f(t)): controls which parts of the long-term state should be dropped and which should be kept.
    • The Input Gate (i(t)): controls which parts of g(t) should be added to the long-term state c(t).
    • The Output Gate (o(t)): controls which parts of the long-term state should be read and output as the short-term state h(t) and the output y(t).

Below are the equations for calculating these values:

i(t) = σ( W(xi)ᵀ · X(t) + W(hi)ᵀ · h(t-1) + B(i) )
f(t) = σ( W(xf)ᵀ · X(t) + W(hf)ᵀ · h(t-1) + B(f) )
o(t) = σ( W(xo)ᵀ · X(t) + W(ho)ᵀ · h(t-1) + B(o) )
g(t) = tanh( W(xg)ᵀ · X(t) + W(hg)ᵀ · h(t-1) + B(g) )
c(t) = f(t) ⊗ c(t-1) + i(t) ⊗ g(t)
y(t) = h(t) = o(t) ⊗ tanh( c(t) )

Image 2 – LSTM equations (σ is the logistic function, ⊗ is element-wise multiplication)

W(xi), W(xf), W(xo) and W(xg) are the weights of each of the four layers for their connections to the input vector X(t).

W(hi), W(hf), W(ho) and W(hg) are the weights of each of the four layers for their connections to the previous short-term state h(t-1).

B(i), B(f), B(o) and B(g) are the bias terms of each of the four layers.
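Putting these equations together, below is a minimal NumPy sketch of a single LSTM step; the function name lstm_step, the stacked weight layout, and all sizes are illustrative assumptions for this post, not part of any library API:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
    """One LSTM step following the equations above. W_x stacks W(xi), W(xf),
    W(xo), W(xg) column-wise; W_h stacks the corresponding W(h*) weights;
    b stacks B(i), B(f), B(o), B(g)."""
    z = x_t @ W_x + h_prev @ W_h + b              # all four layers in one go
    i, f, o, g = np.split(z, 4, axis=-1)          # split into the four parts
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gate controllers in (0, 1)
    g = np.tanh(g)                                # main layer output
    c_t = f * c_prev + i * g                      # update the long-term state
    h_t = o * np.tanh(c_t)                        # short-term state / output
    return h_t, c_t

# Toy usage: 3 inputs, 4 units; all sizes illustrative.
rng = np.random.default_rng(0)
n_inputs, n_units = 3, 4
W_x = rng.normal(size=(n_inputs, 4 * n_units))
W_h = rng.normal(size=(n_units, 4 * n_units))
b = np.zeros(4 * n_units)
h, c = np.zeros(n_units), np.zeros(n_units)
h, c = lstm_step(rng.normal(size=n_inputs), h, c, W_x, W_h, b)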

In TensorFlow we can declare the LSTM cell as below (we have to use BasicLSTMCell instead of BasicRNNCell), where n_neurons is the number of units we want in the cell:

lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons)
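For context, here is a minimal sketch of unrolling such a cell over time with tf.nn.dynamic_rnn in the TensorFlow 1.x API used in this series; n_steps, n_inputs and n_neurons are illustrative values, not from the original post:

import tensorflow as tf

# Illustrative dimensions: 28 time steps, 28 inputs per step, 150 units.
n_steps, n_inputs, n_neurons = 28, 28, 150

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(lstm_cell, X, dtype=tf.float32)

# states is an LSTMStateTuple: states.c is the final long-term state c(t)
# and states.h the final short-term state h(t); outputs holds h(t) for
# every time step.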

This is all about the LSTM. You can find the full sample code here.