DAY 50 – 100 DAYS OF ML CODE: GRU


December 30, 2018

In the previous blog we discussed the LSTM. There are several variants of the LSTM, and one of them is called the GRU (Gated Recurrent Unit).

As per Wikipedia, a GRU is:

Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets.

Wikipedia

GRU cells are a simplified version of LSTM cells and require fewer parameters than an LSTM, as there is no output gate.
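
As a rough, back-of-the-envelope comparison (a minimal sketch; it assumes the textbook formulation with input size m and hidden size n, and ignores framework-specific details such as extra bias variants), a GRU has three weight blocks where an LSTM has four:

def rnn_cell_params(input_size, hidden_size, n_blocks):
    # Each block has an input weight matrix (hidden_size x input_size),
    # a recurrent weight matrix (hidden_size x hidden_size) and a bias vector.
    return n_blocks * (hidden_size * input_size + hidden_size ** 2 + hidden_size)

m, n = 1, 150  # same sizes as the example further below
print("GRU parameters: ", rnn_cell_params(m, n, n_blocks=3))  # update, reset, candidate
print("LSTM parameters:", rnn_cell_params(m, n, n_blocks=4))  # input, forget, output, candidate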

Fully gated GRU cell

Initially, for t = 0, the output vector h(0) is zero, and the cell is then updated according to the equations below:

GRU computation

Where :

  • x(t): input vector
  • h(t): output vector
  • z(t): update gate vector
  • r(t): reset gate vector
  • W, U and b: parameter matrices and bias vectors
  • sigma_g: the sigmoid activation function
  • sigma_h: the hyperbolic tangent activation function
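
Since the equation image is not reproduced here, the fully gated GRU update can be written out explicitly. This follows the formulation in the Wikipedia article quoted above (with \odot denoting element-wise multiplication), so treat it as a reference sketch rather than the exact figure:

z(t) = \sigma_g\big(W_z\, x(t) + U_z\, h(t-1) + b_z\big)
r(t) = \sigma_g\big(W_r\, x(t) + U_r\, h(t-1) + b_r\big)
\hat{h}(t) = \sigma_h\big(W_h\, x(t) + U_h\,(r(t) \odot h(t-1)) + b_h\big)
h(t) = (1 - z(t)) \odot h(t-1) + z(t) \odot \hat{h}(t)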

We can create a GRU cell using TensorFlow's GRUCell, as shown below:

gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons)  # n_neurons = number of units in the cell

Let’s create a simple example for GRU using TensorFlow:

First, generate time series data for the equation y = t * sin(t)/3 + 2 * sin(3t):

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

t_min, t_max = 0, 30
resolution = 0.1
def sin_wave(t):
    return t * np.sin(t) / 3 + 2 * np.sin(t * 3)

Now create a function to generate batches of the time series data:

def generate_batch(batch_size, n_steps):
    # Pick random starting points, then sample n_steps + 1 values at the given resolution
    t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
    Ts = t0 + np.arange(0., n_steps + 1) * resolution
    ys = sin_wave(Ts)
    # Inputs are the first n_steps values; targets are the same series shifted one step ahead
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)
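
As a quick sanity check on the shapes (a minimal sketch that simply calls the function above and prints the results):

X_batch, y_batch = generate_batch(batch_size=5, n_steps=25)
print(X_batch.shape)  # (5, 25, 1) -> [batch_size, n_steps, n_inputs]
print(y_batch.shape)  # (5, 25, 1) -> the same series shifted one step ahead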

Plot the graph to display the time series data

t = np.linspace(t_min, t_max, int((t_max - t_min) / resolution))

n_steps = 25
time_instance = np.linspace(12.2, 12.2 + resolution * (n_steps + 1), n_steps + 1)

plt.figure(figsize=(11, 4))
plt.subplot(121)
plt.title("A time series (generated)", fontsize=14)
plt.plot(t, sin_wave(t), label=r"$t . \sin(t) / 3 + 2 . \sin(3t)$")
plt.plot(time_instance[:-1], sin_wave(time_instance[:-1]), "b-", linewidth=3, label="A training instance")
plt.legend(loc="lower left", fontsize=14)
plt.axis([0, 30, -17, 13])
plt.xlabel("Time")
plt.ylabel("Value")

plt.subplot(122)
plt.title("A training instance", fontsize=14)
plt.plot(time_instance[:-1], sin_wave(time_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(time_instance[1:], sin_wave(time_instance[1:]), "w*", markersize=10, label="target")
plt.legend(loc="upper left")
plt.xlabel("Time")

Time series data

Declare the parameters for the model

n_steps = 25
n_inputs = 1
n_outputs = 1
n_neurons = 150
n_layers = 3
learning_rate = 0.001

Now declare the input and output placeholders:

tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

Declare the GRU cells and the output layer:

# One GRUCell per layer, stacked into a multi-layer RNN
gru_cells = [tf.contrib.rnn.GRUCell(num_units=n_neurons)
             for layer in range(n_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(gru_cells)
# outputs: [batch_size, n_steps, n_neurons]; states: the final state of each layer
outputs, states = tf.nn.dynamic_rnn(multi_cell, X, dtype=tf.float32)
top_layer_h_state = states[-1]
logits = tf.layers.dense(top_layer_h_state, n_outputs, name="softmax")

Declare the loss function and optimizer for the model:

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
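
Note that outputs has n_neurons (150) values per time step while y has only one, so the squared error above compares the raw recurrent activations against the target through broadcasting. A common alternative (shown here only as a hedged sketch; it is not what produced the training output below) is to project every time step down to a single value before computing the loss:

# Alternative sketch: project each time step's n_neurons outputs to a single value
stacked_outputs = tf.reshape(outputs, [-1, n_neurons])             # [batch * n_steps, n_neurons]
stacked_preds = tf.layers.dense(stacked_outputs, n_outputs)        # [batch * n_steps, n_outputs]
predictions = tf.reshape(stacked_preds, [-1, n_steps, n_outputs])  # [batch, n_steps, n_outputs]
loss_projected = tf.reduce_mean(tf.square(predictions - y))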

Initialize the Global Variables

init = tf.global_variables_initializer()

Now train the model using the batches produced by the generate_batch function, and save the trained model to disk:

n_iterations = 1500
batch_size = 50
train_keep_prob = 0.5
saver = tf.train.Saver()
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = generate_batch(batch_size, n_steps)
        _, mse = sess.run([training_op, loss],
                          feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            print(iteration, "Training MSE:", mse)

    saver.save(sess, "./my_time_series_model")

Output:
0 Training MSE: 16.012484
100 Training MSE: 10.249619
200 Training MSE: 10.387716
300 Training MSE: 11.640218
400 Training MSE: 12.431163
500 Training MSE: 12.452735
600 Training MSE: 13.922672
700 Training MSE: 11.461917
800 Training MSE: 10.02124
900 Training MSE: 9.462717
1000 Training MSE: 8.341917
1100 Training MSE: 10.290158
1200 Training MSE: 12.999584
1300 Training MSE: 12.173523
1400 Training MSE: 11.961574

Predict values using the test data generated in the first section of the code:

with tf.Session() as sess:
    saver.restore(sess, "./my_time_series_model")

    X_new = sin_wave(np.array(time_instance[:-1].reshape(-1, n_steps, n_inputs)))
    y_pred = sess.run(outputs, feed_dict={X: X_new})
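
To visualize how the prediction looks (a minimal sketch; since the loss above compared every one of the n_neurons output units against the single target via broadcasting, the first output unit is plotted here as a crude prediction):

plt.figure(figsize=(8, 4))
plt.title("Testing the model", fontsize=14)
plt.plot(time_instance[:-1], sin_wave(time_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(time_instance[1:], sin_wave(time_instance[1:]), "w*", markersize=10, label="target")
plt.plot(time_instance[1:], y_pred[0, :, 0], "r.", markersize=10, label="prediction")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()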

In conclusion, our new GRU model does not look good yet, and we will have to tune the hyperparameters so that it fits our data. You can find the code here, and I'll update the parameters so that the model fits our data better.