DAY 50 - 100 DAYS OF ML CODE: GRU
In the previous blog, we discussed the LSTM. There are several variants of the LSTM, and one of them is the Gated Recurrent Unit (GRU).
As per Wikipedia, a GRU is:
"Gated recurrent units (GRUs) are a gating mechanism in recurrent neural networks, introduced in 2014 by Kyunghyun Cho et al. Their performance on polyphonic music modeling and speech signal modeling was found to be similar to that of long short-term memory (LSTM). However, GRUs have been shown to exhibit better performance on smaller datasets."
(Wikipedia)
GRU cells are a simplified version of the LSTM and require fewer parameters than the LSTM, as there is no output gate.
Initially, for t = 0, the output vector is h(0) = 0, and the GRU equations are:

z(t) = σ_g(W_z x(t) + U_z h(t-1) + b_z)
r(t) = σ_g(W_r x(t) + U_r h(t-1) + b_r)
ĥ(t) = σ_h(W_h x(t) + U_h (r(t) ⊙ h(t-1)) + b_h)
h(t) = (1 - z(t)) ⊙ h(t-1) + z(t) ⊙ ĥ(t)

Where:
- x(t): input vector
- h(t): output vector
- ĥ(t): candidate activation vector
- z(t): update gate vector
- r(t): reset gate vector
- W, U and b: parameter matrices and vectors
- σ_g: gate activation; the sigmoid function in the original formulation
- σ_h: output activation; the hyperbolic tangent in the original formulation
- ⊙: element-wise (Hadamard) product
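To make these equations concrete, below is a minimal NumPy sketch of a single GRU step. The weight matrices are random stand-ins chosen for illustration, not trained parameters:

import numpy as np

def gru_step(x_t, h_prev, params):
    # One GRU time step, following the equations above
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate
    h_hat = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)   # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_hat              # new hidden state

# Toy dimensions: 3-dimensional input, 4 hidden units (illustrative only)
n_in, n_hid = 3, 4
rng = np.random.RandomState(0)
params = []
for _ in range(3):  # one (W, U, b) block each for z, r and the candidate
    params += [rng.randn(n_hid, n_in), rng.randn(n_hid, n_hid), np.zeros(n_hid)]

h = np.zeros(n_hid)        # h is zero at t = 0, as noted above
x = rng.randn(n_in)
h = gru_step(x, h, params)
print(h.shape)             # (4,)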
We can create a GRU cell using TensorFlow's GRUCell, like below:
gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons)
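To put a number on the "fewer parameters" claim, here is a rough count (a sketch; exact totals can vary slightly across implementations depending on bias conventions). A GRU has three weight blocks (update gate, reset gate, candidate state), while an LSTM has four (input, forget and output gates plus the cell candidate):

n_inputs, n_units = 1, 150

# Each block maps the concatenation [x(t), h(t-1)] to n_units values,
# plus a bias vector of size n_units
per_block = (n_inputs + n_units + 1) * n_units

print("GRU parameters: ", 3 * per_block)   # z, r and candidate blocks
print("LSTM parameters:", 4 * per_block)   # i, f, o and candidate blocks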
Let’s create a simple example for GRU using TensorFlow:
Generate time series data for the equation y = t sin(t)/3 + 2 sin(3t):
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

t_min, t_max = 0, 30
resolution = 0.1

def sin_wave(t):
    return (t * np.sin(t) / 3 + 2 * np.sin(t * 3))
Now create a function to generate batches of time series data:
def generate_batch(batch_size, n_steps):
    # Random starting points, leaving room for n_steps samples
    t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
    Ts = t0 + np.arange(0., n_steps + 1) * resolution
    ys = sin_wave(Ts)
    # Inputs are steps 0..n-1; targets are the same series shifted one step ahead
    return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)
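For instance, a quick sanity check on the shapes this function returns; each batch has shape [batch_size, n_steps, 1]:

X_batch, y_batch = generate_batch(batch_size=4, n_steps=25)
print(X_batch.shape, y_batch.shape)   # (4, 25, 1) (4, 25, 1)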
Plot the graph to display the time series data
t = np.linspace(t_min, t_max, int((t_max - t_min) / resolution))

n_steps = 25
time_instance = np.linspace(12.2, 12.2 + resolution * (n_steps + 1), n_steps + 1)

plt.figure(figsize=(11, 4))
plt.subplot(121)
plt.title("A time series (generated)", fontsize=14)
plt.plot(t, sin_wave(t), label=r"$t . \sin(t) / 3 + 2 . \sin(3t)$")
plt.plot(time_instance[:-1], sin_wave(time_instance[:-1]), "b-", linewidth=3, label="A training instance")
plt.legend(loc="lower left", fontsize=14)
plt.axis([0, 30, -17, 13])
plt.xlabel("Time")
plt.ylabel("Value")

plt.subplot(122)
plt.title("A training instance", fontsize=14)
plt.plot(time_instance[:-1], sin_wave(time_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(time_instance[1:], sin_wave(time_instance[1:]), "w*", markersize=10, label="target")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()
Declare the parameters for the model
n_steps = 25
n_inputs = 1
n_outputs = 1
n_neurons = 150
n_layers = 3
learning_rate = 0.001
Now declare the input and output placeholders:
tf.reset_default_graph()
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
Declare the GRU cells and the output projection layer:
gru_cells = [tf.contrib.rnn.GRUCell(num_units=n_neurons)
             for layer in range(n_layers)]
multi_cell = tf.contrib.rnn.MultiRNNCell(gru_cells)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_cell, X, dtype=tf.float32)
# Project each time step's n_neurons-dimensional output down to one value
stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
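As an aside, TF 1.x also ships tf.contrib.rnn.OutputProjectionWrapper, which applies the same per-step linear projection internally. A sketch of this more compact (though reportedly slower) alternative, to be used instead of the block above rather than alongside it in the same graph:

# Alternative: wrap the stacked cells so each step's output is projected
# to n_outputs automatically
wrapped_cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.contrib.rnn.MultiRNNCell(
        [tf.contrib.rnn.GRUCell(num_units=n_neurons) for _ in range(n_layers)]),
    output_size=n_outputs)
outputs, states = tf.nn.dynamic_rnn(wrapped_cell, X, dtype=tf.float32)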
Declare the loss function and optimizer for the model:
loss = tf.reduce_mean(tf.square(outputs - y))   # mean squared error
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
Initialize the Global Variables
init = tf.global_variables_initializer()
Now train the model using batches from generate_batch and save the trained model to disk:
n_iterations = 1500
batch_size = 50

saver = tf.train.Saver()
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = generate_batch(batch_size, n_steps)
        _, mse = sess.run([training_op, loss],
                          feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            print(iteration, "Training MSE:", mse)
    saver.save(sess, "./my_time_series_model")
Output:
0 Training MSE: 16.012484
100 Training MSE: 10.249619
200 Training MSE: 10.387716
300 Training MSE: 11.640218
400 Training MSE: 12.431163
500 Training MSE: 12.452735
600 Training MSE: 13.922672
700 Training MSE: 11.461917
800 Training MSE: 10.02124
900 Training MSE: 9.462717
1000 Training MSE: 8.341917
1100 Training MSE: 10.290158
1200 Training MSE: 12.999584
1300 Training MSE: 12.173523
1400 Training MSE: 11.961574
Predict values using the test instance generated in the first section of the code:
with tf.Session() as sess:
    saver.restore(sess, "./my_time_series_model")
    X_new = sin_wave(np.array(time_instance[:-1].reshape(-1, n_steps, n_inputs)))
    y_pred = sess.run(outputs, feed_dict={X: X_new})
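To eyeball the result, we can reuse the earlier plotting style and overlay the predicted sequence on the targets (a sketch; with the model above, y_pred has shape (1, n_steps, 1)):

plt.title("Testing the model", fontsize=14)
plt.plot(time_instance[:-1], sin_wave(time_instance[:-1]), "bo", markersize=10, label="instance")
plt.plot(time_instance[1:], sin_wave(time_instance[1:]), "w*", markersize=10, label="target")
plt.plot(time_instance[1:], y_pred[0, :, 0], "r.", markersize=10, label="prediction")
plt.legend(loc="upper left")
plt.xlabel("Time")
plt.show()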
In conclusion, our new model using GRU does not converge particularly well on this series: the training MSE fluctuates around 10 rather than decreasing steadily, so further tuning (more iterations, a different learning rate, or more neurons) would be needed.