DAY 33-100 DAYS MLCODE: Implement Mini-Batch Gradient using Tensorflow

December 12, 2018 · 100-Days-Of-ML-Code

Mini-batch gradient descent is a form of gradient descent in which the algorithm splits the training dataset into small batches and uses these batches to calculate the loss and update the coefficients based on the outcome.

Mini-batch gradient descent lies between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.
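
To make the idea concrete, here is a minimal NumPy sketch of a mini-batch training loop for a toy linear-regression problem (the data, the squared-error loss and the variable names here are illustrative assumptions, not part of the TensorFlow example below): the training set is shuffled, split into batches, and the parameters are updated once per batch.

import numpy as np

rng = np.random.RandomState(42)
X_demo = rng.rand(1000, 2)                        # hypothetical training data
y_demo = X_demo @ np.array([2.0, -3.0]) + 1.0     # hypothetical targets
X_demo = np.c_[np.ones((1000, 1)), X_demo]        # add a bias column

theta_demo = np.zeros(3)                          # parameters to learn
lr, batch_size = 0.1, 50

for epoch in range(100):
    idx = rng.permutation(len(X_demo))            # shuffle once per epoch
    for start in range(0, len(X_demo), batch_size):
        batch = idx[start:start + batch_size]                    # one mini-batch
        error = X_demo[batch] @ theta_demo - y_demo[batch]       # prediction error
        grad = X_demo[batch].T @ error / len(batch)              # gradient on this batch only
        theta_demo -= lr * grad                                  # update after every batch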

In the previous blog we started using the TensorFlow library, and in this blog we'll use TensorFlow to implement mini-batch gradient descent.

Let's download the moons dataset from the scikit-learn library.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.datasets import make_moons

n_samples = 1000
X_moon, y_moon = make_moons(n_samples=n_samples, noise=.1, random_state=42)

Visualize the dataset using matplotlib.

plt.plot(X_moon[y_moon == 1, 0], X_moon[y_moon == 1, 1], 'go', label="Positive")
plt.plot(X_moon[y_moon == 0, 0], X_moon[y_moon == 0, 1], 'r^', label="Negative")
plt.legend()
plt.grid(None)
plt.rcParams['axes.facecolor'] = 'white'

Moon dataset

Our model takes the form f(x) = Wx + b, where W is the weight vector and b is the bias. Let's add the bias term to the data as x(0) = 1; we already have the two features x(1) and x(2).

X_with_bias = np.c_[np.ones((n_samples, 1)), X_moon]

Verify the data after adding the bias

X_with_bias[:5]

Output:
array([[ 1.        , -0.05146968,  0.44419863],
       [ 1.        ,  1.03201691, -0.41974116],
       [ 1.        ,  0.86789186, -0.25482711],
       [ 1.        ,  0.288851  , -0.44866862],
       [ 1.        , -0.83343911,  0.53505665]])

Since TensorFlow expects the labels as a 2D array with a single column (a column vector), convert y into a 2D array.

y_vector = y_moon.reshape(-1, 1)
y_vector.shape

Output: (1000, 1)

Now that we have everything in place, let's split the data into training and test sets using scikit-learn.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X_with_bias, y_vector, test_size=0.20, random_state=42)

Verify the size of test and train data

print(f"Train data size is {X_train.shape}, train data label size is {y_train.shape}")
print(f"Test data size is {X_test.shape}, test data label size is {y_test.shape}")

Output: Train data size is (800, 3), train data label size is (800, 1)
               Test data size is (200, 3), test data label size is (200, 1)

Now, let's create a simple function to select random batches from the training data.

def select_random_batch(X, y, batch_size):          # take training data and batch size as input
    idx = np.random.randint(0, len(X), batch_size)  # pick random index numbers
    X_batch = X[idx]
    y_batch = y[idx]
    return X_batch, y_batch

Before we use it in training, let's test the select_random_batch function.

select_random_batch(X_train, y_train, 5)

Output:
(array([[ 1.        , -0.77702017,  0.49329954],
        [ 1.        , -0.04027286,  0.15633953],
        [ 1.        ,  0.43318391,  0.91495211],
        [ 1.        , -1.04128339,  0.11138847],
        [ 1.        ,  1.87285957,  0.00337095]]),
 array([[0],
        [1],
        [0],
        [0],
        [1]]))

Let's perform logistic regression on the data. The model first calculates a weighted sum of the inputs and then applies the sigmoid function to the result. This gives the estimated probability of the positive class: p̂ = hθ(x) = σ(θᵀx).

Theta is the vector containing the bias theta(0) and the weights theta(1)…theta(n). x contains x(0), which is 1, plus the remaining features of the training data, x(1) and x(2).
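
As a quick illustration of that formula, here is how the probability could be computed by hand in NumPy for a single sample (the theta and x values below are made-up numbers, not taken from the model):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta_demo = np.array([0.5, -1.2, 0.8])   # assumed values: [bias, weight 1, weight 2]
x_demo = np.array([1.0, 0.3, -0.7])       # one sample: [x(0)=1, x(1), x(2)]
p_hat = sigmoid(theta_demo @ x_demo)      # estimated probability of the positive class

p_hat is a number between 0 and 1, which we will later threshold at 0.5.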

Let’s construct the model first

input_no = 2  # number of features

# Placeholders
X = tf.placeholder(tf.float32, shape=(None, input_no + 1), name='X')  # placeholder for training data X
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')             # placeholder for training labels y
theta = tf.Variable(tf.random_uniform([input_no + 1, 1], -1.0, 1.0, seed=42), name="theta")  # initialize theta with random values

# Operations
logits = tf.matmul(X, theta, name="logits")  # weighted sum of the inputs
y_proba = tf.sigmoid(logits)                 # calculate the probability
loss = tf.losses.log_loss(y, y_proba)        # loss function
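
For intuition, the log loss on these probabilities corresponds roughly to the following plain-NumPy computation (a sketch; TensorFlow also adds a small epsilon inside the logs for numerical stability):

import numpy as np

def log_loss_np(y_true, y_prob, eps=1e-7):
    # average cross-entropy: -[y*log(p) + (1-y)*log(1-p)], with eps to avoid log(0)
    return -np.mean(y_true * np.log(y_prob + eps) +
                    (1 - y_true) * np.log(1 - y_prob + eps))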

Define the Optimizer for the model

learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
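
Under the hood, minimize() computes the gradient of the loss with respect to theta and takes one gradient-descent step against it. A roughly equivalent manual version (shown only for intuition, not used below) might look like this:

gradient = tf.gradients(loss, [theta])[0]                                 # d(loss)/d(theta)
manual_training_op = tf.assign(theta, theta - learning_rate * gradient)   # theta <- theta - lr * gradient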

Now that our model is ready, let's execute it. First, create the variable initializer.

init = tf.global_variables_initializer()

Now train the model by executing the created graph

epochs = 1000
batch_size = 50
batches = int(np.ceil(n_samples / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for idx in range(batches):
            X_batch, y_batch = select_random_batch(X_train, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val = loss.eval({X: X_test, y: y_test})
        if epoch % 100 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)

    y_proba_val = y_proba.eval(feed_dict={X: X_test, y: y_test})

Output:

Epoch: 0    Loss: 0.8889945
Epoch: 100  Loss: 0.35740235
Epoch: 200  Loss: 0.30922142
Epoch: 300  Loss: 0.2879143
Epoch: 400  Loss: 0.2755315
Epoch: 500  Loss: 0.2681894
Epoch: 600  Loss: 0.262908
Epoch: 700  Loss: 0.2594868
Epoch: 800  Loss: 0.25684857
Epoch: 900  Loss: 0.25477588

Let's assume that a predicted probability greater than 0.5 means the positive class, and check the predicted values.

y_pred = (y_proba_val >= 0.5)
y_pred[:5]

Output: array([[ True],
[False],
[ True],
[False],
[ True]])

Let's evaluate the performance of the model by computing its precision and recall:

from sklearn.metrics import precision_score, recall_score
precision_score(y_test, y_pred)

Output: 0.8653846153846154

recall_score(y_test, y_pred)

Output: 0.9
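
If you want the raw counts behind these scores, the confusion matrix can be printed as well (an optional extra, not part of the original notebook):

from sklearn.metrics import confusion_matrix
confusion_matrix(y_test.ravel(), y_pred.ravel())   # rows: true classes, columns: predicted classes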

Visualize the predicted values

y_pred_idx = y_pred.reshape(-1)  # a 1D array rather than a column vector
plt.plot(X_test[y_pred_idx, 1], X_test[y_pred_idx, 2], 'go', label="Positive")
plt.plot(X_test[~y_pred_idx, 1], X_test[~y_pred_idx, 2], 'r^', label="Negative")
plt.legend()

Predicted values of the moon dataset

The plot above looks okay, except that the positive region does not look like a moon.

Conclusion

In conclusion, the above model is just an example, and its performance is no better than a linear model. We'll complete this tomorrow by adding higher-degree features. You can find the entire code here.
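
As a teaser for tomorrow's post, one possible way to add higher-degree features is simply to append powers of the existing ones to the feature matrix (a sketch under that assumption; the exact feature set may differ in the follow-up):

X_train_enhanced = np.c_[X_train,
                         X_train[:, 1] ** 2,   # x(1) squared
                         X_train[:, 2] ** 2,   # x(2) squared
                         X_train[:, 1] ** 3,   # x(1) cubed
                         X_train[:, 2] ** 3]   # x(2) cubed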