DAY 33 - 100 DAYS OF ML CODE: Implement Mini-Batch Gradient Descent using TensorFlow
Mini-Batch Gradient Descent
Mini-batch gradient descent lies between stochastic gradient descent, which updates the parameters one training example at a time, and batch gradient descent, which uses the entire training set for every update. By computing each update on a small random batch, it keeps much of the speed of stochastic updates while producing smoother, more stable convergence.
In this post, we'll train a logistic regression classifier on the moons dataset with mini-batch gradient descent in TensorFlow.
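To make the idea concrete, here is a minimal NumPy sketch of a generic mini-batch update loop. It is only illustrative: gradient_fn, X, y and theta are hypothetical placeholders, not part of the TensorFlow code that follows.
import numpy as np

def minibatch_gd(X, y, theta, gradient_fn, learning_rate=0.01, batch_size=50, epochs=10):
    m = len(X)
    for epoch in range(epochs):
        idx = np.random.permutation(m)          # Shuffle the data once per epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            grad = gradient_fn(X[batch], y[batch], theta)  # Gradient on the batch only
            theta = theta - learning_rate * grad           # One small update per batch
    return theta
This sketch shuffles and slices the data each epoch; the TensorFlow code later in this post instead samples a random batch at every step, which is another common variant of the same idea.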
Let's generate the moons dataset using Scikit-Learn's make_moons function.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

n_samples = 1000
X_moon, y_moon = make_moons(n_samples=n_samples, noise=.1, random_state=42)
Visualize the dataset using matplotlib.
plt.rcParams['axes.facecolor'] = 'white'
plt.plot(X_moon[y_moon == 1, 0], X_moon[y_moon == 1, 1], 'go', label="Positive")
plt.plot(X_moon[y_moon == 0, 0], X_moon[y_moon == 0, 1], 'r^', label="Negative")
plt.legend()
plt.grid(True)
plt.show()
Since our model has the form f(x) = Wx + b, where W is the weight vector and b is the bias, let's add a constant bias input x(0) = 1 to the data. We already have the two features x(1) and x(2).
X_with_bias = np.c_[np.ones((n_samples, 1)), X_moon]
Verify the data after adding the bias
X_with_bias[:5]
Output:
array([[ 1.        , -0.05146968,  0.44419863],
       [ 1.        ,  1.03201691, -0.41974116],
       [ 1.        ,  0.86789186, -0.25482711],
       [ 1.        ,  0.288851  , -0.44866862],
       [ 1.        , -0.83343911,  0.53505665]])
Since the labels need to be a column vector rather than a flat array, reshape them:
y_vector = y_moon.reshape(-1, 1)
y_vector.shape
Output: (1000, 1)
Now that we have everything in place, let's split the data into training and test sets using Scikit-Learn.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split( X_with_bias, y_vector, test_size=0.20, random_state=42)
Verify the size of test and train data
print(f"Train data size is {X_train.shape}, train data label size is {y_train.shape}")
print(f"Test data size is {X_test.shape}, test data label size is {y_test.shape}")
Output: Train data size is (800, 3), train data label size is (800, 1)
Test data size is (200, 3), test data label size is (200, 1)
Now, let's create a simple function to select a random batch from the training data.
def select_random_batch(X, y, batch_size):          # Take training data and batch size as input
    idx = np.random.randint(0, len(X), batch_size)  # Get random row indices (with replacement)
    X_batch = X[idx]
    y_batch = y[idx]
    return X_batch, y_batch
Before we use it in training, let's test select_random_batch:
select_random_batch(X_train, y_train, 5)
Output:
(array([[ 1.        , -0.77702017,  0.49329954],
        [ 1.        , -0.04027286,  0.15633953],
        [ 1.        ,  0.43318391,  0.91495211],
        [ 1.        , -1.04128339,  0.11138847],
        [ 1.        ,  1.87285957,  0.00337095]]),
 array([[0],
        [1],
        [0],
        [0],
        [1]]))
Let's perform logistic regression on the data. The model first calculates a weighted sum of the inputs and then applies the sigmoid function to the result, which gives the estimated probability of the positive class: p̂ = h_θ(x) = σ(θᵀx).
Theta (θ) is the vector containing the bias θ(0) and the weights θ(1) … θ(n). x is the input vector, where x(0) is 1 and the remaining entries are the features of the training data, x(1) and x(2).
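As a quick sanity check of the math, here is a small NumPy illustration of that formula. The theta values below are made up purely for demonstration, not learned parameters:
theta_example = np.array([[0.1], [0.5], [-0.3]])   # hypothetical [bias, w1, w2]
def sigmoid(z):
    return 1 / (1 + np.exp(-z))                    # squashes the weighted sum into (0, 1)
p_hat = sigmoid(X_with_bias[:5] @ theta_example)   # estimated probability of the positive class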
Let’s construct the model first
import tensorflow as tf  # TensorFlow 1.x API

input_no = 2  # Number of features

# Placeholders
X = tf.placeholder(tf.float32, shape=(None, input_no + 1), name='X')  # Placeholder for training data X
y = tf.placeholder(tf.float32, shape=(None, 1), name='y')             # Placeholder for training label data y
theta = tf.Variable(tf.random_uniform([input_no + 1, 1], -1.0, 1.0, seed=42), name="theta")  # Initialize theta with random values
# Operations
logits = tf.matmul(X, theta, name="logits")  # TensorFlow operation: weighted sum of the inputs
y_proba = tf.sigmoid(logits)                 # Calculate the probability
loss = tf.losses.log_loss(y, y_proba)        # Log loss function
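The log loss (cross-entropy) that tf.losses.log_loss computes is the standard logistic regression cost, averaged over the m instances in the batch:

J(θ) = −(1/m) Σᵢ [ yᵢ log(p̂ᵢ) + (1 − yᵢ) log(1 − p̂ᵢ) ]

It penalizes confident wrong predictions heavily, which is exactly what we want when fitting probabilities.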
Define the Optimizer for the model
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
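optimizer.minimize(loss) builds both the gradient computation and the parameter update in a single step. For intuition, a roughly equivalent manual version (a sketch only, not what this post actually runs) could be written with tf.gradients and tf.assign:
# Illustrative only: manual version of what GradientDescentOptimizer.minimize does
gradients = tf.gradients(loss, [theta])[0]                                 # dLoss/dTheta
manual_training_op = tf.assign(theta, theta - learning_rate * gradients)   # theta := theta - lr * gradient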
Now that our model is ready, let's execute it. First, create the node that initializes all variables.
init = tf.global_variables_initializer()
Now train the model by executing the created graph
epochs = 1000
batch_size = 50
batches = int(np.ceil(n_samples / batch_size))  # Number of batches per epoch
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(epochs):
        for idx in range(batches):
            X_batch, y_batch = select_random_batch(X_train, y_train, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        loss_val = loss.eval(feed_dict={X: X_test, y: y_test})
        if epoch % 100 == 0:
            print("Epoch:", epoch, "\tLoss:", loss_val)

    y_proba_val = y_proba.eval(feed_dict={X: X_test, y: y_test})
Output:
Epoch: 0 	Loss: 0.8889945
Epoch: 100 	Loss: 0.35740235
Epoch: 200 	Loss: 0.30922142
Epoch: 300 	Loss: 0.2879143
Epoch: 400 	Loss: 0.2755315
Epoch: 500 	Loss: 0.2681894
Epoch: 600 	Loss: 0.262908
Epoch: 700 	Loss: 0.2594868
Epoch: 800 	Loss: 0.25684857
Epoch: 900 	Loss: 0.25477588
If the estimated probability is greater than or equal to 0.5, we predict the positive class. Let's check the predicted values:
y_pred = (y_proba_val >= 0.5)
y_pred[:5]
Output: array([[ True],
[False],
[ True],
[False],
[ True]])
Let’s evaluate the performance of the model by computing model’s precision and recall:
from sklearn.metrics import precision_score, recall_score
precision_score(y_test, y_pred)
Output: 0.8653846153846154
recall_score(y_test, y_pred)
Output: 0.9
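As a reminder of what these numbers mean, here is a small sketch that computes the same metrics directly from the confusion-matrix counts, equivalent to the Scikit-Learn calls above:
tp = np.sum((y_pred == 1) & (y_test == 1))   # true positives
fp = np.sum((y_pred == 1) & (y_test == 0))   # false positives
fn = np.sum((y_pred == 0) & (y_test == 1))   # false negatives
precision = tp / (tp + fp)                   # of the predicted positives, how many are correct
recall = tp / (tp + fn)                      # of the actual positives, how many were found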
Visualize the predicted values
y_pred_idx = y_pred.reshape(-1) # a 1D array rather than a column vector
plt.plot(X_test[y_pred_idx, 1], X_test[y_pred_idx, 2], 'go', label="Positive")
plt.plot(X_test[~y_pred_idx, 1], X_test[~y_pred_idx, 2], 'r^', label="Negative")
plt.legend()
plt.show()
The plot above looks okay, except that the positive region does not look like a moon shape.
Conclusion
In conclusion, the above model is just an example, and its performance is no better than a linear model. We'll complete this tomorrow by adding higher-degree features. You can find the entire code here.
#100DaysofMLCode #LogisticRegression #GradientDescent #MiniBatchGradientDescent #TensorFlow