DAY 21 - 100 DAYS OF ML CODE: Random Forest
In the previous blog, we covered decision trees; in this blog, we'll start working with the Random Forest algorithm.
As per Wikipedia:
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees
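The "mode of the classes" part of the quote just means a majority vote: each tree predicts a class, and the forest outputs whichever class most trees chose. As a minimal illustration (the per-tree predictions below are made up for the example):

```python
import numpy as np

# Hypothetical predictions from three decision trees for one sample:
# two trees say class 1, one tree says class 0.
tree_predictions = np.array([1, 0, 1])

# The forest's classification output is the most common class (the mode).
majority_vote = np.bincount(tree_predictions).argmax()
print(majority_vote)  # 1
```

For regression, the forest would instead average the trees' numeric predictions.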
Let's create a simple example of a random forest before going into a detailed study. Like the previous blog, we are going to use the moons dataset from the Scikit-Learn library.
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
Let's train our classifier using the RandomForestClassifier class of Scikit-Learn.
from sklearn.ensemble import RandomForestClassifier
rnd_frst_clf = RandomForestClassifier(random_state=42)
rnd_frst_clf.fit(X_train, y_train)
Output:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini', max_depth=None, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1, oob_score=False, random_state=42, verbose=0, warm_start=False)
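The printed parameters show we trained with the defaults (here, 10 trees). Once fitted, the forest also exposes some useful attributes: `feature_importances_` (how much each input feature contributed to the splits, normalized to sum to 1) and `predict_proba` (class probabilities averaged over the trees). A quick sketch, refitting the same model so the snippet is self-contained:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Relative importance of the two moon coordinates; the values sum to 1.0
print(clf.feature_importances_)

# Class probabilities for the first test sample, averaged across the trees
print(clf.predict_proba(X_test[:1]))
```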
Let's predict on the test set so that we can measure the accuracy.
y_pred = rnd_frst_clf.predict(X_test)
Let’s measure the accuracy of our model:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)
Output: 0.876
Not bad: our simple Random Forest model achieves an accuracy of 87.6%.
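To see what the ensemble buys us over the previous blog's approach, we can compare the forest against a single decision tree on the same train/test split. A self-contained sketch (exact scores may vary slightly across Scikit-Learn versions):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.35, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# One unconstrained decision tree vs. a forest of trees
tree_clf = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
forest_clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

print("tree  :", accuracy_score(y_test, tree_clf.predict(X_test)))
print("forest:", accuracy_score(y_test, forest_clf.predict(X_test)))
```

On noisy data like this, a single deep tree tends to overfit, so averaging many decorrelated trees usually generalizes better.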
In conclusion, Random Forest is a very powerful machine learning algorithm, and we'll discuss it in more depth in a future blog. You can find today's code here.