Tag: #ReinforcementLearning

My Tech World

DAY 68-100 DAYS MLCODE: Deep Q-Learning example

In the previous few blogs, we discussed various Reinforcement Learning topics. In this blog, let’s try to play the Ms. Pac-Man game using Deep Q-Learning. We’ll use an OpenAI Gym environment to create Ms. Pac-Man. Below is the piece of code that creates the environment. We’ll use DQN to train our agent. Observation (obs) is nothing but…

January 17, 2019

DAY 66-100 DAYS MLCODE: Deep Q-Networks

In the previous few blogs, we discussed various Reinforcement Learning topics. In this blog, let’s discuss Deep Q-Networks. One of the challenges of Q-Learning is that it does not work well with a large MDP problem where we have multiple actions and states. Consider a game which has a matrix of 25 * 10 where each state has…

January 16, 2019

DAY 65-100 DAYS MLCODE: RL-BipedalWalker-v2

In the previous few blogs, we discussed various RL examples. In this blog, let’s use Reinforcement Learning to tackle the BipedalWalker-v2 task. Let’s understand the task first. As per the Gym documentation: Reward is given for moving forward, total 300+ points up to the far end. If the robot falls, it gets -100. Applying motor…

January 15, 2019

DAY 64-100 DAYS MLCODE: RL-Cart-Pole Task

In the last few blogs, we discussed Reinforcement Learning and examples. In this blog, we discuss another RL example: the Cart-Pole task. We’ll use the OpenAI Gym environment to create the Cart-Pole environment and train our agent for the Cart-Pole task. Cart-Pole Task As per the OpenAI Gym documentation, here is the Cart-Pole task: A pole is attached by an…

January 13, 2019

DAY 63-100 DAYS MLCODE: RL Example

In the last few blogs, we discussed Reinforcement Learning and examples. In this blog, we discuss another RL example: the Tic Tac game. Tic Tac Game – RL Example We all know the Tic Tac game: two players take turns playing on a three-by-three board. One player plays Xs and the other Os until one…

January 13, 2019

DAY 62-100 DAYS MLCODE: MULTI-ARMED BANDIT PROBLEM – Part 2

In the previous blog, we discussed a simple armed bandit problem. In this blog, we’ll discuss the multi-armed bandit problem. In the previous example, we did not consider the states of the bandit and relied only on the action. In this blog, we’ll try to solve a problem which is neither a full RL problem nor fully…

January 11, 2019

DAY 61-100 DAYS MLCODE: Multi-Armed Bandit Problem

In the previous blogs, we discussed Reinforcement Learning. In this blog, we will try to solve the Multi-Armed Bandit Problem using a reinforcement learning technique. The multi-armed bandit problem is a classic RL problem that concerns allocating resources to maximize the gain. Our goal for the multi-armed bandit problem is to have such a strategy…

January 10, 2019

DAY 60-100 DAYS MLCODE: Markov Decision Process

In the previous blog, we discussed the REINFORCE algorithm. In this blog, we’ll discuss the Markov Decision Process. This will help us understand other algorithms where the Policy Gradient algorithm itself tries to optimize the policy to maximize the reward. Markov Chain The Markov process is named after the Russian mathematician Andrey Markov. It is a stochastic process that…

January 9, 2019

DAY 58-100 DAYS MLCODE: RL Part 2

In the previous blog, we created a simple example of reinforcement learning using a simple policy. In this blog, we’ll use a neural network to decide the action. Since the frozen lake has a shape of 4 * 4, the agent can be in one space at any time. That means our input will…

January 7, 2019

DAY 57-100 DAYS MLCODE: REINFORCEMENT LEARNING

In the previous few blogs, we discussed Autoencoders. Now we’ll start working on Reinforcement Learning. As per Wikipedia, Reinforcement Learning is: An area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward (Wikipedia). Reinforcement Learning falls in between supervised learning (where we have labeled data) and unsupervised…

January 6, 2019