Q-Learning Archives - MyTechWorld

DAY 60-100 DAYS MLCODE: Markov Decision Process

In the previous blog, we discussed the REINFORCE algorithm; in this blog, we’ll discuss the Markov Decision Process. This will help us understand other algorithms, whereas the Policy Gradient algorithm itself tries to optimize the policy to maximize the reward.

Markov Chain

The Markov process is named after the Russian mathematician Andrey Markov. It is a stochastic process that…

Pavan Tiwari, January 9, 2019

Tag: Q-Learning

