Q-learning
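Initialize a table (the Q-table) with one value per state-action pair. Each entry Q(s, a) estimates the expected discounted return from taking action a in state s, not just the immediate reward. At each step, take an action (random while exploring, otherwise the one with the highest Q-value), receive a reward r, observe the next state s', and update the Q-table using a learning rate alpha and a discount factor gamma: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max over a' of Q(s', a') - Q(s, a)). This does use bootstrapping: the update target is built from the table's own current estimate for the next state rather than from a full observed return.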
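The loop described in these notes can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the notes: the environment is a made-up 1-D corridor (states 0 to n_states - 1, actions 0 = left and 1 = right, reward 1 on reaching the last state), exploration is epsilon-greedy, and the function name and all parameter defaults are chosen for the example.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=10_000, seed=0):
    """Tabular Q-learning on a hypothetical corridor environment.

    alpha is the learning rate, gamma the discount factor, epsilon the
    probability of taking a random (exploratory) action.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # Q-table: one row per state, one entry per action, initialized to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):  # safety cap on episode length
            if s == goal:
                break
            # Explore with probability epsilon (or when the estimates tie),
            # otherwise act greedily with respect to the current Q-table.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1

            # Assumed environment dynamics: move left or right, wall at state 0.
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Bootstrapped target: reward plus discounted best estimate for
            # the next state (a terminal state contributes no future value).
            future = 0.0 if s_next == goal else gamma * max(q[s_next])
            q[s][a] += alpha * (r + future - q[s][a])
            s = s_next
    return q
```

Calling q = q_learning() returns the learned table. In this toy setup the entry for "right" should end up larger than the entry for "left" in every non-terminal state, with the values shrinking by roughly a factor of gamma for each extra step away from the goal.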