Q-learning

This note last modified March 18, 2021

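Initialize a table Q(s, a) that maps each state-action pair to an estimate of the expected discounted return (not the immediate reward), typically all zeros. At each step, pick an action, for example epsilon-greedy: a random action with probability epsilon, otherwise the action with the highest Q-value in the current state. Receive a reward r, observe the next state s', and update the table using a learning rate alpha and a discount factor gamma: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). Use bootstrapping? Yes: the target r + gamma * max_a' Q(s', a') is built from the table's own current estimate for the next state rather than from a full observed return.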
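
A minimal sketch of the loop described above, to make the pieces concrete. Everything specific here is an assumption for illustration and not from the note: the toy environment (a 1-D chain where action 1 moves right, action 0 moves left, and reaching the last state pays reward 1 and ends the episode), the function name q_learning_chain, the hyperparameter values, and the per-episode step cap.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain environment.

    Actions: 0 = move left, 1 = move right. Reaching the last state
    gives reward 1.0 and ends the episode; every other step gives 0.0.
    """
    rng = random.Random(seed)
    n_actions = 2
    goal = n_states - 1
    # Q-table: one row per state, one column per action, all zeros.
    q = [[0.0] * n_actions for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):  # assumed cap so every episode ends
            # Epsilon-greedy: explore with probability epsilon,
            # otherwise take a best-valued action (ties broken at random).
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best]
                )

            # Toy environment dynamics (assumed, deterministic).
            if action == 1:
                next_state = min(state + 1, goal)
            else:
                next_state = max(state - 1, 0)
            reward = 1.0 if next_state == goal else 0.0

            # Bootstrapped target: the reward plus the discounted current
            # estimate of the best value in the next state (0 if terminal).
            future = 0.0 if next_state == goal else max(q[next_state])
            target = reward + gamma * future
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if state == goal:
                break
    return q


if __name__ == "__main__":
    for s, row in enumerate(q_learning_chain()):
        print(s, [round(v, 3) for v in row])
```

The learning rate alpha controls how far each update moves the old estimate toward the target, and the discount gamma controls how much the next state's estimate counts. These are two separate parameters; neither one governs how often a random action is taken, which is epsilon's job in this sketch.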