Q-learning

This note last modified September 1, 2024

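Initialize a table Q(s, a) with one value per state-action pair, usually all zeros. Each entry estimates the expected discounted return from taking action a in state s and acting well afterward. It is not the immediate reward. At each step, choose an action. A common choice is epsilon-greedy: take a random action with probability epsilon, otherwise take the action with the highest Q-value in the current state. Take the action, observe the reward r and the next state s', and update the table:

Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]

Here α is the learning rate, which sets how far each update moves the estimate. γ is the discount factor, which sets how much future reward counts. Neither one controls the random action choice. Yes, this is bootstrapping: the target r + γ max_a' Q(s', a') uses the table's own current estimate for s' instead of waiting for the full return of the episode. When s' is terminal, the max term is dropped and the target is just r.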
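A minimal tabular sketch of the loop described above, in Python. Several details are assumptions made only for illustration and do not come from the note: the corridor environment, the helper names (`step`, `epsilon_greedy`, `q_learning`), epsilon-greedy exploration, and the hyperparameter values (α = 0.1, γ = 0.9, ε = 0.1).

```python
import random
from collections import defaultdict

# Hypothetical toy environment (an assumption, not from the note):
# a corridor of states 0..N_STATES-1; action 0 moves left, action 1 moves right.
# Reaching the rightmost state ends the episode with reward 1; all other steps give 0.
N_STATES = 6
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns a dict mapping (state, action) to a Q-value estimate."""
    rng = random.Random(seed)
    q = defaultdict(float)  # every Q(s, a) starts at 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Bootstrapped target: reward plus the discounted best current estimate
            # at the next state. A terminal state has no future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            # Move the estimate a fraction alpha toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# Greedy action in each non-terminal state after training.
greedy_policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setup the greedy policy should end up as "move right" (action 1) in every non-terminal state. Q(4, 1) should approach 1, and each state further left should approach 0.9 times the value of the state to its right, which is the discounting at work. Random tie-breaking in `epsilon_greedy` matters early on, when every entry is still 0. Without it, the agent would always pick the same action.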