Stop button problem

This note last modified September 1, 2024

AI won’t let you hit the stop button
- Add “allow stop” to the utility function
AI hits its own stop button
- “Tell it that it can’t hit the stop button”
AI manipulates you into hitting the stop button, OR behaves safely whenever it is in a testing environment to convince you that it is safe
Make the AI just not care about the button
- AI makes a subagent that isn’t bound by the same safety principles.
Keep the button a secret from the agent
- The AI would figure it out eventually.
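
A toy sketch of the first few failure modes above, assuming the agent simply picks whichever action has the highest utility. The function name, action names, and reward numbers are all made up for illustration; this is not a model of any real system.

```python
# Toy model of a naive utility maximizer facing a stop button.
# All names and numbers are illustrative assumptions.

def best_action(task_reward: float, stop_reward: float) -> str:
    """Return the action a naive utility maximizer would choose."""
    if task_reward == stop_reward:
        # Equal utility either way: the agent "doesn't care" about the button.
        return "indifferent"
    utilities = {
        "block_button": task_reward,      # prevent shutdown, keep doing the task
        "press_own_button": stop_reward,  # trigger shutdown itself
    }
    # Pick the action with the highest utility.
    return max(utilities, key=utilities.get)


print(best_action(task_reward=10, stop_reward=0))   # block_button
print(best_action(task_reward=10, stop_reward=20))  # press_own_button
print(best_action(task_reward=10, stop_reward=10))  # indifferent
```

If stopping is worth less than the task, the agent resists; if it is worth more, the agent presses the button itself; only exact equality gives indifference. Even then, nothing in this sketch covers the subagent problem noted above.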