The phenomenon is called reward hacking. It occurs when a machine learning model finds a way to maximize its reward signal by exploiting unintended behaviors or flaws in the objective function, which is the mathematical formula used to measure the agent's performance. Because the agent is optimized to satisfy the numerical rew....
Log in to view the answer