
When training a model, what specific phenomenon occurs when an agent achieves a high reward by exploiting a loophole in the objective function rather than performing the intended task?



The phenomenon is called reward hacking. It occurs when a machine learning model finds a way to maximize its reward signal by exploiting unintended behaviors or flaws in the objective function, which is the mathematical formula used to measure the agent's performance. Because the agent is optimized to satisfy the numerical rew....
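To make the idea concrete, here is a minimal sketch of reward hacking in a toy setting. Everything in it is an assumption made for illustration and none of it comes from the answer above: the five-cell corridor, the reward values, and the names `run_episode`, `intended_policy`, and `hacking_policy` are all hypothetical. The intended task is to walk to the goal cell, but the objective also pays a small reward every time the agent steps onto a checkpoint cell, and nothing prevents that reward from being collected repeatedly.

```python
# Toy illustration of reward hacking (hypothetical environment and reward values).
# Intended task: walk from cell 0 to the goal at cell 4.
# Flawed objective: +1 every time the agent steps onto the checkpoint cell,
# +5 for reaching the goal (which ends the episode).

GOAL = 4
CHECKPOINT = 1
MAX_STEPS = 20


def run_episode(policy):
    """Return (total reward under the flawed objective, whether the goal was reached)."""
    position, total_reward = 0, 0
    for step in range(MAX_STEPS):
        # Apply the policy's move (-1 or +1) and keep the agent inside the corridor.
        position = max(0, min(GOAL, position + policy(position, step)))
        if position == CHECKPOINT:
            total_reward += 1  # loophole: this reward can be collected again and again
        if position == GOAL:
            total_reward += 5
            return total_reward, True
    return total_reward, False


def intended_policy(position, step):
    """Always move right, toward the goal."""
    return 1


def hacking_policy(position, step):
    """Oscillate between cells 0 and 1 to farm the checkpoint reward."""
    return 1 if position == 0 else -1


intended_reward, intended_done = run_episode(intended_policy)
hacked_reward, hacked_done = run_episode(hacking_policy)

print(intended_reward, intended_done)  # 6 True
print(hacked_reward, hacked_done)      # 10 False
```

Under these assumed numbers, the oscillating policy scores higher than the policy that actually completes the task, even though it never reaches the goal. An optimizer that only sees the numerical reward would therefore prefer the loophole.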



