In standard reinforcement learning, the agent is provided with an explicit reward function, which is a mathematical formula that tells the agent the value of every state or action it takes. The agent’s objective is to learn a policy, or a strategy for behavior, that maximizes the cumulative sum of these rewards over time. In contrast, inverse reinfo....
Log in to view the answer