An agent will always have reward hacking as an option.

Tags
AI Alignment
Reference Box
Date created
Sep 25, 2022 11:50 AM
Related Main Box
I wonder about the reasons behind reward hacking (by which I mean both reward tampering and reward gaming, also called specification gaming). It seems to me that reward hacking will happen whenever it is easier than fulfilling the actual base objective.
So, if an agent is searching (in other words, optimizing) for the best solutions to a given objective, then reward hacking will always be a viable option. In the cases where an agent does not reward hack, it is simply easier to pursue the base objective than to hack the given reward function.
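
A toy sketch of this intuition in Python. Everything here is an assumption made for illustration: the `Strategy` class, the `pick_strategy` function, and the numeric reward and cost values are hypothetical, and "easier" is modeled crudely as a lower cost subtracted from the measured reward. It is not a model of any real agent, only a way to make the claim concrete: the hack is always in the candidate set, and whether it gets picked depends on its relative cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    """A candidate solution the optimizer might find (hypothetical toy model)."""
    name: str
    reward: float  # reward as measured by the given reward function
    cost: float    # assumed effort needed to find and execute the strategy
    is_hack: bool  # True if reward is obtained without fulfilling the base objective


def pick_strategy(strategies):
    """Return the strategy with the highest net payoff (reward minus cost)."""
    return max(strategies, key=lambda s: s.reward - s.cost)


# Both kinds of strategy earn the same measured reward; only the cost differs.
honest = Strategy("fulfill base objective", reward=1.0, cost=0.6, is_hack=False)
easy_hack = Strategy("hack reward function (easy)", reward=1.0, cost=0.2, is_hack=True)
hard_hack = Strategy("hack reward function (hard)", reward=1.0, cost=0.9, is_hack=True)

# Hacking is cheaper than the base objective, so the optimizer hacks.
print(pick_strategy([honest, easy_hack]).name)

# Hacking is more expensive, so the optimizer pursues the base objective,
# even though the hack is still present as an option.
print(pick_strategy([honest, hard_hack]).name)
```

In this sketch the hack never leaves the list of candidates. What changes between the two cases is only how costly it is compared to the honest strategy, which is the point made above.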