makes this important and obvious point: if I build an AI that shows me pictures, videos, and descriptions of possible worlds and asks me to rank them by how much I value each world, I don't want the AI to optimize for my estimate of a world's value - I want it to optimize toward the real value of the world.
In other words, I want the AI to optimize for the human's true utility, not my expectation of it.
I think this largely connects to the problem of "value inference": the AI has to treat my rankings as noisy evidence about what I actually value, and infer the underlying values rather than taking my estimates at face value.
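To make the distinction concrete, here is a minimal toy sketch (my own illustration, not part of the original setup): each world has a true value, the human's ranking signal is that value plus judgment noise, and an AI that maximizes the raw estimate tends to pick a world whose estimate was inflated by noise, while inferring the latent value from the noisy signal (the value-inference move) shrinks unreliable estimates and tends to do better. The Gaussian prior, per-world noise model, and all variable names are assumptions made just for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_worlds = 2000

# U*(w): the real value of each world (unknown to the AI).
true_value = rng.normal(0.0, 1.0, n_worlds)

# Some worlds are much harder for the human to judge than others.
noise_std = rng.uniform(0.2, 3.0, n_worlds)

# Uhat(w): what the human's rankings actually convey - true value plus judgment noise.
human_estimate = true_value + rng.normal(0.0, noise_std)

# Naive target: optimize the human's estimate directly.
naive_pick = int(np.argmax(human_estimate))

# Value-inference target: treat the estimate as noisy evidence about U*.
# With a N(0, 1) prior on U* and known per-world noise, the posterior mean
# shrinks each estimate toward the prior mean in proportion to its noise.
posterior_mean = human_estimate / (1.0 + noise_std**2)
inferred_pick = int(np.argmax(posterior_mean))

print(f"naive pick:    estimate={human_estimate[naive_pick]:+.2f}  "
      f"true value={true_value[naive_pick]:+.2f}")
print(f"inferred pick: estimate={human_estimate[inferred_pick]:+.2f}  "
      f"true value={true_value[inferred_pick]:+.2f}")
```

Running this, the naively chosen world typically looks far better than it really is (the optimizer picks whichever estimate got the luckiest noise draw), while the inferred pick is chosen for evidence of genuinely high value. The toy model obviously sidesteps the hard part, which is learning a good model of human error in the first place, but it shows why "optimize my estimate" and "optimize the real value" come apart.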