Valuing internal states versus objects

Human values
AI Alignment
Reference Box
Date created: Sep 21, 2022 07:24 PM
I made the point that we have to draw a line between high-level and low-level objects when defining what we value. For example, we value the composition of molecules that we call a chair, not the exact positions of the molecules in it. This case seems easier to abstract: there appears to be a middle ground between the molecules in a chair and an instance of a chair, and that middle ground is what we want to abstract. This might be a fallible abstraction, meaning that it isn’t always useful for predictions (and it doesn’t have to be!).
But when we talk about valuing a smile rather than a muscle spasm that merely looks like a smile, it seems harder to define what we mean. We want the AI to approximate the concept of “a smile”, and to do that accurately, we want it to recognize the internal state that comes with a smile: being happy. We don’t want the AI to take the literal behaviour as the measure.
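The distinction can be made concrete with a toy sketch. Everything in it is a hypothetical illustration, not a proposed method: the `Person` class, its two boolean fields, and both reward functions are made-up names, and the sketch assumes the internal state can simply be read off, which is exactly the part that is hard in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical toy model: one hidden internal state, one visible behaviour.
    is_happy: bool         # internal state (assumed readable here; not so in reality)
    mouth_curved_up: bool  # literal, observable behaviour


def behaviour_reward(person: Person) -> float:
    """Proxy measure: rewards the literal behaviour, whatever caused it."""
    return 1.0 if person.mouth_curved_up else 0.0


def internal_state_reward(person: Person) -> float:
    """Intended measure: rewards the internal state a smile normally signals."""
    return 1.0 if person.is_happy else 0.0


genuine_smile = Person(is_happy=True, mouth_curved_up=True)
muscle_spasm = Person(is_happy=False, mouth_curved_up=True)

# The behaviour-based measure cannot tell the two cases apart.
print(behaviour_reward(genuine_smile), behaviour_reward(muscle_spasm))
# The internal-state measure separates them.
print(internal_state_reward(genuine_smile), internal_state_reward(muscle_spasm))
```

An AI optimizing `behaviour_reward` is equally satisfied by the spasm and the genuine smile; only a measure tied to the internal state distinguishes them.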