Risks from Learned Optimization: Introduction
This is the first of five posts in the Risks from Learned Optimization Sequence based on the paper "Risks from Learned Optimization in Advanced Machine Learning Systems" by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper.
The post poses the question of what, exactly, an optimizer is. According to the post, an optimizer is a system that internally searches through a space of candidates (possible outputs, policies, plans, strategies, or similar) for elements that score high on some metric or objective function that is explicitly represented within the system.
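To make that definition concrete, here is a minimal Python sketch of my own (it is an illustration, not code from the paper); the names `explicit_objective`, `optimize`, and `candidate_plans` are all hypothetical. The point is that both ingredients of the definition are visibly present inside the system: an explicitly represented objective, and an internal search over candidates scored by it.

```python
# A toy "optimizer" in the post's sense: it internally searches a set of
# candidates for elements scoring high on an objective function that is
# explicitly represented within the system itself.

def explicit_objective(plan: tuple[int, ...]) -> float:
    """The objective, represented explicitly inside the system.
    Toy metric: plans whose entries are closer to 3 score higher."""
    return -sum((x - 3) ** 2 for x in plan)

def optimize(candidate_plans: list[tuple[int, ...]]) -> tuple[int, ...]:
    """Internal search: evaluate every candidate against the explicit
    objective and return the highest-scoring one."""
    return max(candidate_plans, key=explicit_objective)

# Usage: search through a small space of possible "plans"
plans = [(0, 0), (2, 4), (3, 3), (5, 1)]
print(optimize(plans))  # -> (3, 3), the plan maximizing the explicit objective
```

By this definition, a thermostat or a lookup table would not count: they may behave as if they pursue a goal, but they contain no explicit objective and perform no internal search.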
I think there is a deep connection between this concept, the criticism of behaviorism, and inner misalignment.
I will try to formulate this connection here: dispositions to act cannot be equated with particular mental states.