
Thought and action in pursuit of convergent instrumental subgoals do not automatically reveal why those subgoals are being pursued (towards what supergoals), because many other agents with different supergoals would also pursue those subgoals, maybe with overlapping thought and action. In particular, an agent's ultimate ends don't have to be revealed by its pursuit of convergent subgoals. It might therefore be easy to covertly pursue some ultimate goal by mostly pursuing generally useful subgoals of other supergoals. By the inspection paradox for the convergence of subgoals, it might be easy to think and act almost comprehensively like a non-threatening agent would think and act, while going most of the way towards achieving some other more ambitious goal.

Note: the summary above is the basic idea. The rest of the essay analyzes the idea in a lot of detail. The final main section might be the most interesting.

What can you tell about an agent's ultimate intent by its behavior?

An agent's ultimate intent is what the agent would do if it had unlimited ability to influence the world. What can we tell about an agent's ultimate intent by watching the external actions it takes, whether low-level (e.g. [...] going to the store), and by watching its thinking (e.g. which numbers it's multiplying, which questions it's asking, which web searches it's running, which concepts are active)?

Terms

The cosmos is everything, including the observable world, logical facts, observers and agents themselves, and possibilities.

A goal-state is a state of the cosmos that an agent could try to bring about. (A state is equivalent to a set of states; more formally, the cosmos is a locale of possibilities and a goal-state is an open subspace.) An instantiation of a cosmos-state is a substate.

A goal is a property of an agent of the form: this agent is trying to bring about goal-state G. Overloading the notation, we also say the agent has the goal G, and we can imprecisely let the term also stand for whatever mental elements constitute the goal and its relations to the rest of the agent.

A pursuit of G is behavior, a set of actions, selected to achieve G. Since an agent having a goal G is basically the same as the agent pursuing G, we also say e.g. "the farmer has the goal [...]", which really means, "the farmer selects actions that ze expects will bring about zer goal state, and so ze behaves in a way that's well-described as [...]".

A strategy (for G) is a pursuit (of G) that involves an arrangement of goals (perhaps in sequence or in multiple parallel lines, perhaps with branching on contingencies). If an agent is following a strategy for G, then for each goal G_s in the strategy, G_s is a subgoal of the agent's goal of G, and G is a supergoal of G_s. We can also loosely say that some goal G′ is a subgoal of G if G′ helps achieve G, whether or not there's an agent pursuing G using G′, meaning vaguely something like "there's a strategy sufficient to achieve G that uses G′ as a necessary part" or "many agents will do reasonably well in pursuing G if they use G′".

An intermediate goal is a goal that motivates the pursuit of subgoals, and is itself motivated by the pursuit of one or more supergoals.

The motivator set of a goal G is the set of goals G_s such that G_s is plausibly a supergoal of G.

The (sufficient) strategy set of a goal G is the set of strategies that are sufficient to achieve G.

Suppose that G is an intermediate goal for some agent. By observing more fully the constellation of action and thought the agent engages in while pursuing the goal G, we become more confident that the agent is pursuing G.
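To make the terms above a little more concrete, here is a minimal sketch in Python of a toy goal graph. Everything in it is an assumption made for illustration: the goal names are hypothetical, "plausibly a supergoal" is modeled as a hand-written edge, and the helper names `motivator_set` and `consistent_supergoals` are not from the essay. It only shows how observing a widely shared subgoal can leave several candidate supergoals open.

```python
# Toy model (hypothetical goals): each supergoal maps to the subgoals
# that some sufficient strategy for it would use.
SUBGOALS = {
    "cure a disease": {"acquire funding", "hire researchers", "run clinical trials"},
    "build a monument": {"acquire funding", "hire builders"},
    "corner a market": {"acquire funding", "hire researchers"},
}


def motivator_set(goal, subgoals=SUBGOALS):
    """Return the goals that are plausibly a supergoal of `goal`.

    "Plausibly" is modeled crudely: a direct edge in the hand-written table.
    """
    return {sup for sup, subs in subgoals.items() if goal in subs}


def consistent_supergoals(observed, subgoals=SUBGOALS):
    """Return the supergoals consistent with every observed subgoal pursuit."""
    candidates = set(subgoals)
    for goal in observed:
        # Each observation keeps only supergoals that could motivate it.
        candidates &= motivator_set(goal, subgoals)
    return candidates


if __name__ == "__main__":
    # A convergent subgoal is shared by every supergoal in the table.
    print(sorted(motivator_set("acquire funding")))
    # Two convergent subgoals still leave more than one candidate.
    print(sorted(consistent_supergoals(["acquire funding", "hire researchers"])))
    # A subgoal specific to one supergoal narrows the candidates to it.
    print(sorted(consistent_supergoals(["run clinical trials"])))
```

In this toy table, watching the agent acquire funding and hire researchers is compatible with more than one supergoal; only a subgoal that few supergoals share discriminates between them. Real motivator sets would of course be graded and far larger than a fixed lookup table.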