Teams of agents, defined as agents operating in the same environment with identical utility functions, are typically developed in a planned, coordinated fashion. However, such coordinated development is not always possible. Rather, as deployed agents become more common in robotics, e-commerce, and other settings, there are increasing opportunities for previously unacquainted agents to cooperate in ad hoc team settings. In such scenarios, it is useful for individual agents to be able to collaborate with a wide variety of possible teammates, on the premise that not all agents are fully rational. This talk considers an agent that is to interact repeatedly with a teammate that will adapt to this interaction in a particular suboptimal but natural way.
We formalize this "ad hoc team" framework in two ways. First, in a fully cooperative normal-form game-theoretic setting, we provide and analyze a fully implemented algorithm for finding optimal action sequences, prove theoretical results on the lengths of these action sequences, and provide empirical results on the prevalence of our problem of interest in random interaction settings.
Second, we consider a cooperative k-armed bandit in which cooperating agents have access to different actions (arms). In this setting, we prove theoretical results on which actions are potentially optimal, provide a fully implemented algorithm for finding such optimal actions, and provide empirical results.