Background
I am trying to write a simulator in which multiple AI agents compete and/or collaborate to maximize some utility function.
Each agent can interact with the world by taking actions that may alter the state of the environment. In response to each action, the environment sends a reward signal back to the actor (the agent that performed the action).
Some agents are designed to spectate other agents' actions and rewards, so they do not have to suffer all the consequences themselves while they learn optimal moves.
Initially, I defined the following methods on the environment class:
- Interact(action, actor): returns a tuple of the reward signal and the new state
- GetState(): returns the current state
- Spectate(): returns a collection of what happened, in terms of actor, action, original state, new state, and reward obtained
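For reference, the three methods above might look roughly like this in Python. This is a minimal sketch, not the actual code: snake_case names stand in for Interact, GetState and Spectate, and the integer state, the transition rule and the reward ("distance to 10") are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple

@dataclass
class Interaction:
    """One entry returned by spectate(): who did what, and what it led to."""
    actor: str
    action: Any
    original_state: Any
    new_state: Any
    reward: float

@dataclass
class Environment:
    state: int = 0
    history: List[Interaction] = field(default_factory=list)

    def interact(self, action: int, actor: str) -> Tuple[float, int]:
        """Apply the action for the given actor and return (reward, new_state)."""
        original = self.state
        self.state += action                   # hypothetical transition rule
        reward = -float(abs(self.state - 10))  # hypothetical reward: distance to 10
        self.history.append(Interaction(actor, action, original, self.state, reward))
        return reward, self.state

    def get_state(self) -> int:
        """Return the current state."""
        return self.state

    def spectate(self) -> List[Interaction]:
        """Return a copy of every interaction recorded so far."""
        return list(self.history)
```

Every caller has to hold a reference to this concrete `Environment` type and pass its own `actor` identifier, which is the coupling described below.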
However, this seems to complicate my design and would make it hard to scale the system later.
I am looking for a general way for different agents and environment(s) to interact without explicitly calling type-specific methods or passing the actor's identifier around.
Proposed Solution
So I thought of a mailing system: an actor (agent) sends a message to the environment through a mailbox, and the environment reads the incoming message, processes the interaction, and returns a message to the sender (the actor).
Meanwhile, curious agents read a copy of the returned message, which is published to anyone interested.
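The request/reply flow with a published copy can be sketched with plain FIFO queues as inboxes. This is a minimal sketch under assumptions: the `Message` type, the `mailboxes` dict, the `public_feed` list, the `environment_step` function and the transition/reward rules are all hypothetical, and the reply includes the original action so that spectators can learn from it.

```python
import queue
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass(frozen=True)
class Message:
    sender: str     # address of the sending mailbox
    recipient: str  # address of the receiving mailbox
    body: Any

# Hypothetical wiring: one inbox per address, plus a public feed for spectators.
mailboxes: Dict[str, "queue.Queue[Message]"] = {"env": queue.Queue(), "agent-a": queue.Queue()}
public_feed: List[Message] = []

def environment_step(state: int) -> int:
    """Read one action message, reply to its sender, publish a copy; return new state."""
    msg = mailboxes["env"].get_nowait()
    new_state = state + msg.body  # hypothetical transition rule
    reply = Message("env", msg.sender,
                    {"action": msg.body, "reward": -abs(new_state - 10), "state": new_state})
    mailboxes[msg.sender].put(reply)  # private reply to the actor
    public_feed.append(reply)         # copy for whoever is curious
    return new_state
```

The actor never calls a method on the environment; it only puts a `Message` into the environment's inbox and later reads its own.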
This might sound like the Observer pattern, except that every agent and every environment can observe interactions, both the ones it takes part in directly and other agents' interactions (to learn from their mistakes).
That means notifications are bidirectional, so having every object maintain its own list of subscribers to notify when some event occurs would be an overhead. Also, since this is an AI simulation, some processes might be stochastic; e.g., curious agents might not spectate 100% of the time.
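One way to keep that bookkeeping out of individual objects is to hold all subscriber lists in a single hub and make delivery probabilistic per subscription. A minimal sketch in Python; `Bus`, the topic names and the per-subscriber probability `p` are hypothetical, and the random generator is injected so a simulation run can be seeded and replayed.

```python
import random
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple

# Hypothetical handler type: any callable that accepts a published message.
Handler = Callable[[Any], None]

class Bus:
    """Central publish/subscribe hub, so publishers never keep subscriber lists."""

    def __init__(self, rng: random.Random) -> None:
        self._rng = rng  # injected so simulations can be seeded and replayed
        self._subs: DefaultDict[str, List[Tuple[Handler, float]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler, p: float = 1.0) -> None:
        """Register handler; each message on topic reaches it with probability p."""
        self._subs[topic].append((handler, p))

    def publish(self, topic: str, message: Any) -> int:
        """Offer message to every subscriber of topic; return how many received it."""
        delivered = 0
        for handler, p in self._subs.get(topic, []):
            if self._rng.random() < p:  # stochastic spectating: skipped with prob. 1 - p
                handler(message)
                delivered += 1
        return delivered
```

A curious agent subscribed with `p=0.3` would see roughly 30% of the published interactions, while the actor itself can subscribe with the default `p=1.0` to its own replies.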
So we have multiple client classes (not sharing a common superclass) that can message one another via something similar to an Enterprise Service Bus.
A component I call PostOffice would have a factory method that spawns a mailbox object for each object that wants to message other objects.
Whenever a client object wants to mail another object, it looks up the recipient's address in some sort of directory and sends the message to that address through its associated mailbox object.
The mailbox object in turn notifies the PostOffice, which forwards the message to the receiver's mailbox; that mailbox holds the message until the receiving client checks for messages and reads it.
It is a kind of message queuing, but at the object level rather than the enterprise level.
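The PostOffice flow (factory method, directory lookup, forwarding, and holding letters until the receiver checks) could be sketched as follows. This is a minimal single-threaded sketch under assumptions: `PostOffice`, `Mailbox`, `Letter`, `open_mailbox`, `lookup`, `forward`, `check` and the `box-N` address scheme are all hypothetical names, and each inbox is a plain FIFO `queue.Queue`.

```python
import queue
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass(frozen=True)
class Letter:
    sender: str     # sender's mailbox address
    recipient: str  # recipient's mailbox address
    body: Any

class Mailbox:
    """Per-client handle: sends via the post office and holds incoming letters."""

    def __init__(self, address: str, post_office: "PostOffice") -> None:
        self.address = address
        self._post_office = post_office
        self._inbox: "queue.Queue[Letter]" = queue.Queue()

    def send(self, recipient: str, body: Any) -> None:
        """Send body to a mailbox address (e.g. one obtained from PostOffice.lookup)."""
        self._post_office.forward(Letter(self.address, recipient, body))

    def deliver(self, letter: Letter) -> None:
        """Called by the post office to drop a letter into this inbox."""
        self._inbox.put(letter)

    def check(self) -> Optional[Letter]:
        """Non-blocking read: return the next held letter, or None if the inbox is empty."""
        try:
            return self._inbox.get_nowait()
        except queue.Empty:
            return None

class PostOffice:
    def __init__(self) -> None:
        self._mailboxes: Dict[str, Mailbox] = {}  # address -> mailbox
        self._directory: Dict[str, str] = {}      # public name -> address

    def open_mailbox(self, name: str) -> Mailbox:
        """Factory method: spawn, register and return a mailbox for a client."""
        address = f"box-{len(self._mailboxes)}"   # hypothetical address scheme
        box = Mailbox(address, self)
        self._mailboxes[address] = box
        self._directory[name] = address
        return box

    def lookup(self, name: str) -> str:
        """Directory lookup: resolve a client's public name to its mailbox address."""
        return self._directory[name]

    def forward(self, letter: Letter) -> None:
        """Route a letter to the recipient's mailbox, where it waits to be read."""
        self._mailboxes[letter.recipient].deliver(letter)
```

Clients only ever touch their own `Mailbox` and the directory; the environment can reply using `letter.sender` without knowing what kind of object sent it.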
Question
- Is there such a design pattern?
- Are there drawbacks to such an approach?