Pandamat

Controlling an Animat with Pandemonium

Introduction

This paper is about a project called "Pandamat." It uses John V. Jackson's Pandemonium system as a control structure for what Stewart W. Wilson called an "animat." I will first describe the specifications for the animat, then present the Pandemonium concept and the design decisions that went into this particular implementation, and finally give the results of the experiments and ideas for further improvement of the project.

[Figure: a sketch of an animat being controlled by Pandemonium]

The Animat

An animat is an artificial animal that exhibits learning behavior by adapting to its environment. Wilson defined a particular animat's environment, sensory channels, and repertoire of actions, which have been borrowed directly for this project, as follows (with certain omissions):

A rectangle on the computer terminal screen 18 rows by 58 columns and continued toroidally at its edges defines the environmental space. Alphanumeric characters at various positions represent objects; the animat itself is denoted by *. Some, possibly many, positions are just blank.

* has been given the ability to pick up sensory signals from objects which happen to be one step (row and/or column) away, in any of the eight (including diagonal) directions; nothing is detected from more distant objects. Thus the "sense vector" has eight positions. With * located, for example, as shown below left, the sense vector would be as shown at the right:

TT
*F     TTFbbbbb,

where b stands for blank. To form the sense vector, the circle of positions surrounding * is mapped, clockwise starting at 12 o'clock, into a left-to-right string.

*'s actions are restricted to single-step moves in each of the eight directions. The directions are numbered 0-7 starting at 12 o'clock and proceeding clockwise; for example, a move in direction 3 would be south-easterly.

The animat may move, or attempt to move, to a position occupied by an object. The environment's response for each kind of object is predefined.
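To make the geometry concrete, here is a minimal Python sketch of how the sense vector might be read off the toroidal grid. It is an illustration of my own, not Wilson's code; the names ROWS, COLS, OFFSETS, and sense_vector are assumptions for this sketch.

  # Sketch: reading the eight-position sense vector from a toroidal
  # character grid.  Directions 0-7 run clockwise from 12 o'clock.
  ROWS, COLS = 18, 58

  # (row, col) offsets for directions 0..7: N, NE, E, SE, S, SW, W, NW
  OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1),
             (1, 0), (1, -1), (0, -1), (-1, -1)]

  def sense_vector(grid, row, col):
      """Return the 8-character string seen by * at (row, col)."""
      chars = []
      for dr, dc in OFFSETS:
          r = (row + dr) % ROWS      # the grid continues toroidally
          c = (col + dc) % COLS
          chars.append(grid[r][c])   # 'T', 'F', or ' ' (blank)
      return "".join(chars)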

[Figure: the WOODS7 environment, an arrangement of F's each connected to two T's, with one *]

Traditionally, the T designates a tree, or an unoccupiable position. Any attempt * makes to move to such a spot will be denied and the environment's response, called the gain, will be negative. The F designates food, or the objective. When * moves to a spot with an F the gain will be positive and * will be moved to some random blank position.

A "problem" in this model involves starting * at some random blank position and counting the number of actions it takes before it moves to a spot with an F. A large number of steps taken (e.g. 358) is considered bad, while a small number of steps (e.g. 1) is considered good. According to Wilson, the average best time is 2.2 steps in the environment "WOODS7" which was used to test Pandamat.

Wilson's approach to controlling * (which is not used here) involves a classifier system: a list of partially specified sense vectors, each with an associated action. These association rules are scored according to their success, and this score is used to control the creation of new rules and the deletion of old ones by a genetic algorithm.
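For comparison only, one of these rules might be represented as in the sketch below. The '#' wildcard notation and the field names are my own assumptions, not Wilson's code; the sketch is meant only to show the shape of a classifier.

  class Classifier:
      """One rule: a partially specified sense vector and an associated action."""
      def __init__(self, condition, action, strength=0.0):
          self.condition = condition   # e.g. "T#F#####"; '#' matches anything
          self.action = action         # a direction, 0-7
          self.strength = strength     # success score, adjusted from the gain

      def matches(self, sense):
          return all(c == '#' or c == s
                     for c, s in zip(self.condition, sense))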

Pandemonium

The control structure that has been used for this project, called Pandemonium, was designed by John Jackson to mimic the Pandemonium theory of perception:

It uses "demons," each of which is a kind of rule that responds immediately when it is stimulated. The Pandemonium theory suggests that we identify an object by applying its component details to a crowd of demons, which "shout" to a degree determined by how well they match their input.

If we extend the Pandemonium theory beyond perception, we can envision a system consisting almost entirely of demons, each occasionally shouting. Some are involved with external perception, some cause external actions, and some act purely internally on other demons. Since the mind seems able to concentrate on a small number of things at once, we might consider the mass of demons to be the crowd in a stadium, and the selected few at any time down in the arena causing the crowd to shout.

At any moment the demon shouting loudest in the crowd is selected to take a place in the arena, displacing one already there, which returns to the crowd.

Remembering from behaviorism that we want to vary the strength of links between entities, we will try to strengthen links between demons in the arena proportionally to the time they have been in it together. Since we want this strengthening to depend on the motivational levels, we could turn up the "gain" when things were going well, and turn it down if things got worse.
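Read literally, the link-strengthening rule can be paraphrased in a few lines of Python. This is my own rendering, not Jackson's; links is assumed to be a square matrix (or nested dictionary) of connection strengths indexed by demon, and the learning rate is arbitrary.

  def strengthen_links(arena, links, gain, rate=0.1):
      """Nudge the link between every pair of demons now sharing the arena."""
      for a in arena:
          for b in arena:
              if a != b:
                  links[a][b] += rate * gain   # a negative gain weakens the link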

Jackson then goes on to describe a large number of modifications that could be made to this model.

Note how the three main ingredients (sensory input from the environment, actions taken in the environment, and the gain, or environmental response) occur identically in Pandemonium and in the classifier system discussed earlier. It is this similarity that allows the classifier system to be replaced so easily by Pandemonium as the control structure for the animat. It is also a testament to the naturalness of this model.

Pandamat

The environment used in Pandamat was exactly that of * described earlier. The environment returns a vector of eight characters describing the immediate surroundings as T, F, or blank, and it also returns a gain. The gain is one of three values depending on whether * tried to occupy a T, F, or blank. The control structure takes the sense vector and gain as input and generates an action, which is simply an integer indicating a direction of motion: 0-7.

The design decisions for Pandemonium as implemented in Pandamat were the simplest possible. There are three types of demons: sensory, action, and other. Exactly eight of the twenty-four sensory demons occupy the arena at one time, corresponding to the eight positions in the sense vector, each of which can hold one of three possible values. One of the eight action demons occupies the arena at one time, and it always corresponds to the action that is taken. The "other" demons are fixed at some constant number and are intended to increase the complexity of the associations among the action and sensory demons.
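A concrete, though assumed, enumeration of these demons might look like the following; the numbering scheme (direction * 3 + sensed value) and the particular count of "other" demons are choices of mine for illustration.

  SENSE_VALUES = {' ': 0, 'T': 1, 'F': 2}   # the three things a position can hold

  N_SENSORY = 8 * 3    # one demon per (direction, value) pair: 24 in all
  N_ACTION  = 8        # one demon per direction of motion
  N_OTHER   = 4        # some small constant (see "Experimental Results")

  def sensory_demons(sense):
      """The eight sensory demons stimulated by the current sense vector."""
      return [direction * 3 + SENSE_VALUES[ch]
              for direction, ch in enumerate(sense)]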

Here is the algorithm:

  1. Bring the loudest sensory demons into the arena.
  2. Bring the loudest other demons into the arena.
  3. Bring the loudest action demon into the arena.
  4. Take the action.
  5. Based on the gain returned from the environment, update the strengths of the connections between all demons in the arena.

If positive feedback is received from the environment, then the connections between the demons in the arena are strengthened. If negative feedback results, then the connections are weakened.

The initial connection strengths between the demons are set to small random values. Note that this is the only randomness in the system (other than the placement of * after eating an F). The selection of actions and the updating of connections are completely deterministic on any given problem, though the actions may appear random.
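The sketch below pulls steps 1 through 5 together into a single controller, reusing the demon numbering above. It is a reconstruction of my reading rather than the original program: the class name, the act/learn split, the loudness measure (a demon's total connection to the demons already in the arena), and the number of "other" demons admitted each cycle are all assumptions.

  import random

  class PandamatController:
      def __init__(self, rate=0.1):
          n = N_SENSORY + N_ACTION + N_OTHER
          # Small random initial link strengths: the only randomness here.
          self.links = [[random.uniform(-0.01, 0.01) for _ in range(n)]
                        for _ in range(n)]
          self.rate = rate
          self.arena = []

      def _loudness(self, demon, arena):
          # A demon "shouts" in proportion to its total connection to the arena.
          return sum(self.links[a][demon] for a in arena)

      def act(self, sense):
          # Step 1: the sensory demons matching the current senses enter.
          arena = sensory_demons(sense)
          # Step 2: admit the loudest "other" demons (two here, arbitrarily).
          others = range(N_SENSORY + N_ACTION, N_SENSORY + N_ACTION + N_OTHER)
          arena += sorted(others, key=lambda d: self._loudness(d, arena),
                          reverse=True)[:2]
          # Step 3: the loudest action demon enters and decides the move.
          actions = range(N_SENSORY, N_SENSORY + N_ACTION)
          best = max(actions, key=lambda d: self._loudness(d, arena))
          arena.append(best)
          self.arena = arena
          return best - N_SENSORY          # step 4: a direction, 0-7

      def learn(self, gain):
          # Step 5: adjust every link among the demons sharing the arena.
          for a in self.arena:
              for b in self.arena:
                  if a != b:
                      self.links[a][b] += self.rate * gain

With the environment sketch given earlier, a run is then just a matter of creating a PandamatController and calling run_problem with it repeatedly.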

Experimental Results

I was delighted to discover that Pandamat actually learned the first time I got it to run, although its initial performance was not good (averaging between 14 and 18 steps per problem, compared with the 4 or 5 steps achieved by Wilson's classifier system). Rather than making any fundamental changes, I achieved drastic improvements simply by adjusting a few internal constants.

The least important of these constants was the number of non-sensory, non-action demons, both in the arena and outside it. They seemed to have no influence on the behavior of the system in terms of learning rate or quality. They did, however, greatly influence the actual speed of the system, so I reduced them to the smallest number possible without changing the program logic.

The part of the program that made all of the difference was the ratio of values returned in the gain. Remember that there are three possible values, one for each of the three objects that could be in the destination spot: a T, an F, and a blank.

If a very large positive value were returned for an F, then the system would become persistent in going in that direction, neglecting the others. Usually some small number of directions would win out, and most possibilities would remain unexplored. For example, if * went north-east and ate an F, then a large positive gain might cause it to bump enthusiastically into a T that lay to the north-east on the next problem. This situation might continue for some time.

The same is true of large negative values returned for hitting a T. If * hit a T to the east and received a large negative value, then it would never, for the life of the system, try to take an F that lay to the east. This could have horrible consequences for the program's ability to learn. In some runs * would only take an F if it occurred in a particular direction, and would ignore it everywhere else.

Large negative values for blanks caused a similar problem: they would prevent * from learning not to walk into T's. Positive or zero values for blanks, on the other hand, would cause * to walk endlessly in a straight line, and given the toroidal nature of WOODS7 there are, in fact, places where this could continue forever. Also, even small positive values for blanks would soon overwhelm the F's.

Experimenting with the system, I found these gain values to be approximately optimal: blank, -0.25; T, -1.0; F, 2.25. With these values the system quickly averages between 5 and 7 steps per problem. * learns to always take the F immediately if it can see it, and, surprisingly, it will even walk around groups of T's to find the F.
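In code, these tuned constants would simply replace the placeholder gains used in the earlier environment sketch:

  # The gain values found to work best, replacing the earlier placeholders.
  GAIN_BLANK, GAIN_T, GAIN_F = -0.25, -1.0, 2.25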

Since everything * can do, other than taking F's, gets a negative gain, it is impossible for it to get stuck in an endless cycle. It will forget everything it ever knew rather than continuously walk into a T or go in an endless straight line. This does have some negative side effects, however. First, once the system has learned, its performance oscillates: it goes back and forth between better and worse as the system evolves. Also, it will sometimes walk in random directions when drifting in a single direction would be preferable. Finally, since the weights are eternally and consistently added to or subtracted from, they will, in some distant future eon, reach unrepresentable values. This should never happen in practice, but it still bugs me.

But how does Pandemonium compare with the classifier systems used before? It seems that the performance Pandamat has so far achieved does not quite measure up to what has been done in the past. Pandamat seems to be slower, on average, by one or two steps. But there are several other factors to keep in mind:

Possibilities

The classifier system took some tinkering to get it to work well. Here are a few ideas for improving Pandamat's performance:

References

Jackson, John V. "Idea for a Mind." SIGART Newsletter, Number 101, July 1987, pp. 23-26.

Wilson, Stewart W. "The Animat Path to AI." In From Animals to Animats: Proceedings of the First International Conference on the Simulation of Adaptive Behavior. Cambridge, Massachusetts: The MIT Press/Bradford Books.

Wilson, Stewart W. "Knowledge Growth in an Artificial Animal."

Modified 5/25/98

Copyright © 1998 Jeff Whitledge