Classic Computer Magazine Archive ANTIC VOL. 3, NO. 1 / APRIL 1984

MATCHBOX TIC-TAC-TOE

An introduction to artificial intelligence

SYNOPSIS
This program serves as an introduction to the fundamentals of artificial intelligence. The program requires 16K for cassette and 24K with a disk system, and runs on all Atari computers.

This program is an experiment in artificial intelligence. It's designed to simulate learning, the mysterious process of discovering the correct and incorrect responses to a given situation. It will teach your Atari computer to play the ancient and noble game of tic-tac-toe. Your Atari will improve as it, plays until it will seem to be unbeatable. And believe me, it's an uncanny experience to watch your computer get better at the game with practice-it's almost as if the machine were developing a personality of its own.

MENACING ORIGINS

This program is based on original research performed in 196O by a English biologist named Donald Michie. Michie used 300 matchbox and beads of nine different colors to build his Matchbox Educable Naughts and Crosses Engine (MENACE). Each of the matchboxes had a unique tic-tac-toe pattern on its cover. Each colored bead represented one of the nine possible moves, or squares, in the tic-tac-toe grid. (As you know in toe there are a number of different Iegal moves that can be made in any given situation - and some moves are better than others.)

At the start of the experiment, he placed an equal number of colored beads for each legal move into each matchbox. Because beads were removed from the matchboxes as the game progressed, more beads of each color were available for early moves than for later ones. When it was the matchboxes' turn to "play," Michie would locate the matchbox whose pattern matched the existing game board. He would then shake the box and remove a colored bead. This procedure was intended to result in a random choice.

"TEACHING" A MATCHBOX

The color of the selected bead indicated what move the matchboxes "wanted" to make. The beads selected by this process were then saved until the end of the game. At that point, one of three things would happen. If the matchboxes had won the game, three beads of a chosen bead's color would be added to each matchbox from which a bead had been taken. If Michie won, the saved beads were permanently removed from their matchboxes. This "punished" the matchboxes for making bad moves. If a game ended in a draw, all beads were returned to their original places. In this way, MENACE tended to punish or cancel bad moves and to strongly reinforce good moves.

A COMPUTER VERSION

In Listing 1, the matchboxes in Michie's experiment are represented by the string MCHBOX$. Each "matchbox" takes up 19 bytes of the string. The first nine bytes are used to store possible tic-tac-toe patterns. The next nine bytes store counters that represent MENACE's colored beads. The last byte represents the chosen bead (by means of an index that points to the chosen counter).

The game board itself is stored in the BOARD$ string. Actually, eight boards are stored here. This was done because almost all of the board patterns have a number of mirror images. These are identical to the original pattern, except that the board is "rotated" into different positions (or mirror images). If mirror imaging were not taken into account, over 4500 possible patterns would have to be stored by the program, and the game's playing time would be prohibitively long. The ALT array shows how the game boards are rotated to accommodate the mirror-image representations.)

LEAVE IT TO THE COMPUTER

If you examine my program carefully, you'll notice that some of the 300 possible tic-tac-toe patterns are missing. Instead of including all possible patterns in the program, I decided to let the computer determine if a pattern is a new one. When it comes across a new pattern, it stores it in the MCHBOX$ array. The machine actually "learns" to recognize new patterns when they are presented. This part of the program is handled in lines 1250 through 1460. Let's examine these lines more closely.

LOOKING AT THE LEARNING CODE

Line 1250 checks to see if a move is the last move of the game; in this case, only one move is possible. This move is found in lines 1260 and 1270.

Line 1280 starts the search to find a matchbox that corresponds to the current board pattern. If a match is found, a jump to line 1380 is made, and the process of randomly choosing a move begins. Otherwise, a new pattern is added to the MCHBOX$ string by lines 1330 through 1370. Since the new pattern must match the board, we move directly to the random-move selection code in lines 1380 through 1410.

This is the point at which I discovered a most interesting phenomenon. I beat the computer 32 times in a row, which left the poor machine without any counters in a certain patern. Therefore, during the 33rd game it could not find a move other than zero, which is not allowed. So I added line 1420 to the program. This does what any self-respecting learning machine would do in a similar situation--it resets the program and starts over. When I tested this feature, I beat the machine 55 times in a row before it discovered the proper moves and finally won.

In line 1430, we save the computer's chosen move in the 19th position (byte) of the matchbox. Then we reverse the mirroring of the board (so we're looking at it right-side-up) an get back into the game itself at line 1110.

The rest of the program support these few lines of code, and prompt the human player to make his or he choices. Lines 1480 through 1620 check for a win, loss or draw, an return a number in WIN to indicat what was found. This logic is used only after a move has been made. When the game is over, we end up a line 590, which can then send us on to a number of places, depending who won the game. Line 760 adds two new "beads" to the matchbox when the computer wins. Line 860 takes away one bead if the computer loses. Line 630 doesn't change a thing.

HOW TO PLAY

To play this game, use either a joystick plugged into controller jack 1 or the [SELECT] and [START] keys. Either the joystick or the [SELECT] key will move the cursor on the game board to a new position. When the cursor is in the square you want, press either the fire button or the [START] key and your choice will recorded. ("X" always goes first.) When it's your turn, you can see what's in the MCHBOX$ array by pressing the [OPTION] key. It can be listed either to your printer or to the screen (if your printer is turned off). You can also save the patterns stored in the computer by pressing the [OPTION] key before a new game has started. This period is indicated by the "Who goes first?" message. In this case, you'll be asked for a device and file specification (without beginning or ending quotes). When the patterns have been saved, respond normally to the prompt message at the top of the screen.

GETTING THE MOST FROM THE PROGRAM

To get the most from this program, you should keep statistics. Make a chart of who wins and loses each game in sequence, and plot the results. You'll be able to see your computer improve with practice.

Don't try every conceivable combination on the machine at the outset. The learning process will proceed slowly; there is too much for the computer to "learn" all at once. I prefer to use one or two patterns consistently until the machine has figured out how to handle them. Only then do I move on to a new pattern. Using this technique, the effects of practice on the computer's ability ought to be obvious after 20 games or so. And let the computer go first. It seems to learn much faster that way.

Finally, remember that this is an experimental program. If you don't like the way the computer is learning, restart the program and try again. You can also change the reward and punishment values in lines 760 and 860, to see if the machine learns faster under a different set of conditions. Or you can alter the random-selection weighting factor to provide for greater differentiation between beads of different colors. These parameters are set in line 310.

This proaram can teach you (and your computer) quite a bit about artificial intelligence. Have fun, experiment, and then sit back and let your Atari amaze your friends and relatives with its new-found skills.

Joseph Hafner is an electronics design engineer with ten years of hardware and software experience on both large and small computers. He has also taught hardware and software design to beginners, using BASIC, FORTRAN and Assembler.

Listing: MATCHBOX.BAS Download