INSIGHT: Atari: Atari Character Codes

INSIGHT: Atari

Bill Wilkinson

Atari Character Codes

Last month's discussion about where and how to place things in memory served as a good lead-in to this month's topic: character codes. If you've read the heftier reference material, including COMPUTE! Books' Mapping the Atari, you may have discovered that your eight-bit Atari computer actually uses three different types of codes to represent the various characters (letters, numbers, punctuation, graphics symbols) it works with. Each of these codes assigns a unique number to every character, but the three codes are incompatible with each other because they use different numbering schemes.

The most commonly encountered code is called ATASCII, which stands for ATari-version American Standard Code for Information Interchange. Except for the so-called control characters (carriage return, tab, and so on), ATASCII is compatible with standard ASCII. (Why Atari chose to modify the standard is anyone's guess.) ATASCII is the character code used by PRINT, INPUT, CHR$(), ASC(), and most external devices such as printers and modems.

For example, in ATASCII (and ASCII), the code for uppercase A is 65. You can verify this in BASIC:

PRINT CHR$(65)

PRINT ASC("A")

Virtually every Atari BASIC book (even Atari's own) shows the character represented by each ATASCII code. You can also run Program 1 below to display each character and its code. (Press CTRL-1 to pause and continue the display.)

Screen Codes

The second character code found in your Atari is the keyboard code. The keyboard code for any character is actually the value read from a hardware register in memory when the key for that character is pressed. Program 2 below lets you find the keyboard code for any character. Just for fun, try some of the keys or key combinations which don't normally produce characters (such as CTRL-SHIFT-CAPS). Neat, huh?
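Program 2 reports each keyboard code in both hexadecimal and decimal by splitting the value into two four-bit halves. As a rough modern illustration (Python rather than Atari BASIC, and not part of the original listings), the same split might look like this. The function name format_keycode is hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"   # same lookup string Program 2 keeps in HEX$

def format_keycode(keycode):
    """Build the text Program 2 would print for one keyboard code (0-255)."""
    # Split into high and low nibbles, as line 50 of Program 2 does.
    hi, low = divmod(keycode, 16)
    return ("KEYCODE: HEX $" + HEX_DIGITS[hi] + HEX_DIGITS[low]
            + ", DECIMAL " + str(keycode))

print(format_keycode(63))   # KEYCODE: HEX $3F, DECIMAL 63
```

The value 63 here is only a sample input; run Program 2 on a real machine to see which key produces which code.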

Finally: screen codes. This term refers to the byte value you must store in memory to display the desired character on the screen. "What?" you ask, "How do those differ from the ATASCII codes?" After all, to put the string BANANA PICKLE PUDDING on the screen, all it takes is a simple BASIC statement:

PRINT "BANANA PICKLE PUDDING"

And besides, aren't the characters in quotes supposed to be ATASCII codes? Good questions. Now for some complicated answers.

Actually, if the original Atari designers had thought just a little harder and added just a few more logic gates to the thousands already in the ANTIC and GTIA chips, ATASCII and screen codes could have been one and the same. It's similar to the mistake of making ATASCII incompatible with ASCII. Sigh. But we're stuck with what we've got, so let's figure out how it works.

For starters, consider GRAPHICS 1 and GRAPHICS 2, the large-size character modes. You may have noticed that in either of these modes you can display only 64 different characters on the screen. Now, if you recall last month's demo programs, note that we can specify the base address of the character set. That is, we can tell ANTIC where the character set starts by changing the contents of memory location 756 (which is actually a shadow register of the hardware location which does the work—see Mapping the Atari for more on this).

In a sense, the ANTIC chip is fairly simplistic. When it finds a byte in memory which is supposed to represent a character on the screen, it simply multiplies the value of that byte by eight (because there are eight bytes in the displayable form of a character) and adds the result to the character set base address. The sum is the memory address of the dot pattern for that particular character. Except… well, let's get to that in a moment.
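The arithmetic is simple enough to sketch. The Python below is a modern illustration, not something you can type into your Atari, and it assumes that location 756 holds the page number of the character set (the base address divided by 256), which agrees with the POKE 756,204 and $CC00 example later in this column. The name glyph_address is hypothetical, and the sketch ignores the color and inverse video bits discussed next.

```python
BYTES_PER_CHARACTER = 8   # each character is stored as eight bytes of dots

def glyph_address(base_page, screen_code):
    """Address of the first dot-pattern byte for a screen code.

    base_page is the value held in location 756 (assumed to be the
    high byte of the character set base address).
    """
    base_address = base_page * 256
    return base_address + screen_code * BYTES_PER_CHARACTER

print(hex(glyph_address(204, 0)))    # 0xcc00, the start of the set
print(hex(glyph_address(204, 33)))   # 0xcd08, 33 characters further in
```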

Exception To The Rule

Because we want GRAPHICS 1 and 2 (with their limited sets of 64 different characters) to display numbers and uppercase letters (omitting lowercase letters and graphics), for these two modes it makes sense that the character set starts with the dot representation of the space character and ends with the underline—codes 32 through 95, respectively.

But why are these 64 characters the only ones available in GRAPHICS 1 or 2? Because the upper two bits of a screen byte in these modes are interpreted as color information, not as part of the character (see the modification to Program 3 below). So only the lower six bits choose a character from the character set. Six bits can represent only 64 possible combinations, which is why these modes can display only 64 characters. Bit pattern 000000 becomes a space, 100101 is an E, 111111 becomes an underline, and so on.
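To make the bit split concrete, here is a small Python sketch (a modern illustration with hypothetical function names, not part of the original listings). It assumes the standard character set, in which the 64 characters run from space through underline (ATASCII 32 through 95), so adding 32 to the six-bit value gives the ATASCII code.

```python
def split_mode_1_2_byte(screen_byte):
    """Split a GRAPHICS 1 or 2 screen byte into (color, character index)."""
    color = screen_byte >> 6               # upper two bits: color information
    char_index = screen_byte & 0b00111111  # lower six bits: which of 64 characters
    return color, char_index

def standard_character(char_index):
    """Character shown for a six-bit index, assuming the standard set."""
    return chr(char_index + 32)            # index 0 is space, index 63 is underline

color, index = split_mode_1_2_byte(0b10100101)
print(color, index, standard_character(index))   # 2 37 E
```

Changing only the upper two bits of the byte changes the color value while leaving the letter alone.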

When you use GRAPHICS 0 (normal text), however, there is a strange side effect. In this mode, only the single upper bit is the color bit (actually, it's the inverse video bit). This leaves 7 bits to represent a character, so we can have values from 0 to 127 decimal (0000000 to 1111111 binary, $00 to $7F hex). Again, this value—after being multiplied by eight—is added to the value of the character set base address. But which numbers in that 0 to 127 range represent which characters?

Well, we already know what the first 64 characters are—since the Atari's hardware limitations dictate that they must be the same as in modes 1 and 2. So the next 64 are the other characters. Program 3 illustrates how the ATASCII character set is linked to the screen set. Note how all the characters are presented twice, once in screen code (i.e., character ROM) order and once in ATASCII order. For some additional fun and info on modes 1 and 2, change line 10 to GRAPHICS 1. (Do not change it to GRAPHICS 2 unless you put a STOP in line 65 after the first FOR-NEXT loop.) Do you see what I mean about the upper two bits being color information?
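The conversion that Program 3 performs in lines 90 through 140 can also be written as a pair of small functions. The Python below is a modern sketch under that reading of the listing; the function names are hypothetical, and the reverse conversion is an addition that does not appear in the listings.

```python
def atascii_to_screen(code):
    """Convert an ATASCII code (0-255) to the byte to store in screen memory."""
    inverse = code & 0x80        # top bit (inverse video) passes through unchanged
    low = code & 0x7F
    if low < 32:                 # control/graphics characters move up by 64
        screen = low + 64
    elif low < 96:               # space through underline move down to 0-63
        screen = low - 32
    else:                        # lowercase and the rest stay where they are
        screen = low
    return inverse | screen

def screen_to_atascii(screen_byte):
    """Reverse conversion: screen memory byte back to an ATASCII code."""
    inverse = screen_byte & 0x80
    low = screen_byte & 0x7F
    if low < 64:
        code = low + 32
    elif low < 96:
        code = low - 64
    else:
        code = low
    return inverse | code

print(atascii_to_screen(65))   # 33: uppercase A is screen code 33
```

Because each of the three ranges maps onto a different range, the two functions undo each other for every value from 0 to 255.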

Now you know why there are three different character codes used in your computer. How can you take advantage of this information? Well, if you combine this knowledge with the programs I presented last month, you could invent your own character set and design a word processor for some foreign language. (If you come up with a good Cyrillic character set, let me know.)

Actually, if you own an XL or XE machine, you have a second character set already built in. Just add this line to Program 3:

20 POKE 756,204

This tells the operating system and ANTIC that the base of the character set is at $CC00, which is where the international character set resides. Someday you might find some use for these characters. How will you know until you try?

For instructions on entering these listings, please refer to "COMPUTE!'s Guide to Typing In Programs" in this issue of COMPUTE!.

Program 1: ATASCII Codes

OC 10 GRAPHICS 0
HK 20 FOR I=0 TO 255:PRINT I
ID 30 IF I=155 THEN PRINT " [RETURN]":GOTO 50
IC 40 PRINT CHR$(27);CHR$(I)
ON 50 NEXT I
HH 60 REM USE CONTROL-1 TO PAUSE

Program 2: Keyboard Codes

BC 10 DIM HEX$(16):HEX$="0123456789ABCDEF"
PK 20 POKE 764,255
LL 30 KEYCODE=PEEK(764)
BC 40 IF KEYCODE=255 THEN 30
HF 50 HI=INT(KEYCODE/16):LOW=KEYCODE-16*HI
LJ 60 PRINT "KEYCODE: HEX $";
OM 70 PRINT HEX$(HI+1,HI+1);HEX$(LOW+1,LOW+1);
ID 80 PRINT ", DECIMAL ";KEYCODE
AE 90 GOTO 20

Program 3: Screen Codes

OC 10 GRAPHICS 0
BP 30 SCREEN=PEEK(88)+256*PEEK(89)
GB 40 REM FIRST: SCREEN CODE ORDER
EB 50 FOR C=0 TO 255:POKE SCREEN+C,C
OI 60 NEXT C
CO 70 REM THEN:{3 SPACES}ATASCII ORDER
HA 80 SCR2=SCREEN+40*8
BH 90 FOR C=0 TO 255:CHAR=C
MP 100 IF C>127 THEN CHAR=C-128
EK 110 IF CHAR<32 THEN CHAR=C+64:GOTO 140
HB 120 IF CHAR>95 THEN CHAR=C:GOTO 140
HE 130 CHAR=C-32
JG 140 POKE SCR2+C,CHAR
BI 150 NEXT C
BH 999 GOTO 999:REM WAIT FOR BREAK KEY