Classic Computer Magazine Archive START VOL. 1 NO. 1 / SUMMER 1986

PROCEDURES
AL and C Routines...

Fast Memory Manipulations and More

BY DAN MATEJKA

These two assembly language (AL) routines, plus a full C "Help" text window module, will speed your programs and make them more professional. Both AL routines, very fast implementations of a standard memory move and memory initialization, are accessed as functions from your main C program. The C text window module provides a standardized method of displaying text to screen or printer with automatic word wrap. All programs may be found on the START disk within the folder labelled ROUTINES.

When Stanley Crane and I were writing DB Master One, the database bundled with the ST last Christmas, we found frequent need for wholesale memory moves and initializations. DB Master One is a RAM-based system which loads the entire file from disk once each session, does its work all in memory, then writes it all back to disk. Necessary but boring operations like memory moves should be completed as quickly as possible. Straight C wasn't speedy enough, so we wrote the code ourselves to handle these situations.

In developing Disk Doctor for Antic publishing, I needed a friendly method of displaying "help" information on the screen. Since this would be a useful-and reusable-routine, I wrote a stand-alone C module for the purpose.

This article describes two assembly language routines-accessed as linked C functions-which provide a fast block memory move and an equally fast memory initialization. Following this is an analysis and description of the C printout module, PRINTOUT, which reads standard text files from disk and displays them to either screen or printer, left-justified and with word wrap. PRINTOUT will also provide an example of practical usage of the two AL routines.

MEMORY ALTERATIONS, WHILE YOU WAIT

The two memory operations we're going to talk about are setmem() and movmem(). They can be found as 68000 assembly language source code in the file MEMOPS.S, on your START disk.

setmem() initializes memory to a chosen byte and movmem() grabs memory and copies it to some other location. Our versions of these two functions are five times faster than their Alcyon C counterparts. This can make all the difference in the world to the person sitting in front of the terminal watching, for example, a scrolling screen. Screen scrolling is most painlessly accomplished by grabbing memory corresponding to each line of interest and putting it somewhere else.

Both functions were designed to be called from a C host, specifically a C host compiled by Digital Research's C compiler (the Alcyon compiler included in the developer's toolkit from Atari). This compiler allows unrestricted use of registers a0-a2 and d0-d2 within any procedure. movmem() and setmem() use no other registers, so no registers are pushed onto the stack for later retrieval. This compiler also uses two-byte integers. Pascal programmers will note the parameters are not cleaned off the stack at procedure's end, either.

If you are using the Alcyon compiler, all you need do to use these procedures is assemble the included file and link it with its host. Other compilers will doubtless have different linking and register usage conventions.

The complete C protocol for using these procedures is:

movmem(source,destination,count)
    char *source, *destination;
    int count;

setmem(destination,count,value)
    char *destination;
    int count,value;

For movmem(), count bytes are moved from source to destination. For setmem(), count repetitions of value are placed at destination. Note that value is a byte, not a word, value. For example, given a declaration like:

char stuff[100];

the call setmem(stuff,sizeof(stuff),'a') sets every element of stuff to 'a'. The call movmem(&stuff[10],&stuff[6],20) takes elements 10 through 29 of stuff and copies them to elements 6 through 25.

More or less equivalent C source code looks like this:

/* initialize memory */
setmem(d,c,v)
    register char *d;
    register int c,v; {

    while (c--)
        *d++ = v;
} /* end setmem */

/* move memory around */
movmem(s,d,c)
    register char *s,*d;
    register int c; {

    if (s > d)
        /* destination is below source: copy front to back */
        while (c--)
            *d++ = *s++;
    else {
        /* destination is above source: copy back to front */
        s += c;
        d += c;
        while (c--)
            *--d = *--s;
    }
} /* end movmem */

Note that bad things happen if you send a negative value for count.

setmem() is straightforward. Starting where it's told, it marches through memory, setting each consecutive location to the value it's told to use, until it's told to stop.

movmem() works similarly, but a twist is encountered if the source and destination areas overlap. For example, given the following call, in which we move from a low address to a high address:

movmem( &stuff[6], &stuff[10], 20 )

If you start with stuff[6], by the time you get to stuff[10] it has already been polluted by stuff[6]. Effectively, you will have copied the first four bytes five times.

To avoid this problem, when working with overlapping memory, you must manipulate the memory elements in reverse order of the move. In the above example, start with stuff[25] and finish with stuff[6]. movmem() is smart enough to handle this for you, so you do not have to think about it.
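The pollution described above is easy to reproduce on any machine. The following is a minimal sketch in ANSI C, not the MEMOPS.S code; the names forward_copy() and overlap_safe_copy() are made up for the illustration. The first always walks forward, and the second picks its direction the same way the C version of movmem() shown earlier does.

```c
#include <assert.h>
#include <string.h>

/* Naive copy: always walks from the first byte to the last. It is
   only safe when the destination lies below the source, or when the
   two areas do not overlap at all. */
static void forward_copy(const char *s, char *d, int c)
{
    assert(c >= 0);              /* negative counts are not supported */
    while (c--)
        *d++ = *s++;
}

/* Direction-aware copy: front to back when the destination is below
   the source, back to front otherwise. */
static void overlap_safe_copy(const char *s, char *d, int c)
{
    assert(c >= 0);
    if (s > d) {
        while (c--)
            *d++ = *s++;
    } else {
        s += c;
        d += c;
        while (c--)
            *--d = *--s;
    }
}
```

If stuff[] is first filled with the letters 'A', 'B', 'C' and so on, forward_copy(&stuff[6], &stuff[10], 20) leaves "GHIJ" repeated five times in elements 10 through 29, while overlap_safe_copy() with the same arguments leaves the expected 'G' through 'Z'.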

CLOCKING IT

Some time spent with the actual machine instructions generated from the above C code quickly shows the speed advantage of assembly language. The heart of the movmem() procedure,

while (c--)
    *d++ = *s++;

is compiled by DRI's package into assembly code which looks like this:

* (line 7) - a5 is s, a4 is d, d7 is c

moveloop:  move.b (a5)+,(a4)+   * *d++ = *s++
           move   d7,d0         * copy count
           sub.w  #1,d7         * decrement count
           tst.w  d0            * finished?
           bne    moveloop      * nope

Because compilers are not as quick to grasp the big picture as people are, some extra steps are included. This fragment would be better written something like this:

           bra    movecheck     * check for zero count
moveloop:  move.b (a5)+,(a4)+   * *d++ = *s++
movecheck: dbf    d7,moveloop   * decrement count, continue

or, when possible, by longwords instead of bytes, like this:

           bra    movecheck
moveloop:  move.l (a5)+,(a4)+
movecheck: dbf    d7,moveloop

A longword (four bytes) is the largest piece of memory the ST's processor can handle at once. Thus, it is the ST's most efficient chunk size for doing things like moving memory around. The longword moves in the above code fragments will move their four-byte chunks nearly three times faster than their byte-by-byte siblings.

The speed difference becomes telling when very large chunks of memory must be moved around. Using the above three code fragments, the Atari 520ST moves a chunk of memory the size of its own screen in 150, 96 and 33 milliseconds, respectively. Those figures are not a straight 8 MHz multiplication of the number of clock cycles the MC68000 takes to perform the steps listed. Overhead slows the ST down to an apparent speed of about 7.3 MHz.
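The three figures can be roughly reproduced with a little arithmetic. The sketch below assumes a 32,000-byte screen, the 7.3 MHz apparent clock mentioned above, and per-instruction cycle counts from the standard MC68000 timing tables: 12 for move.b (a5)+,(a4)+, 20 for the move.l form, 10 for a taken dbf or bne, and 4 each for the register move, tst.w, and a decrement assembled in its quick form. Those counts and the helper name loop_time_ms() are assumptions made for this illustration, and loop setup time is ignored.

```c
#include <assert.h>

/* Milliseconds needed to run `passes` trips through a loop costing
   `cycles_per_pass` clock cycles, at an effective clock of `clock_hz`. */
static double loop_time_ms(long cycles_per_pass, long passes, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles_per_pass * (double)passes * 1000.0 / clock_hz;
}

/* Assumed cycle totals per pass, built from the counts listed above:
     compiled loop:  12 + 4 + 4 + 4 + 10 = 34 cycles per byte
     dbf byte loop:  12 + 10             = 22 cycles per byte
     dbf long loop:  20 + 10             = 30 cycles per four bytes */
```

Under those assumptions, loop_time_ms(34, 32000L, 7.3e6) comes to about 149 ms, loop_time_ms(22, 32000L, 7.3e6) to about 96 ms, and loop_time_ms(30, 8000L, 7.3e6) to about 33 ms, which is in line with the figures quoted. The same totals give 22 cycles per byte against 7.5, the "nearly three times" advantage of the longword loop.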

AND NOT ONLY THAT

Just a few more points before we plunge into our "printout" module.

Whenever possible, everything is done with longwords instead of the bytes the procedures are defined as using. "Whenever possible" means two conditions must be met: the move must be longer than four bytes, and it must involve even addresses. The MC68000 is incapable of doing longword memory I/O at an odd address. If the situation can't meet both conditions, or can't be rearranged to meet them, the moves must be done by the byte.
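One way to picture that decision is the sketch below. It is written in portable C rather than 68000 assembly and is not the logic of MEMOPS.S; the function name copy_with_longwords() is hypothetical, it only copies forward, and it assumes the two areas do not overlap. The "rearranging" shown here is one guess at what that can mean: when both addresses are odd, a single byte is moved first so that the rest of the job starts on even addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies count bytes forward and returns how many four-byte moves
   were used. Longwords are used only when the move is longer than
   four bytes and both addresses are even, or can be made even by
   moving one byte first. */
static long copy_with_longwords(const unsigned char *s, unsigned char *d,
                                long count)
{
    long longs = 0;

    assert(count >= 0);
    if (count > 4 && (((uintptr_t)s ^ (uintptr_t)d) & 1u) == 0) {
        if ((uintptr_t)s & 1u) {        /* both odd: peel off one byte */
            *d++ = *s++;
            count--;
        }
        while (count >= 4) {
            uint32_t tmp;               /* stands in for one move.l */
            memcpy(&tmp, s, 4);
            memcpy(d, &tmp, 4);
            s += 4;
            d += 4;
            count -= 4;
            longs++;
        }
    }
    while (count-- > 0)                 /* leftovers, or the whole job */
        *d++ = *s++;
    return longs;
}
```

When one address is even and the other odd, no amount of rearranging helps, and the function falls straight through to the byte loop.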

Thus, these procedures really do not represent the fastest way to do the job. They spend too much time setting themselves up and deciding the best way to go about their business. But they are completely general and work no matter what parameters they are given to work with-unless, of course, they are sent bogus addresses or negative counts or similarly impolite things. Once they do decide on the best way to go about their job, though, they do it as quickly as possible and are an excellent compromise between generality and performance. For an example of their usage, examine the code in PRINTOUT.C, which we talk about next.

CONFESSIONS OF A FILE PRINTER

PRINTOUT.C, on your START disk, is the source code for a self-contained, (nearly) stand-alone C module that reads standard text files from a disk and displays them to either a printer or the screen. The resulting display will be left-justified and word-wrapped on the right margin. It is completely independent of text size and Atari ST screen configuration. On the printer, PRINTOUT paginates when the page is full. On the screen, it begins at the top of its own window and continues to the bottom. When that bottom is reached, the window is scrolled and printing continues smoothly. The patient human watching all of this can pause or stop the display at any time.

PRINTOUT is a slightly modified version of a module in Antic's Disk Doctor. All printing in Disk Doctor is routed through this unit, but the modified version here is slanted toward displaying help files. All that need be done to read and show a help file is call showfile(), sending it the name of the help file it should show.

Again, PRINTOUT was developed using DRI's C Compiler. If you use some other compiler, beware that the Alcyon creature has some idiosyncrasies. Paramount among these is that it does not exactly follow the C convention of interchangeable pointers and integers; integers being only two-byte entities.

PRINTOUT needs six things it does not define itself. The first four are variables: int schandle, the work screen handle returned by v_opnvwk() at program initialization; int cellwidth, the cell width of the current text font and size; int cellheight, its cell height (see GEM's vst_height() for details on these); and ptsin[], a temporary int array which must be defined for any GEM application anyway, so why not use it?

The fifth and sixth items are initializations, not variables. You must be sure the fill color is set to the background color, and text alignment must be set to bottom.

To provide an example of hooking PRINTOUT to an application, there is a simple shell program called PRTSHELL.C on your START disk. PRTSHELL.C will open a GEM virtual workstation, take over the screen, arrange things the way PRINTOUT wants them, display a menu bar, then wait for menu messages. Two menus are included for your enjoyment. One is the standard Desk Menu, the other is a Help Menu.





The first item under the Help Menu is "Quit". The remaining two items are pseudo Help items titled "About Something" and "About Something Else". They both call showfile() from PRINTOUT and display text from one of two files on the START disk (labeled FILE.ONE and FILE.TWO).

PRINTOUT FROM INSIDE OUT

With a handy listing in front of you, notice that gemdefs.h is included for its window manager #define's, and osbind.h for its library bindings. Note that PAGELENGTH and PAGEWIDTH are also #define's. They can just as easily be variables carried over from the main unit. They are used only when printing, not when displaying to the screen.

The first procedures of real interest are in the section headed "Printing Routines." The first of these is scroll(), which moves the entire display window up one line. It is the only abstruse algorithm in the entire module, and many allowances are made for it elsewhere. It is best explained by first examining the structure of the Atari ST's screen.

Complication number one is that three different screen configurations exist: low, medium and high-resolution. High-resolution screens, capable of only two colors, are mapped in memory very simply: bit by bit, a 0 in the screen memory means that pixel of the screen is background color. A 1 means it isn't background. At 640 pixels wide, each line of the screen is 640/8 or 80 bytes wide. Each consecutive line is stored consecutively in memory. It's all easy enough, until you consider color screens.

Each pixel of a medium-resolution screen is capable of being one of four colors. In low-resolution, that's 16 colors. Strangely enough, that means each pixel of a medium-resolution screen is described by two bits of screen memory, and four bits in low-resolution. In fact, the screen is described in groups of 16 pixels. Each group of sixteen pixels is described by a group of two or four words of memory in medium or low-resolution. Specifically, the nth pixel is described by the nth bit of each word. In pictures, a memory map for a medium-resolution screen with pixels a through z looks like this:

--- word one --- || --- word two --- || --- word three...
abcdefghijklmnop || abcdefghijklmnop || qrstuvwxyz...
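To make the layout concrete, here is a small sketch that looks up the color number of one pixel on a screen line. It is not part of PRINTOUT; the function name pixel_color() is made up, and it assumes 16-bit unsigned shorts, that the leftmost pixel of each group sits in the most significant bit of each word, and that the first word of a group supplies the low bit of the color number.

```c
#include <assert.h>

/* Color number of pixel x on one screen line whose planes are stored
   as interleaved 16-bit words: `planes` consecutive words describe
   each group of 16 pixels. */
static unsigned pixel_color(const unsigned short *line, int x, int planes)
{
    const unsigned short *group;
    unsigned mask, color = 0;
    int p;

    assert(x >= 0 && planes >= 1);
    group = line + (x / 16) * planes;   /* first word of the pixel's group */
    mask = 0x8000u >> (x % 16);         /* assumed: leftmost pixel = top bit */
    for (p = 0; p < planes; p++)
        if (group[p] & mask)
            color |= 1u << p;           /* word p supplies bit p of the color */
    return color;
}
```

For a medium-resolution line (planes set to 2), pixels 0 through 15 are read from words 0 and 1, pixels 16 through 31 from words 2 and 3, and so on.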

Now, back to scroll(). Clearly, picking up the screen at any arbitrary point on a line and moving it somewhere else presents a fairly ugly problem. Picking it up at pixels which happen to fall on word boundaries is, however, fairly easy. For this reason, only a subset of the display window is actually scrolled. This is the part that begins at the leftmost part of the window that falls on a 16-pixel boundary and continues for the largest multiple of 16 pixels contained in the window. The scrolled portion of the window also omits a few lines on top and bottom for headers and messages and the like. Every screen line begins on a 16-pixel boundary, so vertical movement is not a problem. This is the origin of scrollrect[], a subset of the work area of the display window (windrect []). It was an intentional design consideration to use a subset of the window, rather than define the window itself to sit on a word boundary, because some border area was desired.

As mentioned before, a single line on a high-resolution screen is 80 bytes, or 40 words, wide. It turns out that lines on both color screens are 80 words wide. Enter linesize.

To have the window show up somewhere else on the screen, you need only change where it is placed in the openwind() procedure. Since the scrolling area can then start nearly anywhere, scroll() needs to know how far from the beginning of the line (left edge of the screen) scrollrect[] is located. Enter linestart.

The window is also variable size, and a third thing scroll() needs to know is how wide scrollrect[] is, in bytes. Enter linewidth.

Note that linesize and linestart are defined as a number of words, while linewidth is a number of bytes. This is because the memory location of the screen is kept track of by an int pointer, and so offsets from it are done by the word. The movement of memory itself is accomplished by the earlier described movmem(), which works by the byte-so linewidth follows suit.

To calculate linewidth and linestart, the appropriate measurement in pixels is first divided by eight or sixteen, which converts to bytes and words, respectively. Each is then multiplied by the number of planes the current screen has: one for high-resolution, two for medium and four for low. scroll() accomplishes this with bit shifts instead of actual multiplication and division, because it is more convenient and ten times faster.
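As a rough illustration of that arithmetic (a sketch, not the code in PRINTOUT.C), the two conversions might be written as below. The function names and the plane_shift parameter are hypothetical; plane_shift is simply 0, 1 or 2 for screens with one, two or four planes.

```c
#include <assert.h>

/* Words from the left edge of the screen to a pixel column that sits
   on a 16-pixel boundary: divide by 16, multiply by the plane count. */
static int linestart_words(int x_pixels, int plane_shift)
{
    assert(x_pixels >= 0 && plane_shift >= 0 && plane_shift <= 2);
    return (x_pixels >> 4) << plane_shift;
}

/* Bytes occupied by a strip whose width is a multiple of 16 pixels:
   divide by 8, multiply by the plane count. */
static int linewidth_bytes(int width_pixels, int plane_shift)
{
    assert(width_pixels >= 0 && plane_shift >= 0 && plane_shift <= 2);
    return (width_pixels >> 3) << plane_shift;
}
```

On a medium-resolution screen, for example, a scroll area starting 32 pixels from the left edge and 320 pixels wide gives a linestart of 4 words and a linewidth of 80 bytes.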

Once all this preliminary stuff is knocked out of the way, scroll() begins one line (dy pixels) from the top of the scroll area and, using sheer brute force (movmem()), grabs each line of the window and moves it up dy pixels.
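The brute-force loop can be sketched on an ordinary array standing in for screen memory. This is not the scroll() from PRINTOUT.C: the function name scroll_up() and the top and bottom parameters are made up for the illustration, the standard memmove() stands in for movmem(), and unsigned short is assumed to be a 16-bit word.

```c
#include <assert.h>
#include <string.h>

/* Moves every line of the scroll area up by dy lines.
   linesize and linestart are counted in words, linewidth in bytes;
   top is the first line of the area, bottom is one past the last. */
static void scroll_up(unsigned short *screen, int linesize, int linestart,
                      int linewidth, int top, int bottom, int dy)
{
    int y;

    assert(dy > 0 && top + dy <= bottom && linewidth >= 0);
    for (y = top + dy; y < bottom; y++) {
        unsigned short *src = screen + (long)y * linesize + linestart;
        unsigned short *dst = src - (long)dy * linesize;
        memmove(dst, src, (size_t)linewidth);  /* note: destination first */
    }
}
```

Only the words between linestart and linestart plus linewidth on each line are touched, so anything to the left or right of the scroll area stays put. The last dy lines of the area keep their old contents and would still need to be cleared before new text is drawn there.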

EVERYTHING ELSE

The remainder of PRINTOUT is best understood by watching it print a hypothetical file. This is accomplished by calling showfile() which opens the file and then keeps track of its length. This is necessary because read() in some earlier versions of the operating system choked when asked to read more data than remained in a file. startprint() initializes all local variables that need initialization, and in general prepares the print session.

Our other previously described AL routine, setmem(), may be found in centerhdr(). This is a minor function which centers a string on the page. movmem() is used to move the string to the right. setmem() fills the leftmost, or first, part of the string with blanks.
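The move-then-fill idea can be sketched with the standard library's memmove() and memset() standing in for movmem() and setmem(). This is not the centerhdr() source; the name center_in_place() and its parameters are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Centers the null-terminated string in buf within `width` columns.
   buf must have room for at least width + 1 characters. */
static void center_in_place(char *buf, int width)
{
    int len = (int)strlen(buf);
    int pad;

    assert(width >= 0);
    if (len >= width)
        return;                               /* nothing to center */
    pad = (width - len) / 2;
    memmove(buf + pad, buf, (size_t)len + 1); /* shift right, null included */
    memset(buf, ' ', (size_t)pad);            /* blank the vacated left part */
}
```

Note the argument order when translating: movmem() takes source, destination, count and setmem() takes destination, count, value, while memmove() takes destination, source, count and memset() takes destination, value, count. The move overlaps itself, which is exactly the case the backward copy in movmem() exists to handle.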

Our file is now read sequentially and given to the mercies of printstr(), piece by piece. printstr() can alter the value of oktocontinue. Finally, endprint() releases all the memory that startprint() snatched and closes the window it opened. Other things are cleaned up, and the show is over.

printstr() does the printing. It takes a null-terminated string of arbitrary length and heritage and plasters it all over the printer or screen, as requested by startprint().

printstr() itself just searches for word boundaries and decides what will fit on each line of the display. All actual printing is done, one character at a time, by printchar().
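The line-fitting decision can be sketched as follows. This is only an illustration of breaking at word boundaries, not the printstr() source; the name fit_length() is made up, and embedded newlines are ignored for simplicity.

```c
#include <assert.h>
#include <string.h>

/* Returns how many characters from the start of text should go on a
   line `width` columns wide, breaking at the last space that fits. */
static int fit_length(const char *text, int width)
{
    int len = (int)strlen(text);
    int i;

    assert(width > 0);
    if (len <= width)
        return len;                 /* the rest fits on one line */
    for (i = width; i > 0; i--)
        if (text[i] == ' ')
            return i;               /* break just before this space */
    return width;                   /* one very long word: split it */
}
```

A caller would print that many characters, skip the space that follows, and repeat on the remainder. For "the quick brown fox" and a ten-column line, the first call returns 9, so "the quick" goes on the first line.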

printchar() keeps track of the current column and line numbers, which are the globals curcol and curline. When curline gets too big it causes pagination. printchar() lets printstr() handle line breaks when curcol gets too big. If the printing destination chosen is the printer, printchar() calls the appropriate bios trap that tells the printer about it. If printing to the screen, however, it stores each character in a one-line buffer called prnbuffer, at the index pbindex. The buffer is only printed when a carriage return character is encountered. scpixel is an indicator of vertical position, like curline, but refers to the screen position instead of the line number. When scpixel gets too big, scrolling happens. scpixel is really used only for screen output and curline only for printer output, since screen output isn't paginated. It could be, though, and their uses aren't parallel, so they are kept as two separate variables.

As each newline is encountered, printchar() calls checkeydown(), which polls the keyboard. If checkeydown() finds a keystroke waiting patiently for attention, it eats it and pauses, stops or ignores it entirely depending on the key.

Pausing and stopping are accomplished through statusline(), which prints an appropriate message and waits for additional keystrokes to eat.

CRITIQUE AND OVERVIEW

PRINTOUT's most damning problem is that it is what mouse-and-menu programmers are fond of calling "modal." When printing, all control of the computer is taken away from the no doubt unappreciative human who paid for the thing. This is not so bad, but when the printing pauses, and the mouse returns from wherever it was and the computer is once more paying attention to the human, PRINTOUT is off in its own little world, waiting for one thing and one thing only: a keystroke telling it to continue. There is no accessing any menu or other control device on the screen.

The advantage corresponding to all this rampant modalism is that PRINTOUT is a self-contained module which works almost by itself with no supervision from the main one. Making PRINTOUT nonmodal would of course necessitate a very intimate intertwining of its code with the main module, and it could hardly be published in its current incarnation as generic printing code.