Machine Language: Stack Tricks

Jim Butterfield, Associate Editor

The 6502 stack sits quietly in page 1 (typically addresses $01FA down to about $0140) and works behind the scenes. If you call a subroutine using JSR, a couple of entries push their way onto the stack; they pop back off when RTS is used. Everything is tidied up, and we don't need to think about the stack workings most of the time.

Once in a while, however, we want to squeeze a little more performance out of the stack. We may read the stack pointer by transferring it to the X register with TSX, or even set it by transferring the other way with TXS. We may set up a dummy return address by pushing values to the stack before an RTS. Often such tricks are more trouble than they are worth, but sometimes they can be useful.

A Subroutine Limitation

An early 6502 text suggested that an easy way to pass data to a subroutine would be to place it on the stack. It can be done, but it's not easy; I tend to discourage this kind of coding for beginners.

Here's the problem: You take one or more values and place them on the stack using the PHA (PusH A) command, then call a subroutine. The idea is that the subroutine can simply pull these values from the stack with PLA (PuLl A) and use them, but that won't work. When the subroutine is called with JSR, the last two values placed on the stack are the subroutine return address (to be exact, the address minus 1). So the pull command gets, not the data, but the return address. Annoying.
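The problem can be seen in a small Python model of the page 1 stack. This is only a sketch, not a full 6502 emulator: the starting stack pointer of $FA, the JSR location of $033F, and the helper names push, pull, and jsr are assumptions made for illustration.

```python
# Toy model of the 6502 stack in page 1; not a full CPU emulator.
memory = bytearray(0x10000)
sp = 0xFA                      # assumed starting stack pointer


def push(value):
    """Store a byte at $0100+SP, then move the pointer down."""
    global sp
    memory[0x0100 + sp] = value & 0xFF
    sp = (sp - 1) & 0xFF


def pull():
    """Move the pointer up, then fetch the byte at $0100+SP."""
    global sp
    sp = (sp + 1) & 0xFF
    return memory[0x0100 + sp]


def jsr(jsr_address):
    """Push the address of the JSR's last byte (return address minus 1)."""
    last_byte = jsr_address + 2
    push(last_byte >> 8)       # high byte goes on first
    push(last_byte & 0xFF)     # low byte ends up on top


push(0x01)                     # PHA: the data we hoped to pass
jsr(0x033F)                    # a JSR assumed to sit at $033F
first = pull()                 # PLA inside the subroutine
second = pull()
third = pull()
print(hex(first), hex(second), hex(third))   # 0x41 0x3 0x1
```

The first two pulls deliver $41 and $03, the return address minus 1; the data byte appears only on the third pull.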

There are a couple of ways around the problem, but they are clumsy. First, you can pull the return address (two bytes) from the stack and save them. Then the data bytes are pulled and saved. Finally the return address is recalled and put back on the stack. That's a lot of work. It would be easier to have the calling routine store the data somewhere.

The second method is a little more workable, but still clumsy. If the stack pointer is transferred to the X register with TSX, we may now look directly at the stack as it lies in page 1. An instruction such as LDA $0100, X would look at the stack memory area, but would miss the real stack: The effective address would be of the first "empty" stack location. We'll have to climb a little higher to see the "live" stack. For example, LDA $0101, X would look at the last item on the stack; LDA $0102, X would look at the previous item, and so on.

Back to our original problem. There's a byte of data on the stack, behind a subroutine call. The two bytes of the return address occupy $0101, X and $0102, X, so after TSX we can read the data byte with LDA $0103, X. We can also change this stack item with an STA to the same address. But we can't remove it from the stack without setting up a loop to repack everything, so when the subroutine returns, the main routine must pull the extra item back from the stack.
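The indexing can be checked against a snapshot of page 1 as it might look just after the subroutine is entered. This Python sketch assumes a stack pointer of $F7, a data byte of 7, and a stacked return value of $0341; the helper names load and store are hypothetical stand-ins for LDA and STA with X indexing.

```python
# Toy snapshot of page 1 just after the subroutine is entered.
memory = bytearray(0x10000)
memory[0x01FA] = 0x07   # data byte pushed with PHA (value chosen for the demo)
memory[0x01F9] = 0x03   # return address minus 1, high byte
memory[0x01F8] = 0x41   # return address minus 1, low byte
stack_pointer = 0xF7    # points at the first "empty" location


def load(base, x):
    """Model LDA base,X: read the byte at base plus X."""
    return memory[base + x]


def store(base, x, value):
    """Model STA base,X: write the byte at base plus X."""
    memory[base + x] = value & 0xFF


x = stack_pointer                      # TSX
return_low = load(0x0101, x)           # last item stacked
return_high = load(0x0102, x)          # previous item
count = load(0x0103, x)                # the data behind the return address
store(0x0103, x, count + 1)            # the subroutine may also change it in place
print(hex(return_low), hex(return_high), count, memory[0x01FA])  # 0x41 0x3 7 8
```

Nothing is pulled here, so the stack pointer is untouched and the return address is still in place for RTS.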

It's often more trouble than it's worth, but it does work. A small example will illustrate.

This routine prints a triangle of asterisks. There are better ways to do the job, but it does illustrate moderately advanced stack work.

033C A9 01     LDA #$01      ;start count at 1
033E 48        PHA           ;pass to the stack
033F 20 4B 03  JSR $034B     ;call print subrtn
0342 68        PLA           ;get back the count
0343 18        CLC           ;clear carry for add
0344 69 01     ADC #$01      ;add one to count
0346 C9 10     CMP #$10      ;stop at 16
0348 90 F4     BCC $033E     ;else do it again
034A 60        RTS           ;back to BASIC
               ;SUBROUTINE TO CHECK STACK
034B BA        TSX           ;get the pointer
034C BD 03 01  LDA $0103, X  ;dig out the count
034F A8        TAY           ;put it in Y
0350 A9 2A     LDA #$2A      ;asterisk character
0352 20 D2 FF  JSR $FFD2     ;print it
0355 88        DEY           ;count down
0356 D0 FA     BNE $0352     ;if more, go back
0358 A9 0D     LDA #$0D      ;carriage return
035A 20 D2 FF  JSR $FFD2     ;print it
035D 60        RTS           ;quit

Call the above program from BASIC with SYS 828.

If you'd rather enter the program as BASIC DATA statements, the following program will do the job:

100 DATA 169, 1, 72, 32, 75, 3, 104, 24
110 DATA 105, 1, 201, 16, 144, 244, 96
120 DATA 186, 189, 3, 1, 168, 169, 42
130 DATA 32, 210, 255, 136, 208, 250
140 DATA 169, 13, 32, 210, 255, 96
200 FOR J = 828 TO 861
210 READ X
220 T = T + X
230 POKE J, X
240 NEXT J
250 IF T <> 3911 THEN STOP
260 SYS828
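The numbers in the loader can be double-checked off the machine. The Python sketch below copies the DATA values, confirms the byte count and the checksum, and decodes the two relative branches; the function name branch_target is a hypothetical helper, not part of any Commodore tool.

```python
# The DATA values from lines 100-140 of the BASIC loader.
program = [
    169, 1, 72, 32, 75, 3, 104, 24,
    105, 1, 201, 16, 144, 244, 96,
    186, 189, 3, 1, 168, 169, 42,
    32, 210, 255, 136, 208, 250,
    169, 13, 32, 210, 255, 96,
]
START = 828                      # $033C, the first POKE address


def branch_target(opcode_address, offset_byte):
    """Address after the two-byte branch, plus the signed offset."""
    signed = offset_byte - 256 if offset_byte >= 128 else offset_byte
    return opcode_address + 2 + signed


print(len(program), sum(program))         # 34 3911
print(START + len(program) - 1)           # 861
print(hex(branch_target(0x0348, program[0x0349 - START])))   # 0x33e
print(hex(branch_target(0x0356, program[0x0357 - START])))   # 0x352
```

Thirty-four bytes starting at 828 end at 861 ($035D), which matches the FOR loop limits, and the branch targets agree with the BCC and BNE operands in the listing.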

More Muscle

Perhaps a more useful task for the stack is to streamline frequently used subroutines. For example, if there's a popular subroutine that I call a dozen times or more, it will be in my interest to make the calling sequence as brief and easy as possible.

Here's a common one. I often need to print various messages, and expect to use a subroutine to do it. The normal calling sequence would be to load the address of the particular message into a couple of registers (say, A and Y) and then have the subroutine use this address to print the message. This means that each call will carry an overhead of two instructions: the LDA and LDY before the JSR. The overhead might in fact be greater: I might need to save previous values in A and Y in order to continue my program after the message is printed.

Suppose I could do this: just call the subroutine, and leave the message itself behind the calling routine. I could flag the end of the message text with a zero byte. Now, if I could make the subroutine smart enough to go after this message, I could save a lot of setup coding.

Not too hard. The subroutine would need to pull the return address from the stack and set it into an indirect address. The return address would need to be adjusted by a value of 1, since it has a built-in offset. Now the subroutine could walk through the message, printing out the characters as it found them. When it finds a zero, it's time to return; but we must adjust the return address so that we'll go to the address behind the message. All this takes a little careful work, but we can do it.
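The steps just described can be traced in a short Python model. It is a sketch under stated assumptions: the JSR is placed at a hypothetical address of $2000, the message is "HELLO", and the stack is a plain Python list rather than page 1 memory.

```python
memory = bytearray(0x10000)
stack = []                        # toy stack: the last element is the top
output = []

CALL_AT = 0x2000                  # hypothetical address of the JSR
text = b"HELLO"
memory[CALL_AT + 3:CALL_AT + 3 + len(text)] = text   # message follows the JSR
memory[CALL_AT + 3 + len(text)] = 0                  # zero byte ends it


def jsr(call_address):
    """Stack the address of the JSR's last byte, high byte first."""
    last = call_address + 2
    stack.append(last >> 8)
    stack.append(last & 0xFF)


def rts():
    """Pull the stacked address and resume one byte past it."""
    low = stack.pop()
    high = stack.pop()
    return ((high << 8) | low) + 1


def print_inline():
    """Print the text behind the caller's JSR, then fix the return address."""
    low = stack.pop()
    high = stack.pop()
    pointer = ((high << 8) | low) + 1     # undo the built-in offset
    while memory[pointer] != 0:
        output.append(chr(memory[pointer]))
        pointer += 1
    # pointer now sits on the zero byte; RTS adds the final 1 itself
    stack.append(pointer >> 8)
    stack.append(pointer & 0xFF)


jsr(CALL_AT)
print_inline()
resume = rts()
print("".join(output), hex(resume))       # HELLO 0x2009
```

Execution resumes at $2009, the first byte past the zero that ends the message.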

More Complex

Now let's make the task a little more complicated. Not only do we want our subroutine to print the message located behind the JSR instruction; we want it to do this without affecting any registers—A, X, or Y.

The natural thing to do is to push A, X, and Y to the stack, using the sequence PHA : TXA : PHA : TYA : PHA; just before we return, we'll pull everything back and restore the original register values. If we do this, however, we can't pull the return address from the stack, since it's buried beneath the new stuff we have just stacked. If we go this way, we must dig out the return address from midstack, using TSX and so on.

This kind of coding has been seen in various application programs; it's not new and revolutionary, just a little more careful work.

Commodore is using this technique for the first time in the ROM of its new computer series, the Commodore 16 and the Plus/4. You can track the coding in one of the machines by using the built-in machine language monitor. Start the disassembler at address $FBD8 with command DFBD8. You'll see code along the following lines:

Save all registers to the stack:

PHA : TYA : PHA : TXA : PHA

Copy the stack pointer, and adjust it to match the return address:

TSX : INX : INX : INX : INX

Copy the return address to zero page, so that it can be used as an indirect address:

LDA $0100, X : STA $BC : INX : LDA $0100, X : STA $BD

The indirect address in $BC and $BD is one too low, since a JSR return is offset by one. Add one to it:

BUMP INC $BC : BNE PASS : INC $BD

Get a character—it will come from behind the calling JSR instruction. If it's zero, we're finished and go to EXIT:

PASS LDY #$00
GETCH LDA ($BC), Y : BEQ EXIT

If it's not zero, print it; then bump the index in Y and go back to get another one:

JSR $FFD2 : INY : BNE GETCH

Y will never wrap around from 255 to zero (no messages are that long), so the BNE is an "always" branch. When the zero byte sends us to EXIT, we must get the count of characters from Y:

EXIT TYA

Now we recompute the position of the return address in the stack:

TSX : INX : INX : INX : INX

We add the count to the indirect address, and put the new return address directly into its place in the stack:

CLC : ADC $BC : STA $0100, X
LDA #$00 : ADC $BD : INX : STA $0100, X

And finally, we restore our three registers and return:

PLA : TAX : PLA : TAY : PLA : RTS
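The whole sequence can be followed in a Python model that keeps the return address in the stack and patches it in place. This is a paraphrase of the steps above, not the ROM code itself: the JSR address of $3000, the message "OK", the starting register values, and the variable pointer (standing in for $BC and $BD) are all assumptions for illustration.

```python
memory = bytearray(0x10000)
cpu = {"a": 0x11, "x": 0x22, "y": 0x33, "sp": 0xFA}   # assumed starting values
printed = []

CALL_AT = 0x3000                           # hypothetical JSR location
memory[CALL_AT + 3:CALL_AT + 5] = b"OK"    # message; the next byte is already 0


def push(value):
    memory[0x0100 + cpu["sp"]] = value & 0xFF
    cpu["sp"] = (cpu["sp"] - 1) & 0xFF


def pull():
    cpu["sp"] = (cpu["sp"] + 1) & 0xFF
    return memory[0x0100 + cpu["sp"]]


def jsr_print():
    last = CALL_AT + 2
    push(last >> 8)
    push(last & 0xFF)                      # JSR stacks return address minus 1
    push(cpu["a"])                         # PHA
    push(cpu["y"])                         # TYA : PHA
    push(cpu["x"])                         # TXA : PHA
    cpu["x"] = (cpu["sp"] + 4) & 0xFF      # TSX : INX : INX : INX : INX
    slot = 0x0100 + cpu["x"]               # where the return address low byte lives
    pointer = (memory[slot] | (memory[slot + 1] << 8)) + 1   # copy, then bump
    cpu["y"] = 0                           # LDY #$00
    while True:
        cpu["a"] = memory[pointer + cpu["y"]]   # LDA ($BC),Y
        if cpu["a"] == 0:                       # BEQ EXIT
            break
        printed.append(chr(cpu["a"]))           # JSR $FFD2
        cpu["y"] += 1                           # INY
    patched = pointer + cpu["y"]           # address of the zero byte
    memory[slot] = patched & 0xFF          # STA $0100,X
    memory[slot + 1] = patched >> 8        # INX : STA $0100,X
    cpu["x"] = pull()                      # PLA : TAX
    cpu["y"] = pull()                      # PLA : TAY
    cpu["a"] = pull()                      # PLA
    low = pull()
    high = pull()                          # RTS
    return ((high << 8) | low) + 1


resume = jsr_print()
print("".join(printed), hex(resume))       # OK 0x3006
```

A, X, and Y come back with their original values even though all three were used inside the routine, and the caller continues at $3006, just past the zero byte.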

For many of us, this type of stack manipulation is overkill. It makes programs hard to disassemble for study purposes, and the memory saving on small programs is negligible. For that matter, what are you going to do with the few dozen bytes you save?

Nevertheless, it can be a great coding convenience to allow a programmer to simply "drop" his data in line with the coding. This can save extra coding for setup, extra labels—and possible mistakes.

And it can be satisfying and fun to know that you can get that extra ounce of control over the workings of your computer.