Readers Feedback: Relocating Machine Language

Readers Feedback

The Editors and Readers of COMPUTE!

Relocating Machine Language
I would like to combine two Commodore machine language programs that both reside at location 49152 ($C000). I know that BASIC lets you relocate programs quite easily, just by moving the bottom-of-BASIC pointer upward. How is this done with ML programs?

Richard Sands

Machine language programs written for a 6502-based computer are usually quite difficult to relocate. For instance, say that you have an ML program at $C000 which starts with these instructions:

LDA $C030,X
JSR $C200
JMP $C400

    None of these instructions can be relocated unless you change the address contained in the instruction itself. The first (LDA $C030,X) retrieves one byte of data from a table beginning at location $C030 (note that the data lies within the program code). The JSR instruction works like GOSUB in BASIC, so JSR $C200 goes to a subroutine located at $C200 and then returns. JMP works like GOTO in BASIC: JMP $C400 sends the computer straight to the segment of code located at $C400. Now say that you move the entire program down to location $8000. The instruction JSR $C200 still sends the computer to $C200, but that address isn't within the program any more. To make the code work correctly at $8000, you'd have to change these three instructions to the following:

LDA $8030,X
JSR $8200
JMP $8400
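The adjustment can be sketched in a few lines of Python. This is only an illustration, not the method any particular monitor uses: the `relocate` function, the three-entry opcode table, and the assumed program extent of $C000-$CFFF are all hypothetical, and a real tool would need the length and addressing mode of every 6502 opcode. The sketch shifts an operand only when it points inside the old program range.

```python
# Hypothetical sketch: handles only the three absolute-mode opcodes
# used in the example (each is one opcode byte plus a two-byte address).
ABSOLUTE_OPCODES = {0xBD: "LDA abs,X", 0x20: "JSR abs", 0x4C: "JMP abs"}


def relocate(code, old_start, old_end, new_start):
    """Return a copy of code with internal absolute operands shifted."""
    out = bytearray(code)
    delta = new_start - old_start
    i = 0
    while i < len(out):
        if out[i] not in ABSOLUTE_OPCODES:
            raise ValueError(f"unhandled opcode ${out[i]:02X} at offset {i}")
        # 6502 addresses are stored low byte first.
        target = out[i + 1] | (out[i + 2] << 8)
        if old_start <= target < old_end:
            target = (target + delta) & 0xFFFF
            out[i + 1] = target & 0xFF
            out[i + 2] = target >> 8
        i += 3
    return bytes(out)


# LDA $C030,X / JSR $C200 / JMP $C400 as raw bytes.
fragment = bytes([0xBD, 0x30, 0xC0, 0x20, 0x00, 0xC2, 0x4C, 0x00, 0xC4])

# Assumption: the program occupies $C000-$CFFF and moves to $8000.
moved = relocate(fragment, 0xC000, 0xD000, 0x8000)
print(moved.hex(" "))  # bd 30 80 20 00 82 4c 00 84
```

Because of the range test, an operand that points outside $C000-$CFFF is left exactly as it was.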

    That's not particularly difficult, and some machine language monitors even have a special command to make such adjustments automatically. However, you must be careful not to change addresses that refer to locations outside the program:

JSR $FFD2

    This instruction calls the standard Commodore print-a-character routine, located in the computer's ROM. If you mistakenly adjust this address along with all the internal address references, the result may be disastrous. Now let's look at a more difficult case:

LDA ($FB),Y

    This instruction uses the powerful and very common indirect Y addressing mode, which refers indirectly to an address held in two successive zero page addresses (locations $FB-$FC in this case). There's no way to tell by looking at this instruction alone whether it refers to an area inside the program (and hence requires adjustment) or something external to the program code (in which case adjustment may be a mistake). You'll have to disassemble the program in its entirety, looking for other instructions that affect the contents of locations $FB-$FC, either directly or indirectly. If this instruction is part of a general-purpose subroutine, you may find that it's called by many different parts of the program. Since free zero-page space is limited, you may also find that other subroutines re-use locations $FB-$FC for an entirely different purpose. And while it's obvious that an instruction like STA $FB affects the contents of $FB, what about ROR $03,X or STA ($B0),Y? Those instructions might just as easily change the address held in $FB-$FC.
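A small Python model of how the 6502 forms these addresses shows why the instruction bytes alone settle nothing. The helper names are hypothetical, and the pointer values are made-up examples: one points into a program at $C000, the other into the default Commodore 64 screen memory at $0400.

```python
def indirect_y_address(memory, zp, y):
    """Effective address of an (zp),Y instruction such as LDA ($FB),Y."""
    low = memory[zp]
    high = memory[(zp + 1) & 0xFF]      # pointer high byte follows the low byte
    return ((low | (high << 8)) + y) & 0xFFFF


def zero_page_x_address(base, x):
    """Effective address of a zp,X instruction; it wraps within zero page."""
    return (base + x) & 0xFF


memory = bytearray(65536)

# Same instruction, pointer aimed inside the program.
memory[0xFB], memory[0xFC] = 0x30, 0xC0
inside = indirect_y_address(memory, 0xFB, 5)

# Same instruction, pointer aimed at screen memory instead.
memory[0xFB], memory[0xFC] = 0x00, 0x04
outside = indirect_y_address(memory, 0xFB, 5)

print(f"${inside:04X} ${outside:04X}")  # $C035 $0405

# ROR $03,X reaches $FB whenever X holds $F8.
print(f"${zero_page_x_address(0x03, 0xF8):02X}")  # $FB
```

The second helper illustrates the indirect case: an instruction whose operand is $03 can still alter the pointer at $FB, depending on what the X register holds at run time.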
    Once you've sorted out all the indirect addressing, you'll need to check for self-modifying routines: code that changes its own instructions while it runs. When that's done, you'll have to interpret all the program's data and variable areas. For instance, say that you find the following hexadecimal values in a memory dump of the program code:

93 05 20 C4 54 0D 41 43

    These bytes could be virtually anything: sprite shape data, characters for a printed message, part of an internal dispatch table, preset values for a bunch of unrelated variables, or even garbage that will be replaced with something meaningful when the program runs. While some programmers locate data areas at the end of the program, others sprinkle data and variables freely throughout the code. Until you find out exactly what purpose these bytes serve, there's no way to tell whether they need adjustment. This problem, more than any other, makes it impossible to write an "automatic ML relocator" that works correctly in every case. The relocator would need to have as much intelligence as a knowledgeable ML programmer who thoroughly understands the subject program.
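The most a program can do with such bytes is flag suspects. The Python sketch below scans the eight bytes from the dump for any pair that, read low byte first, would form an address inside the program. The function name and the assumed program extent of $C000-$CFFF are hypothetical.

```python
def candidate_pointers(data, start, end):
    """List (offset, value) for every byte pair that looks like an
    address inside the range start..end-1, read low byte first."""
    hits = []
    for offset in range(len(data) - 1):
        value = data[offset] | (data[offset + 1] << 8)
        if start <= value < end:
            hits.append((offset, value))
    return hits


dump = bytes.fromhex("93 05 20 C4 54 0D 41 43")

# Assumption: the program occupies $C000-$CFFF.
hits = candidate_pointers(dump, 0xC000, 0xD000)
for offset, value in hits:
    print(f"offset {offset}: ${value:04X}")  # offset 2: $C420
```

The scan reports the pair 20 C4 as a possible reference to $C420, but it cannot say whether those bytes really are a dispatch table entry or merely a space character followed by a graphics character in a message. Only a person who understands the program can decide.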
    These problems generally don't apply to 68000-based computers like the Amiga, Atari 520ST, and Macintosh. Since the computer normally decides for itself where to load the ML code, most 68000 ML programs must be relocatable. That's no great hardship for programmers, since the 68000 instruction set includes many relocatable instructions.
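The difference comes down to arithmetic. In the simplified Python model below (not an exact rendering of 68000 instruction encoding; both function names are hypothetical), an absolute operand names a fixed location, while a relative operand stores only a distance from the instruction that uses it.

```python
def absolute_target(operand, instruction_address):
    """An absolute operand ignores where the instruction itself sits."""
    return operand


def relative_target(displacement, instruction_address):
    """A relative operand is a distance from the instruction's own address."""
    return instruction_address + displacement


for load_address in (0xC000, 0x8000):
    abs_dest = absolute_target(0xC200, load_address)
    rel_dest = relative_target(0x0200, load_address)
    print(f"loaded at ${load_address:04X}: "
          f"absolute -> ${abs_dest:04X}, relative -> ${rel_dest:04X}")
```

Wherever the code is loaded, the relative form lands $0200 bytes past the start of the program, so nothing needs patching. The absolute form keeps pointing at $C200 even after the program has moved away from it.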