COMPUTE! ISSUE 58 / MARCH 1985 / PAGE 97
Threading Disassembler
Elmer N. Keil
This machine language disassembler
follows a program's branches and jumps, rather than taking a linear
path. Written in Microsoft BASIC for the Commodore PET, it will also
work without changes on the 64 or on a VIC (with at least 16K memory
expansion). With limited conversion, it should work on any 6502-based
computer.
Most assemblers and disassemblers proceed through a machine language
program in a linear fashion from the lowest to the highest address,
which is fine as long as the program contains few jumps and branches.
However, when trying to find your way through
complex routines such as the built-in ROM, a linear disassembler is
almost useless. For example, the warm start entry point sets a flag in
page zero; loads the accumulator and Y register; and jumps (JSR) about
1700 bytes away, only to jump immediately to another location 2400
bytes away. It settles down for ten instructions before going into
several sets of compare-test-branch instructions which lead off in all
directions. It can be frustrating to list long routines, only to find
that the first instruction goes somewhere else. And trying to find your
way back after all this jumping can be a real challenge. One solution
is to use a threading disassembler,
which follows the execution thread as it weaves through various parts
of memory and keeps track of where to return after each jump.
An Efficient Structure
It is a common practice to place initialization at the end of a program
so that an interpretive BASIC will not have to continually search past
a block of one-time code. This program pushes that concept one step
further by placing the main loop at the beginning and then jumping far
back into the program and gradually working its way forward. This was
an experiment, and any gain in speed is not clearly evident. The program
stays ahead of my printer and can scroll listings off the CRT at a
moderate rate (use the RVS key on the PET, or CTRL key on the VIC and
64, to slow the scrolling).
The program starts by initializing some variables
and asking the user to select various options. Since this section is
only used once, it has been placed at the end of the program so the
main program loop will execute more efficiently.
The main loop gets an instruction, checks to see if
it changes the program flow, decodes and formats it for printing, and
then follows the flow. A dummy stack is maintained to keep track of the
return points in much the same manner as the hardware stack.
Once started, the program will loop continuously
until stopped. The loop contains a sequence which terminates the
program when the Q (quit) key is pressed.
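The dummy stack described above can be sketched in a few lines. The following Python is only an illustration, not the article's BASIC, and the class and method names are hypothetical. It assumes, as lines 1140 and 1230 of the listing suggest, that a JSR saves the address of its own last byte and that an RTS resumes one byte past the saved address, much as the 6502 hardware does.

```python
# Illustrative sketch only; names are hypothetical, not from the listing.
class ReturnStack:
    def __init__(self):
        self.entries = []

    def jsr(self, pc, target):
        """Follow a JSR found at address pc and remember where to resume."""
        # Save the address of the JSR's last byte (pc + 2), as the 6502 does.
        self.entries.append(pc + 2)
        return target

    def rts(self):
        """Return the address to resume at, or None if the stack is empty."""
        if not self.entries:
            return None
        return self.entries.pop() + 1


stack = ReturnStack()
pc = stack.jsr(0xC000, 0xE000)   # a JSR $E000 located at $C000
resume = stack.rts()             # an RTS reached inside the subroutine
print(hex(pc), hex(resume))
```

An empty stack is reported with None here; the listing instead prints a message and asks whether to continue.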
User's Choice
The program cannot know what state the processor's status flags would be
in at run time, so it cannot tell whether a conditional branch would be
taken. Instead, it displays the branch destination and asks the user
whether the branch should be followed.
Although it is often possible to look at the
preceding instructions and determine what conditions should exist,
sometimes all you can do is take your best guess and see where it leads
you.
The Start-Up Routines
Initialization, called by the GOSUB at line 40, clears the screen and
homes the cursor for neatness. The pseudo stack pointer (SP), the stack
array (SS), and the pseudo program counter (PC) are allocated. The array
G0$ and the strings G1$, G2$, and GG$ are filled with the 6502 mnemonics
(the mnemonic BAD represents invalid opcodes). Variable TP is set to the
highest addressable memory location, 65535.
Since dividing by a power of two is the same as
shifting a binary number to the right, variables B3 (4) and B6 (32) are
used as divisors to shift the third or sixth bit of the opcode,
respectively, into the low-order bit position. LB (256) scales the left
(high) byte before it is added to the right (low) byte when generating
an address.
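As a rough illustration of the arithmetic just described, the Python sketch below splits an opcode into its three fields by division and builds a two-byte address. The constants take the values set in lines 2240-2250 of the listing; the function names are hypothetical, and Python stands in for the article's BASIC.

```python
B3 = 4      # dividing by 4 shifts right two places
B6 = 32     # dividing by 32 shifts right five places
LB = 256    # multiplier for the left (high) byte


def split_opcode(opcode):
    """Split an opcode byte into fields of 3, 3, and 2 bits."""
    p1 = opcode // B6            # top three bits
    rest = opcode - p1 * B6      # low five bits
    p2 = rest // B3              # middle three bits
    p3 = rest - p2 * B3          # bottom two bits (opcode group)
    return p1, p2, p3


def make_address(low, high):
    """Combine a low byte and a high byte into a 16-bit address."""
    return low + LB * high


print(split_opcode(0xAD))               # LDA absolute
print(hex(make_address(0x34, 0x12)))
```

For $AD (LDA absolute) the fields come out as 5, 3, and 1: group one, sixth mnemonic in the group, absolute addressing.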
Hex Lookup Table
HX$ is a lookup table of valid hexadecimal digits. Variable OP is set
to the screen device number (3), but it is changed to the printer
device number (4) if you answer yes to the first question, PRINTER
OUTPUT? OP is used only once, when file number PR is opened; all
writes to the listing device go to file PR. (VIC users may want to
change some of the PRINT statements to better fit their 22-column
screens.)
The second question, TITLE?, lets you identify the
listing at some future time.
Although the program was intended as a threading
disassembler, it's possible to use it as a standard block disassembler,
depending on your answer to question three, BLOCK DISASSEMBLY?; the
program sets BD=0 for threading and BD=1 for block disassembly.
Select Decimal Mode
Normal input/output format for numbers is hexadecimal, but you can select
decimal mode in answer to the fourth question, DECIMAL MODE?; the
program sets variable HX=1 for hexadecimal mode and HX=0 for decimal
mode.
And finally the last and most important question,
STARTING LOCATION?; respond with a decimal or hexadecimal number
according to the mode selected. The program then prints the remaining
header information and reminds you to press Q to quit at any time.
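In hexadecimal mode the starting location has to be checked digit by digit. The Python sketch below mirrors the approach of lines 2590-2660 of the listing (reject more than four digits, look each character up in the digit table, and accumulate the value); the function name is hypothetical, and rejecting an empty entry is an assumption of this sketch rather than something the listing does.

```python
HEX_DIGITS = "0123456789ABCDEF"


def parse_hex(text):
    """Return the value of a hex string of one to four digits, else None."""
    if not text or len(text) > 4:
        return None                      # empty or too big
    value = 0
    for ch in text:
        digit = HEX_DIGITS.find(ch)      # position in the table is the digit value
        if digit < 0:
            return None                  # invalid hex character
        value = value * 16 + digit
    return value


print(parse_hex("C000"))
print(parse_hex("G1"))
```

Where this sketch returns None, the BASIC version prints a message and asks again.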
The main loop consists of lines 80-170. Line 80
looks for a keypress and will ignore anything other than a Q (including
no keypress); a Q will break the loop and terminate the program at line
970. Line 90 PEEKs a byte from the location pointed to by the pseudo
program counter (PC) and then calls a subroutine at 280 to convert the
PC to a hexadecimal string. Line 100 combines the hex string with a
decimal equivalent string and some blank spacers into PC$ for later
printing.
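The hex conversion called from line 90 works by repeated division by 16, picking each digit out of the lookup table. A minimal Python sketch of that idea follows; it is modeled on lines 280-310 of the listing, but the function name is hypothetical and Python is used only for illustration.

```python
HEX_DIGITS = "0123456789ABCDEF"


def to_hex(value):
    """Convert an integer to a hex string by repeated division by 16."""
    sign = ""
    if value < 0:
        sign, value = "-", -value
    digits = ""
    while True:
        quotient = value // 16
        # The remainder indexes the digit table; new digits go on the left.
        digits = HEX_DIGITS[value - quotient * 16] + digits
        value = quotient
        if value == 0:
            break
    return sign + digits


print(to_hex(49152))
print(to_hex(-6))
```

Because the loop body runs at least once, a value of zero still produces the single digit 0.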
Lines 110-130 split the opcode that was PEEKed in line 90 into three
fields (P1, P2, and P3), and line 170 branches according to P3, which
identifies the opcode group.
Converting The Opcodes
Lines 700-1470 process all of the branches, jumps, and other opcodes
which change program flow. This is the heart of the program. Lines
700-710 do a table lookup for the opcode mnemonic and verify that it is
a valid opcode. Line 720 tests for conditional branches and jumps to
1330 to process any it finds.
All other opcodes which change program flow are
detected in line 730, which could transfer control to line 760. Line
740 branches to the appropriate routine to process and format the
operand according to the addressing mode indicated by the opcode.
Line 760 further checks for opcodes which change
program flow; these are processed at line 1010. JMP is detected at line
770 and processed at line 820.
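For the conditional branches handled at line 1330, the destination has to be computed from a signed one-byte offset. The Python sketch below shows that calculation as lines 1330-1350 of the listing appear to do it; the function name is hypothetical.

```python
def branch_target(pc, offset_byte):
    """Destination of a relative branch whose opcode is at address pc."""
    # Offsets above 127 are negative in two's complement.
    offset = offset_byte - 256 if offset_byte > 127 else offset_byte
    # The offset is measured from the instruction after the two-byte branch.
    return pc + 2 + offset


print(hex(branch_target(0xC000, 0x05)))   # forward branch
print(hex(branch_target(0xC010, 0xFB)))   # backward branch
```

An offset of $FB is -5, so a branch at $C010 leads back to $C00D.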
Creating The Mnemonics
Once the program flow has been processed, the opcodes are processed.
The mnemonics are obtained by a table lookup, the addressing mode is
determined, and the operand is formatted accordingly. Lines 350 and 370
represent subroutines for fetching one-byte and two-byte operands
respectively.
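The table lookup relies on packing three-letter mnemonics end to end in one string and slicing out the one wanted. A short Python sketch of the idea follows, using two of the strings from lines 2140 and 2200 of the listing; the function name is hypothetical, and Python's zero-based slicing replaces the one-based MID$.

```python
G1 = "ORAANDEORADCSTALDACMPSBC"          # group one, indexed by P1
G0_ROW_2 = "RTIBADPHAJMPBVCBADCLIBAD"    # group zero row for P1=2, indexed by P2


def mnemonic(table, index):
    """Slice the three-letter mnemonic at the given position out of a table."""
    start = index * 3
    return table[start:start + 3]


print(mnemonic(G1, 5))          # P1=5 in group one
print(mnemonic(G0_ROW_2, 3))    # opcode $4C
print(mnemonic(G0_ROW_2, 1))    # an invalid opcode
```

An entry of BAD marks an invalid opcode, which the program tests for before formatting an operand.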
Lines 250-310 represent a subroutine for converting
the operand value to a character string, and lines 430-650 may add
supplemental information to the operand string as well as generating a
comment string CM$ to identify addressing mode. Line 210 prints the
collection of information about this particular instruction and then
jumps to line 80 to start the loop for the next instruction.
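To make the operand formatting concrete, here is a small Python sketch of how an operand string might be assembled for several addressing modes and padded to a fixed width before printing. The templates follow the strings built in lines 430-650, 1580, and 1620 of the listing, and the 14-column padding follows line 210; the function names and mode keys are hypothetical.

```python
TEMPLATES = {
    "immediate": "#{v}",
    "zero page": "{v}",
    "zero page,x": "{v},X",
    "absolute,y": "{v},Y",
    "indexed indirect": "( {v},X )",
    "indirect indexed": "( {v} ),Y",
}


def format_operand(mode, value_text):
    """Wrap an already converted operand value in its addressing-mode syntax."""
    return TEMPLATES[mode].format(v=value_text)


def pad_operand(operand, width=14):
    """Pad with blanks and cut to a fixed width so the columns line up."""
    return (operand + " " * width)[:width]


print(format_operand("immediate", "FF"))
print(format_operand("indirect indexed", "20"))
print(repr(pad_operand("#FF")))
```

Padding every operand to the same width keeps the addressing-mode comment in a steady column on the printed listing.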
If you don't have a printer, line 210 can be changed
by dropping the blank spacer B2$ and the addressing mode comment CM$
from the end of the PRINT command. This will shorten the print line to
under 40 characters and let you view more disassembled instructions at
one time. Use the RVS or CTRL key to slow down the scrolling.
Threading Disassembler
Refer to "COMPUTE!'s Guide To Typing In Programs" before entering this listing.
40 GOSUB 2070 :rem 173
60 REM{10 SPACES}MAIN LOOP STARTS HERE :rem 174
80 GET Z$:IF Z$=Q$ GOTO 970 :rem 152
90 B1=PEEK(PC):A=PC:GOSUB 280 :rem 193
100 PC$=RIGHT$(BL$+STR$(PC),5)+RIGHT$(BL$+A$,6)+"{3 SPACES}" :rem 232
110 P1=INT(B1/B6):A=B1-P1*B6 :rem 33
120 P2=INT(A/B3) :rem 115
130 P3=A-P2*B3 :rem 227
150 REM{3 SPACES}ANALYZE OP CODE :rem 72
170 ON(P3+1)GOTO 700,1520,1670,1930 :rem 28
190 REM{3 SPACES}PRINT A DISASSEMBLED LINE :rem 228
210 PRINT#PR,PC$;OP$;B2$;LEFT$(ND$+BL$,14);B2$;CM$:GOTO 80 :rem 53
230 REM{3 SPACES}CONVERT OPERAND :rem 163
250 IF HX=1 GOTO 280 :rem 7
260 A$=STR$(A):RETURN :rem 3
280 ZZ$="":A$="":IF A<0 THEN A=-A:A$="-" :rem 241
290 J=INT(A/16):ZZ$=MID$(HX$,A-(J*16)+1,1)+ZZ$ :rem 25
300 A=J:IF A>0 GOTO 290 :rem 167
310 A$=A$+ZZ$:RETURN :rem 184
330 REM{3 SPACES}GET OPERAND :rem 99
350 A=PEEK(PC+1):PC=PC+2:GOSUB 250:RETURN :rem 224
370 A=PEEK(PC+1)+LB*PEEK(PC+2):PC=PC+3:GOSUB 250:RETURN :rem 44
400 REM{3 SPACES}ADDRESSING MODES :rem 212
410 REM{8 SPACES}ZERO PAGE + INDEX :rem 121
430 GOSUB 350:ND$=A$+",X":CM$="ZERO PAGE,INDEX X":GOTO 210 :rem 2
450 GOSUB 350:ND$=A$+",Y":CM$="ZERO PAGE,INDEX Y":GOTO 210 :rem 6
470 REM{8 SPACES}ZERO PAGE :rem 220
490 GOSUB 350:ND$=A$:CM$="ZERO PAGE":GOTO 210 :rem 25
510 REM{8 SPACES}ABSOLUTE + INDEX :rem 124
530 GOSUB 370:ND$=A$+",X":CM$="ABSOLUTE,INDEX X":GOTO 210 :rem 7
550 GOSUB 370:ND$=A$+",Y":CM$="ABSOLUTE,INDEX Y":GOTO210 :rem 11
570 REM{8 SPACES}ABSOLUTE :rem 223
590 GOSUB 370:ND$=A$:CM$="ABSOLUTE":GOTO 210 :rem 30
610 REM{8 SPACES}IMMEDIATE :rem 10
630 A=PEEK(PC+1):PC=PC+2:GOSUB 280 :rem 202
640 ND$="#"+A$:CM$="IMMEDIATE" :rem 130
650 GOTO 210 :rem 103
670 REM{7 SPACES}GROUP ZERO OP CODES :rem 91
680 REM{8 SPACES}(SOME MOSTECH GROUP 3) :rem 218
700 OP$=MID$(G0$(P1),P2*3+1,3) :rem 25
710 IF OP$=BD$ GOTO 1970 :rem 219
720 IF P2=4 GOTO 1330:{5 SPACES}REM{13 SPACES}8 BRANCHES :rem 183
730 IF P1<4 GOTO 760:{6 SPACES}REM{13 SPACES}SPECIAL FUNCTION :rem 117
740 ON(P2+1)GOTO 630,490,1720,590,1930,430,1720,530 :rem 56
760 IF P2=0 GOTO 1010:{5 SPACES}REM{12 SPACES}BRK,JSR,RTI,RTS :rem 110
770 IF OP$="JMP" GOTO 820: REM{12 SPACES}JMP :rem 48
780 ON(P2+1)GOTO 1930,490,1720,590,1930,430,1720,530 :rem 112
800 REM{4 SPACES}JUMPS HANDLED HERE :rem 31
820 B1=PEEK(PC+1)+LB*PEEK(PC+2):A=B1 :rem 35
830 GOSUB 250:ND$=A$:CM$=BL$ :rem 33
840 IF(BD=1)AND(P1=2) THEN PC=PC+3:GOTO 1170 :rem 176
850 IF P1=2 THEN PC=B1:GOTO 1170 :rem 202
860 ND$="( " + ND$ + " )" :rem 118
870 B1=PEEK(B1) + LB*PEEK(B1+1):A=B1:GOSUB 250 :rem 220
880 PRINT#PR:PRINT#PR,"*** ENCOUNTERED INDIRECT JUMP" :rem 54
890 PRINT#PR,"{2 SPACES}THRU ";ND$;"{2 SPACES}TO ";A$ :rem 89
900 IF(BD=1) THEN PC=PC+3:GOTO 1170 :rem 153
910 PRINT:PRINT"ENCOUNTERED INDIRECT JUMP":PRINT" THRU ";ND$;" TO ";A$ :rem 253
920 PRINT:PRINT"IS THIS VALID ?":INPUT A$ :rem 229
930 IF LEFT$(A$,1)=YA$ THEN PC=B1:GOTO1170 :rem 54
940 PRINT#PR :rem 239
950 PRINT:PRINT"DO YOU WANT TO CONTINUE ?":INPUT A$ :rem 118
960 IF LEFT$(A$,1)=YA$ THEN GOSUB 2320:GOTO 80 :rem 220
970 CLOSE PR:END :rem 201
990 REM{5 SPACES}HANDLES{2 SPACES}BRK,JSR,RTI, AND RTS :rem 146
1010 ON(P1+1)GOTO 1020,1120,1060,1210 :rem 92
1020 A=PC:GOSUB 250:PRINT#PR:PRINT#PR,"****{2 SPACES}BREAK AT ";A$ :rem 239
1030 PRINT:PRINT"ENCOUNTERED BREAK AT ";A$ :rem 50
1040 GOTO 940 :rem 155
1060 A=PC:GOSUB 250:PRINT#PR:PRINT#PR,"****{2 SPACES}RTI AT ",A$ :rem 125
1070 PRINT:PRINT"ENCOUNTERED RTI AT ";A$ :rem 192
1080 GOTO 940 :rem 159
1100 REM{33 SPACES}STACK{2 SPACES}(JSR) :rem 92
1120 A=PEEK(PC+1) + LB*PEEK(PC+2) :rem 240
1130 LC=PC:IF(BD=1) GOTO 1150 :rem 50
1140 SP=SP+1:SS(SP)=PC+2 :rem 166
1150 PC=A:GOSUB 250:ND$=A$:CM$=BL$ :rem 152
1160 IF(BD=1) THEN PC=LC+3 :rem 136
1170 PRINT#PR,"-----":GOTO 210 :rem 114
1190 REM{33 SPACES}UNSTACK (RTS) :rem 18
1210 IF(BD=1) THEN PC=PC+1:GOTO 1240 :rem 192
1220 IF SP<1 GOTO 1270 :rem 103
1230 PC=SS(SP)+1:SP=SP-1 :rem 167
1240 PRINT#PR,"-----" :rem 106
1250 ND$=BL$:CM$=BL$:GOTO 210 :rem 80
1270 A=PC:GOSUB 250:PRINT#PR:PRINT#PR,"*** RTS AT ";A$;" - STACK EMPTY" :rem 17
1280 PRINT:PRINT"NO STACK ENTRY FOR RTS AT ";A$ :rem 29
1290 GOTO 940 :rem 162
1310 REM{5 SPACES}BRANCHES - REL ADDR :rem 26
1330 A=PEEK(PC+1) :rem 170
1340 IF A>127 THEN A=A-LB :rem 25
1350 B1= PC+2+A:ND$="*":IF A=>0 THEN ND$="*+" :rem 224
1360 GOSUB 250:ND$=ND$+A$:CM$=BL$ :rem 49
1370 A=B1:GOSUB 250:ND$=LEFT$(ND$+BL$,7)+RIGHT$(BL$+A$,7) :rem 147
1380 A=PC:GOSUB 250 :rem 46
1390 PRINT :rem 90
1400 IF(BD=1) GOTO 1470 :rem 158
1410 PRINT OP$;"-- CONDITIONAL BRANCH ENCOUNTERED" :rem 13
1420 PRINT" FROM ";A$;" TO ";ND$ :rem 127
1430 PRINT:PRINT"DO YOU WANT TO FOLLOW THE BRANCH ?" :rem 110
1440 INPUT A$ :rem 190
1450 IF LEFT$(A$,1)=YA$ THEN PC=B1:GOTO 1170 :rem 100
1460 IF LEFT$(A$,1)=Q$ GOTO 970 :rem 71
1470 PC=PC+2:GOTO 210 :rem 146
1500 REM{6 SPACES}GROUP ONE OP CODES :rem 38
1520 OP$=MID$(G1$,P1*3+1,3) :rem 120
1530 IF (P1=4)AND(P2=2) THEN OP$=BD$:GOTO 1970 :rem 205
1540 ON(P2+1)GOTO 1580,490,630,590,1620,430,550,530 :rem 55
1560 REM{8 SPACES}(INDIRECT,X) ADDRESSING :rem 187
1580 GOSUB 350:ND$="( "+A$+",X )":CM$="INDEXED INDIRECT":GOTO 210 :rem 243
1600 REM{8 SPACES}(INDIRECT),Y{2 SPACES}ADDRESSING :rem 183
1620 GOSUB 350:ND$="( "+A$+" ),Y":CM$="INDIRECT INDEXED":GOTO 210 :rem 239
1650 REM{9 SPACES}GROUP TWO OP CODES :rem 68
1670 OP$=MID$(G2$,P1*3+1,3) :rem 127
1680 IF P1<4 GOTO 1870{10 SPACES}REM{11 SPACES}SHIFTS AND ROTATES :rem 2
1690 ON(P2+1)GOTO 630,490,1710,590,1830,1740,1770,1810 :rem 215
1710 OP$=MID$(GG$,(P1-4)*3+1,3) :rem 65
1720 ND$=BL$:CM$=BL$:PC=PC+1:GOTO 210 :rem 75
1740 IF P1<6 GOTO 450 :rem 32
1750 IF P1>5 GOTO 430 :rem 32
1770 OP$=MID$(GG$,P1*3+1,3) :rem 149
1780 IF OP$=BD$ GOTO 1970 :rem 19
1790 GOTO 1720 :rem 212
1810 IF P1=5 GOTO 550 :rem 31
1820 IF P1>5 GOTO 530 :rem 31
1830 OP$=BD$:GOTO 1970 :rem 186
1850 REM{10 SPACES}SHIFTS AND ROTATES :rem 120
1870 ON(P2+1)GOTO 1830,490,1890,590,1830,430,1830,530 :rem 169
1890 ND$=BL$:CM$=BL$:PC=PC+1:GOTO 210 :rem 83
1910 REM{5 SPACES}VOID GROUP CODE :rem 137
1930 OP$=BD$:GOTO 1970 :rem 187
1950 REM{5 SPACES}INVALID OP CODE :rem 116
1970 ND$=BL$:CM$="BAD OP CODE" :rem 102
1980 Z$="{2 SPACES}":FOR I=0 TO 10 :rem 172
1990 A=PEEK(PC+I):GOSUB 280:Z$=Z$+A$ :rem 37
2000 NEXT :rem 1
2010 PRINT#PR:PRINT#PR,PC$;Z$;" HEX" :rem 161
2020 PC=PC+1:GOTO1170 :rem 191
2050 REM{22 SPACES}INITIALIZATION :rem 211
2070 CL$=CHR$(147):PRINTCL$:{2 SPACES}REM{11 SPACES}CLEAR SCREEN AND HOME CURSOR :rem 64
2080 SP=0:DIM SS(50):{9 SPACES}REM{11 SPACES}POINTER AND STACK :rem 210
2090 PC=0:{20 SPACES}REM{11 SPACES}PROGRAM COUNTER :rem 33
2110 DIM G0$(7):{14 SPACES}REM{11 SPACES}OP CODES :rem 236
2120 G0$(0)="BRKBADPHPBADBPLBADCLCBAD" :rem 245
2130 G0$(1)="JSRBITPLPBITBMIBADSECBAD" :rem 62
2140 G0$(2)="RTIBADPHAJMPBVCBADCLIBAD" :rem 29
2150 G0$(3)="RTSBADPLAJMPBVSBADSEIBAD" :rem 70
2160 G0$(4)="BADSTYDEYSTYBCCSTYTYABAD" :rem 144
2170 G0$(5)="LDYLDYTAYLDYBCSLDYCLVLDY" :rem 164
2180 G0$(6)="CPYCPYINYCPYBNEBADCLDBAD" :rem 88
2190 G0$(7)="CPXCPXINXCPXBEQBADSEDBAD" :rem 98
2200 G1$="ORAANDEORADCSTALDACMPSBC" :rem 181
2210 G2$="ASLROLLSRRORSTXLDXDECINC" :rem 33
2220 GG$="TXATAXDEXNOPTXSTSXBADBAD" :rem 45
2230 TP=65535:{16 SPACES}REM{11 SPACES}MEMORY ADDRESS LIMIT :rem 44
2240 B3=4:B6=32:{14 SPACES}REM{11 SPACES}SHIFTS OP CODE RIGHT :rem 41
2250 LB=256:{18 SPACES}REM{11 SPACES}LEFT BYTE MULTIPLIER :rem 181
2260 BL$="{14 SPACES}":YA$="Y":BD$="BAD":B2$="{6 SPACES}" :rem 78
2270 HX$="0123456789ABCDEF":Q$="Q" :rem 51
2280 OP=3:{20 SPACES}REM{11 SPACES}CRT DEVICE RETURN WITHOUT GOSUB :rem 38
2290 PRINT"DO YOU WANT PRINTER OUTPUT ?":INPUT A$ :rem 235
2300 IF LEFT$(A$,1)=YA$ THEN OP=4:{5 SPACES}REM: PRINTER DEVICE RETURN WITHOUT GOSUB :rem 176
2310 PR=5:OPEN PR,OP :rem 179
2320 PRINT:PRINT"WHAT IS A GOOD TITLE FOR THIS ?":INPUT A$ :rem 168
2330 BD=0 :rem 187
2340 PRINT#PR:PRINT#PR :rem 167
2350 PRINT:PRINT"DEFAULT IS TO FOLLOW THE PROGRAM THREAD :rem 8
2360 PRINT"DO YOU WANT A BLOCK DISASSEMBLY :rem 48
2370 INPUT Z$:IF LEFT$(Z$,1)<>YA$ GOTO 2400 :rem 85
2380 BD=1:PRINT#PR,"{2 SPACES}BLOCK DISASSEMBLY OF":PRINT#PR,".. ";A$ :rem 245
2390 GOTO 2410 :rem 206
2400 PRINT#PR,"{2 SPACES}THREADING DISASSEMBLY OF":PRINT#PR,"{3 SPACES}";A$ :rem 143
2410 PRINT#PR :rem 25
2420 PRINT"DEFAULT IS HEX MODE":PRINT"DO YOU WANT TO USE DECIMAL ?" :rem 215
2430 HX=1:INPUT A$ :rem 6
2440 IF LEFT$(A$,1)=YA$ THEN HX=0:PRINT"DECIMAL MODE SELECTED" :rem 90
2450 PRINT"DISASSEMBLY TO START AT LOCATION ?" :rem 58
2460 GOSUB 2560:PC=A:IF PC>TP GOTO 2450 :rem 166
2470 A=PC:GOSUB 250:PRINT#PR,"STARTING LOCATION =";A$ :rem 205
2480 PRINT#PR :rem 32
2490 PRINT#PR,"LOC{12 SPACES}OP{5 SPACES}OPERAND" :rem 23
2500 PRINT#PR :rem 25
2510 PRINT:PRINT" PRESS Q TO STOP AT ANY TIME":PRINT :rem 154
2520 RETURN :rem 169
2540 REM{13 SPACES}SUBROUTINE TO GET STARTING LOCATION :rem 7
2560 IF HX=1 GOTO 2590 :rem 115
2570 INPUT A:RETURN :rem 185
2590 A=0:INPUT A$:IF LEN(A$)>4 THEN PRINT"TOO BIG-TRY AGAIN":GOTO2590 :rem 16
2600 OK=1:FOR I=1 TO LEN(A$):Z$=MID$(A$,I,1) :rem 91
2610 BAD=1:FOR J=1TO16:IF Z$<>MID$(HX$,J,1) GOTO 2630 :rem 140
2620 BAD=0:A=A*16+J-1 :rem 91
2630 NEXT:IF(BAD=0)THEN NEXT:GOTO 2650 :rem 6
2640 PRINT:PRINT"INVALID HEX CHAR":OK=0:NEXT :rem 40
2650 IF OK=1 THEN RETURN :rem 115
2660 GOTO 2590 :rem 215