Some Assembly Required
by Robert Peck
This month's column provides a program designed for a bulletin board. It checks whether the messages left on the board contain any language the bulletin board owner might find objectionable. The owner may then decide not to accept the message, or perhaps to hang up on the caller, or simply use the output from the program to signal the system operator (SYSOP) that something questionable has just come in.
I am publishing this program here because it uses several features I will be explaining in future columns. The primary feature is the way parameters are passed to and from Atari BASIC; this will be the topic of the next column. Another feature is the exclusive use of fully relocatable code. (This means that no matter where the code is placed in memory, it still does the same job.) This, too, will be a topic of a future column.
How this Program Works:
The user saves some space in his program for a string, here called E$, to hold the machine language routine (the calling sequence in the listing names this string PROG$). It needs 180 locations at most. Then another string, A$, is defined (you can have many of these, as you will see below). A$ contains, in all capital letters, every word or word combination you might find objectionable, such as:
BADWORD or BAD-WORD or BAD WORD
The individual word combinations are separated, in A$, by an exclamation point (!). The maximum length of A$ is 255 characters, including the leading and trailing dollar signs ($). The trailing dollar sign tells the machine language program where the string ends. The leading dollar sign is only a placeholder, but it must be there.
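The layout of A$ described above can be sketched in a few lines of Python. This is only an illustration of the string format, not part of the published routine; the helper names build_word_list and split_word_list are made up for the example.

```python
def build_word_list(entries):
    """Pack entries into the '$WORD1!WORD2$' layout described for A$.

    Hypothetical helper: the article builds A$ by hand in BASIC.
    """
    for entry in entries:
        # '!' separates entries and '$' marks the ends, so neither
        # character can appear inside an entry.
        if "!" in entry or "$" in entry:
            raise ValueError("entry may not contain '!' or '$': %r" % entry)
    packed = "$" + "!".join(entry.upper() for entry in entries) + "$"
    if len(packed) > 255:
        raise ValueError("A$ is limited to 255 characters")
    return packed


def split_word_list(packed):
    """Recover the individual entries from a packed A$ string."""
    return packed[1:-1].split("!")


a_string = build_word_list(["badword", "bad-word", "bad word"])
print(a_string)                   # $BADWORD!BAD-WORD!BAD WORD$
print(split_word_list(a_string))  # ['BADWORD', 'BAD-WORD', 'BAD WORD']
```

The entries are forced to upper case because the routine compares against capital letters only.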
B$ is a string of up to 255 characters given to the program, along with its length. The program modifies B$ by squeezing all of its printable characters (ATASCII values 32 through 127 only) together at the front of the string, and then scans it for objectionable words. The revised length of B$ is returned as the first character of A$, which is why A$ must begin with a placeholder character.
If an asterisk (*) represents a nonprintable character, such as a line feed or other cursor move, then the string "This is a sneaky B****A***D**word" will become "This is a sneaky BADword**D**word". Once the nonprinting characters are squeezed out, ASC(A$(1,1)) will show a value of 24, the compressed length, instead of the original string length of 33; the characters beyond that point are leftovers from the original string. If "BADWORD" was in A$ to begin with, the value of X returned by the routine will point to the first character of that word in A$. If none of the bad words are found, X will be zero.
All letters are capitalized inside the compare routine before the comparison is made, but the case of the source string itself is not changed.
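For readers who want to try the squeeze step away from the Atari, here is a rough Python model of it. It is a sketch under assumptions: a line feed (code 10) stands in for the nonprintable characters, and the test keeps codes from $20 up to but not including $7F, which is how the CMP #$7F / BCS pair in the listing appears to behave. The function name compress_in_place is made up for the example.

```python
def compress_in_place(buf, length):
    """Slide the printable bytes of buf to the front; return the new length.

    Bytes past the returned length are left untouched, which is why
    leftovers from the original text remain at the end of the buffer.
    """
    dest = 0
    for src in range(length):
        code = buf[src]
        if 0x20 <= code < 0x7F:   # assumed printable range, see lead-in
            buf[dest] = code
            dest += 1
    return dest


# Line feeds play the part of the asterisks in the article's example.
raw = "This is a sneaky B\n\n\n\nA\n\n\nD\n\nword"
buf = bytearray(raw, "ascii")
new_len = compress_in_place(buf, len(buf))

# Show the nonprintable leftovers as asterisks again for display.
shown = buf.decode("ascii").replace("\n", "*")
print(len(raw))   # 33
print(new_len)    # 24
print(shown)      # This is a sneaky BADword**D**word
```

Only the first new_len characters are meaningful after the squeeze; the rest is whatever the original string happened to hold there.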
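The search with its capitalization step can be modeled along the same lines. The Python below is a simplified sketch, not a line-by-line translation of the machine language: it returns an offset into A$ where the real routine returns a memory address, and the names fold and find_bad_word are made up for the example. The fold mirrors the CMP #$61 / AND #$DF pair in the listing, which clears bit 5 of any code at or above lower-case "a".

```python
def fold(ch):
    """Capitalize the way the listing appears to: clear bit 5 if code >= $61."""
    code = ord(ch)
    return chr(code & 0xDF) if code >= 0x61 else ch


def find_bad_word(a_string, b_string):
    """Return the offset in a_string of the first listed word found in
    b_string, or 0 if none is found.

    Offset 0 is always the leading '$' placeholder, so 0 is free to
    mean "nothing found", just as X = 0 does in the article.
    """
    folded = "".join(fold(ch) for ch in b_string)  # working copy only
    offset = 1                                     # skip the leading '$'
    for word in a_string[1:-1].split("!"):
        if word and word in folded:
            return offset
        offset += len(word) + 1                    # step past word and '!'
    return 0


a_string = "$BADWORD!BAD-WORD!BAD WORD$"
message = "This is a sneaky BADword"
hit = find_bad_word(a_string, message)
print(hit)                                         # 1
print(message)                                     # This is a sneaky BADword
print(find_bad_word(a_string, "a bad word here"))  # 18
print(find_bad_word(a_string, "nothing to see"))   # 0
```

Because the fold is applied to a working copy, the message itself keeps its original upper and lower case, which matches the behavior described above.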
Users who have abused the bulletin board privilege may also read this and find creative ways around it. I hope this at least provides a building block that others can use for future enhancements.
By the way, as a comparison: the original version of this routine, written in BASIC, took about 15 to 20 seconds to check a 200-character input line against a set of 20 bad words. This version takes less than one second.
PCB     =   $CB         ;DEFINE SOME WORKSPACE IN PAGE 0
PCC     =   $CC         ;WHICH WONT INTERFERE WITH BASIC
PCD     =   $CD         ;USED ONLY TEMPORARILY ANYWAY
PCE     =   $CE
PCF     =   $CF
PD0     =   $D0
PD1     =   $D1
PD2     =   $D2
PD3     =   $D3
PD4     =   $D4
PD5     =   $D5
PD6     =   $D6
PD7     =   $D7
        *=  $5000
START   PLA             ;DISCARD COUNT OF PUSHES
                        ;RELY ON CALLER TO PUT RIGHT
        PLA
        STA PCF
        STA PD7         ;SAVE POINTER TO A$(1,1)
                        ;GET HIGH BYTE OF A$
        PLA
        STA PCE         ;AND LOW BYTE ALSO
        STA PD6         ;SAVE POINTER TO A$(1,1)
        INC PCE         ;SAVE SPACE FOR LENGTH
        PLA
        STA PD1         ;HIGH BYTE OF B$ POINTER
        STA PD5         ;WILL GO THRU HERE ONCE
                        ;FOR EACH WORD IN A$
        PLA
        STA PD0         ;LOW BYTE OF B$ POINTR
        STA PD4
        PLA             ;DISCARD HI BYTE OF B$ LEN
        PLA             ;GET LOW BYTE
        STA PCB         ;SAVE IT (<255)
;
; NOW COMPRESS THE STRING BY
; REMOVING ALL NONPRINTABLE
; CHARACTERS
;
; PUT REVISED LENGTH OF THE
; STRING INTO A$(1,1)
;
KOMPRS  LDY #0
        STY PCC         ;POINT TO SOURCE PART OF STRING
        STY PCD         ;POINT TO DEST PART OF STRING
KGET    LDY PCC
        CPY PCB         ;GOT TO END OF STRING YET?
        BCS KDONE       ;IF SO, CONTINUE PROCESSING
        LDA (PD0),Y
        INY
        STY PCC         ;BUMP SOURCE POINTER
        CMP #$20        ;MAKE SURE DATA IS PRINTABLE
                        ;(PRINTABLE = ASCII 20-7F)
        BCC KGET        ;IF <20, SKIP COPYING
        CMP #$7F        ;OTHER END
        BCS KGET        ;ALSO SKIP
        LDY PCD         ;GET DESTINATION POINT
        STA (PD0),Y
        INY
        STY PCD         ;BUMP DESTINATION PNTR
        BNE KGET        ;RELOCATABLE JUMP
KDONE   LDA PCD         ;SAVE NEW B$ LENGTH
        STA PD3         ;IN D3 TEMPORARILY
        LDY #0          ;NOW START SEARCH
                        ;ON MODIFIED STRING
SRCH1   LDA (PCE),Y     ;GET CHARACTER
                        ;FROM THE BAD-WORD STRING
                        ;LOOKING FOR THE END OF WORD
        CMP #$24        ;DOLLAR SIGN IS END
        BEQ STREND      ;STRING END
        CMP #$21        ;STRING DELIMITER?
        BEQ WRDFND      ;FOUND A WORD
        INY             ;MAKE A COUNT OF CHARACTERS
                        ;LOOKED AT SO FAR
        BNE SRCH1       ;KEEP GOING TILL FIND
                        ;A BLANK
WRDFND  TYA
        TAX             ;KEEP THE CHAR COUNT IN X
;
        LDA PCB         ;MOVE B$ COUNT INTO
        STA PCC         ;A COUNT-DOWN LOCATION
                        ;WILL USE AS SEARCH COUNTER
;
CMP0    TXA             ;MOVE COUNTER INTO Y
                        ;FOR THE SEARCH
        TAY
CMP1    DEY             ;THIS IS THE ACTUAL STR
                        ;COMPARE LOOP, STARTS
                        ;ON LAST LETTER FIRST,
                        ;DIES ON FIRST NONCMP
        BMI FOUND       ;IF TRIED ALL AND NO
                        ;MISCOMPARES, THEN DOES A FOUND
CMP2    LDA (PD0),Y     ;GET A B$ PIECE
        CMP #$61        ;SEE IF LOWER CASE LTR
        BCC CMP3
        AND #$DF        ;MAKE IT UPPER CASE
CMP3    CMP (PCE),Y     ;SEE IF CHARACTERS
                        ;ARE MATCHING
        BEQ CMP1        ;IF SO, GO DO NEXT ONE
        DEC PCC         ;IF DIDNT MATCH, BUMP
                        ;THE B$ POINTER TO NEXT
                        ;AND TRY THE SAME WORD
                        ;AGAIN (LOOKING FOR
                        ;EMBEDDED OCCURRENCE)
        BEQ BSEND       ;IF PCC=0 THEN DONE
                        ;WITH THE INPUT STRING
                        ;AND CAN GO ON TO THE
                        ;NEXT WORD AND REPEAT
                        ;TILL ALL BAD WORDS
                        ;ARE CYCLED THRU.
        INC PD0         ;BUMPS THE B$ POINTER
        BNE NOTD1
        INC PD1
NOTD1   LDY #0
        BEQ CMP0        ;THIS FORCES A BRANCH
                        ;ALWAYS, AND MAKES THE
                        ;CODE FULLY RELOCATABLE
                        ;FORCES A COMPARE TO ALL
                        ;OF B$
BSEND   LDA PD5         ;RESTORE THE B$
                        ;ORIGINAL POINTER
                        ;FOR THE NEXT BAD
                        ;WORD
        STA PD1
        LDA PD4
        STA PD0
        INX             ;X POINTS TO BLANK
                        ;SPACE IN BAD WORD STR
                        ;SO INX POINTS TO FIRST
                        ;CHARACTER IN THE NEXT
                        ;WORD
        TXA             ;MOVE X WHERE USABLE
        CLC
        ADC PCE         ;NOW BUMP POINTER OF
                        ;BAD WORDS TO NEXT ONE
        STA PCE         ;USING THE X VALUE
        LDA PCF
        ADC #0
        STA PCF         ;16 BIT INCREMENT
        LDY #0          ;HAVE TO SET Y TO 0
                        ;ANYHOW, SO MAKE FULLY
                        ;RELOCATABLE THIS WAY
        BEQ SRCH1       ;JUMP ALWAYS. (RELOC)
;
STREND  LDA #0          ;END OF THE STRING
        STA PD4
        STA PD5         ;WITH NOTHING FOUND
        BEQ FOUND1      ;RELOC JUMP
        RTS
FOUND   LDA PCE         ;GET LOW BYTE
        STA PD4         ;OF POINTER TO THE
        LDA PCF         ;FIRST CHARACTER OF THE
        STA PD5         ;WORD WHICH WAS THE
                        ;ONE FOUND AND RETURN IT TO THE
                        ;CALLER IN THE FP ACCUMULATOR.
                        ;THIS WAY CAN SAY WHICH WORD
                        ;WAS EMBEDDED.
FOUND1  LDA PD3         ;GET B$ LENGTH
        LDY #0
        STA (PD6),Y     ;PUT INTO A$(1,1)
        RTS
;
;
; CALLING SEQUENCE IS:
;
;   X=USR(ADR(PROG$),ADR(A$),
;         ADR(B$),LEN(B$))
;
; ON RETURN, X=0 (FALSE) IF
;            STRING IS NOT FOUND,
;
;            X=POINTER TO FIRST
;            ADDRESS OF FOUND
;            WORD
;
;
; WHERE PROG$ IS THE STRING WHICH
; CONTAINS THIS PROGRAM, AND
; WHERE A$ LOOKS LIKE THIS:
;
; A$="$BAD1!BAD2!WORD WITH BLANKS!BAD3$"
;
; COMPARISON DATA CAN USE EMBEDDED BLANKS AS SHOWN.
;
; SOURCE STRING (B$) WILL BE AUTO
; SHORTENED TO REMOVE ALL NONPRINTING CHARACTERS.
; ACCEPTS ONLY ASCII $20-7F.
;
; NEW LENGTH OF B$ RETURNED IN A$
; USER CAN ACCESS NEW LENGTH BY:
; ASC(A$(1,1)) OR PEEK(ADR(A$))
;
; USER CAN SHORTEN TO NEW LENGTH
; BY: B$(N)=B$(N,N) WHERE N=
; ASC(A$(1,1)) OR N=PEEK(ADR(A$))
        .END
Listing: BBSCHECK.BAS