Classic Computer Magazine Archive COMPUTE! ISSUE 48 / MAY 1984 / PAGE 173

Commodore Word Wizard

Joe W. Rocke

"Word Wizard" improves your writing skills by checking the readability of any written material. For the VIC-20, Commodore 64, and PET/CBM computers.

The term foggy writing was coined by Robert Gunning. Seeking ways to improve the readability of written text, he developed a fog index formula. The formula is based on counting the words and sentences in a sample paragraph of text. Long words and long sentences produce a high index number. This type of writing is called foggy because it can be harder to read and understand. Writing that is easy to read (and understand) should have a low fog index.

The fog index formula uses a 100- to 200-word sample of text. Words of three syllables or more are considered "long." Dividing the word count by the number of sentences provides the average sentence length. Adding the number of long words and performing a simple computation produce the fog index. Although the index number is rather arbitrary, it does provide a standard for measuring text readability.
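As a rough illustration of the computation just described, here is a minimal Python sketch. It assumes the commonly published form of Gunning's formula, 0.4 times the sum of the average sentence length and the percentage of long words. The constant 0.4 and the function name fog_index are not taken from this article, and the formula in the Word Wizard listing may differ.

```python
def fog_index(words, sentences, long_words):
    """Fog index from raw counts taken over a text sample."""
    if words <= 0 or sentences <= 0:
        raise ValueError("sample needs at least one word and one sentence")
    average_sentence_length = words / sentences
    percent_long_words = 100 * long_words / words
    # 0.4 is the constant from Gunning's published formula (assumed here,
    # not stated in the article).
    return 0.4 * (average_sentence_length + percent_long_words)


# A 100-word sample with 5 sentences and 10 long words:
# average sentence length is 20 and long words make up 10 percent.
print(fog_index(100, 5, 10))
```

With a 100-word sample, the number of long words and the percentage of long words are the same figure, which is one reason that sample size is convenient.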

Researchers have since learned that people prefer to read below their educational level. Thus the fog formula has been expanded to produce a reading level index number. The result is a number that represents the approximate grade level at which written material can be read and understood.

People are comfortable reading text that has a reading index ranging from 6 to 8. Most of the writing in popular magazines and newspapers has an index in this range. People are capable of reading at a higher level, but the concentration required can make such writing tedious. Even college professors find it uncomfortable to read something with an index of 12 or higher.

Computerized Word Check

The computer is an ideal tool for checking text for readability. Large companies have developed programs of this type to check their product manuals. When used with word processing systems, this checking process takes little additional time.

Using "Word Wizard" is as simple as typing text onto a video screen instead of on paper, as with a typewriter. A 100-word sample is all that is required. Almost all text-reading analysis is based on this sample size.

The program begins with a prompt. There is no cursor, but whatever is typed appears on the screen. The left arrow can be used to correct a typo without affecting the program. Use the RETURN key only when you are finished entering the sample. The screen then clears, and the text that has been typed into memory begins to march across the screen. The text display is then formatted to improve readability.

Type in the text sample without worrying how it looks on the video display. The text will wrap around the screen, causing some words to be broken midway and to continue on the next line. The display is primarily for reference so you can see what was originally typed.

The Display Phase

Next, during its display phase, the program counts characters, words, and sentences. It also counts the number of words containing more than nine characters, which are presumed to consist of three or more syllables. Each word group ending with a semicolon or colon is counted as a sentence of its own. This prevents a long compound sentence from being counted as a single sentence. Naturally, any word group ending with a period, question mark, or exclamation mark is also counted as a sentence.

The word-checking data is stored in simple variables and is then used to compute the reading index at the end of the display cycle. A continuation prompt concludes the display cycle to permit you to read the last display page.

Finally, the word, sentence, and long word counts are displayed. The reading index, rounded to two decimal places, completes the text analysis. The program then asks whether you want to repeat the analysis or exit the program.

An index of 6-9 indicates a good readability level. A higher index indicates that the text might benefit from some editing. You may want to replace a long sentence with two shorter ones that carry the same thought, or try to find shorter words. For example, the word city is easier to read than metropolis.

Housekeeping Chores

Lines 10-30: Housekeeping chores are performed at the beginning of the program. The formula used to round the reading index is defined in line 10. Major variables are set to zero to prevent errors if the program is rerun. Variable MS in line 20 denotes the beginning memory storage address. A second variable, BE, is set to the same value for use in the display loop.

The value currently in the program works with an unexpanded VIC-20. Use MS = 2300 in line 20 if you have a PET/CBM or a 3K expanded VIC. (Ignore the color commands if you have a PET.) For a VIC with 8K or more of expansion memory, use MS = 5900. Try MS = 3300 for the Commodore 64. For other systems you will have to use an address above the BASIC program area.

Lines 35-150: The input cycle begins at line 60 with the GET A$ keyboard scan for a key input. When a key is pressed, the input is checked for a backspace (left cursor). If it is a backspace, the invisible cursor moves one space to the left, and the memory storage is decreased by one. This is to prevent counting the backspace as part of the text. The program then loops back for a new key input.

If the key pressed is a text character, the key is displayed and converted to its ASCII equivalent. The ASCII value is then POKEd in memory address MS for storage. The input is then tested for a carriage return (CR); if not a CR, storage address MS is incremented by one, and the program loops back for another key input. Note that a CR breaks the input loop, jumping program flow to the continuation GOSUB.
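The input cycle can be mimicked in Python as a sketch. This is not the BASIC listing: a bytearray stands in for the memory that the program POKEs into, the key codes for backspace and RETURN are placeholders, and the guards against backing up past the start or running off the end of the buffer are additions of this sketch.

```python
BACKSPACE = "\b"  # placeholder for the left-cursor key; real key code not given
RETURN = "\r"     # carriage return ends the input cycle


def store_keystrokes(keys, size=1024):
    """Store each key's character code; a backspace steps the pointer back."""
    memory = bytearray(size)  # stand-in for the RAM the program POKEs into
    ms = 0                    # next storage position, playing the role of MS
    for key in keys:
        if key == BACKSPACE:
            if ms > 0:        # guard added here: never back up past the start
                ms -= 1
            continue          # loop back for a new key without storing anything
        if ms >= size:        # guard added here: stop when the buffer is full
            break
        memory[ms] = ord(key)  # like POKE MS, ASC(A$)
        if key == RETURN:      # the CR is stored, but MS is not advanced
            break
        ms += 1
    return bytes(memory[:ms]).decode("ascii")


# Typing "cat", backing up one space, then typing "r" leaves "car" in storage.
print(store_keystrokes("cat\br\r"))
```

Because a backspace only moves the pointer, the next key simply overwrites the mistyped character, so the backspace itself never becomes part of the stored text.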

The Word Count

Line 110 performs a word count during the input cycle. The count value of 125 in line 120 limits input to a maximum of 125 words. These two lines are optional, but they do ensure that the input stays within sample limits. A smaller number of words can be used for a sample, of course.
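A running word limit of this kind might look like the following Python sketch. It assumes that each space typed counts as the end of one word, which the article does not confirm for line 110. The function name accept_until_limit is made up for illustration.

```python
def accept_until_limit(keys, max_words=125):
    """Accept keys until the running word count reaches max_words."""
    accepted = []
    word_count = 0                 # plays the role of WC
    for key in keys:
        accepted.append(key)
        if key == " ":             # assumption: a space ends a word
            word_count += 1
            if word_count >= max_words:
                break              # sample is full; stop taking input
    return "".join(accepted), word_count


# With a limit of three words, the fourth word is never accepted.
print(accept_until_limit("one two three four ", max_words=3))
```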

Lines 160-300: The display and checking cycle begins upon user response to the continuation prompt. Variables used to accumulate word-checking data are set to zero to prevent errors if the program is repeated. A FOR-NEXT loop is used for the display cycle, since storage beginning address BE and ending address MS were established during the input cycle.

The stored ASCII data is PEEKed from each memory address, converted to a string, and temporarily stored in string variable C$ for display. C$ now represents the keyboard character entered during the input cycle. The individual characters are counted and the count is stored in C. L is used to count characters for line display formatting.

Word-checking functions are performed by IF statements. These lines check for the space character that denotes a word end, or punctuation indicating a sentence end. A space increments the word count, W. A sentence end increments the sentence count stored in S and decreases the character count by one. The decrease prevents the punctuation from being counted as a word character. If the character count in C is equal to or greater than 9, and a space indicates a word, then long word counter LW is incremented. The character counter is returned to zero value whenever a space or sentence end is encountered.
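The checks described above can be sketched as a single pass over the characters. The Python below borrows the variable names C, W, S, and LW, but the exact order of operations in the BASIC listing is not shown here, so the details are assumptions: punctuation and spaces are never added to the character count, the long word test happens when a space is reached, and a trailing space is appended so the final word is counted.

```python
SENTENCE_ENDS = ".?!;:"  # semicolons and colons also end a counted sentence


def check_text(text, long_word_min=9):
    """Return (words, sentences, long_words) for the text sample."""
    c = 0   # characters in the current word
    w = 0   # word count
    s = 0   # sentence count
    lw = 0  # long word count
    for ch in text + " ":       # trailing space flushes the last word (added here)
        if ch == " ":
            if c > 0:           # skip runs of spaces
                w += 1
                if c >= long_word_min:
                    lw += 1
            c = 0               # start counting the next word
        elif ch in SENTENCE_ENDS:
            s += 1              # punctuation is not counted as a word character
        else:
            c += 1              # note: commas and quotes still count as characters
    return w, s, lw


print(check_text("The metropolis is large. Is it? Yes; very."))
```

The long_word_min parameter is exposed so the threshold can be adjusted to match whichever cutoff the listing uses.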

Screen Formatting

Line 220 formats the text to reduce word wraparound.
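The article does not say how line 220 does this. One plausible approach, sketched here in Python purely as an assumption, is to start a new line at the first space after the line length counter passes a threshold. The 22-column default matches the VIC-20 screen. The break_after value and the function name format_lines are invented for this example.

```python
def format_lines(text, width=22, break_after=16):
    """Break at the first space once break_after columns have been used."""
    lines = []
    current = ""
    for ch in text:
        if ch == " " and len(current) >= break_after:
            lines.append(current)   # end the line at a word boundary
            current = ""
            continue
        if len(current) == width:   # the screen itself wraps at full width
            lines.append(current)
            current = ""
        current += ch
    if current:
        lines.append(current)
    return lines


for line in format_lines("the quick brown fox jumps over the lazy dog"):
    print(line)
```

A word that begins near the threshold can still run past the right edge, so this reduces wraparound rather than eliminating it.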

Lines 320-400: The text analysis is performed in this portion of the program. The reading index is computed in line 320. Text data accumulated during the word-check cycle are displayed, followed by the reading index (ID). The rounding function is performed by the FNA(ID) formula which was established at the beginning of the program.
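The rounding formula itself is not reproduced in this text. A common BASIC idiom for rounding to two decimal places is INT(X*100+.5)/100, and the Python sketch below assumes that form. The function name round_two_places is hypothetical.

```python
import math


def round_two_places(x):
    """Round to two decimals in the style of INT(X*100+.5)/100."""
    # math.floor matches BASIC's INT, which rounds toward negative infinity.
    return math.floor(x * 100 + 0.5) / 100


print(round_two_places(7.846))
```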

Lines 410-480: The remaining lines contain the user prompts. Conventional INPUT statements are used to keep the program short. END is used between the REPEAT prompt and the continuation GOSUB to prevent an error message when exiting the program. Line 470 prints the word input count and returns control to the continuation prompt of line 150.


A$	The input string is confined to one character.
BE	Beginning address of the memory storage area.
C	ASCII value of A$, and the character counter.
C$	Character string used for the display cycle.
ID	Reading index.
L	Display line length counter.
LW	Long word count storage.
MS	Memory storage ending address.
P	PEEK value of MS contents.
S	Sentence count storage.
T	Display cycle loop counter.
W	Word count storage.
WC	Input cycle word count.
Z & Z$	Prompts.

Word Wizard program listing