Classic Computer Magazine Archive CREATIVE COMPUTING VOL. 10, NO. 12 / DECEMBER 1984 / PAGE 168

Indexing by microcomputer; an easy way to put the finishing touches on your literary masterpiece. James McFarlane.

Most serious writers have come up against the problem of making an index at some point in their writing careers. Whether you are writing a business report or a research paper, a specialist monograph or a book, the inclusion of an index--to say nothing of the quality and reliability of that index--can contribute significantly to the appeal of that publication to its readers.

There are, of course, indexes and there are indexes. Some are no more than a list of proper names--people and places--in alphabetical order with a page reference next to each. Others can be highly wrought analyses of the contents of the work, with subdivisions and sub-subdivisions under each heading.

Whatever the level of sophistication desired, however, there is in all cases a great deal of routine clerical work involved in the compilation of an index. And in these days when authors at all levels are gratefully embracing the microcomputer and its word processing potential, it is only natural that they look to this source for help in compiling the index which will complete their work.

Let it be said at once that you cannot simply dump the completed text into the microprocessor and say: "Compile me an index" any more than you can say to your word processor: "Write me an article." A great deal of skilled input is necessary in both cases before anything of value can emerge at the far end. But there is much that the computer can do to eliminate the sheer mental and physical slog which is an inevitable part of the process.

Let us assume a typical contemporary author, one who now uses a computer instead of a typewriter and who has invested in equipment which will do a reasonably serious job. His system might consist of a machine with a substantial amount of dynamic memory, a couple of disk drives, a Basic interpreter and a printer. His word processing package will do the normal tasks of moving the cursor freely about the screen and the regular editorial tasks of inserting, block moving, global search and replace, and similar functions; it must also allow embedded commands for turning the printer on and off, for writing to disk, and for cancelling bi-directional printing, and it must have among its system variables one which keeps track of the page number as it works through a document. If, in addition, our hypothetical author has a Basic interpreter and a sorting package, he will find an immediate and welcome use for them.

It must be said at the outset that although many of the word processing packages available at present have sections in their manuals on "Indexing," it is usually only on a very rudimentary level. In most cases if you follow their instructions, you get a simple word list with associated page references listed in page rather than alphabetical order. A really good index requires a good deal more input, both intellectual and programmatical, but the results are worth the effort.

In what follows I describe the measures I have taken to create indexes on my own system, but with a little imagination, the method I outline can be adapted to different machines and different word processing packages.

My system consists of a CP/M machine with 64K RAM and two double sided, double density disk drives with 720K capacity. For software I use Peachtext (the former Magic Wand), which I prefer for the elegance of its structure and the sophistication of its range, plus MBasic. Other extras I find useful for indexing are Micropro's SuperSort and a Basic interpreter.

Let's take a step-by-step look at the procedure you might follow to index a document of normal report length, or of chapter length in a book, or (with adequate memory and storage) of a book length work. I am assuming that the text exists on disk, since you will have composed and formatted and printed out your work using your word processing system.

Step One

The first step is to set up an editing copy of the text, calling it INDEX1.DOC and keeping the original text intact. The original text file should then be kept in a safe place to obviate any chance of accidental corruption from the indexing process.

Step Two

You would edit INDEX1.DOC as follows:

Embed at the head of the text the printer codes for turning off the printer, the disk, the bi-directional printing, and the printer form feed. In Peachtext, these are: PRINT OFF, DISK OFF, BI OFF, and FORMFEED OFF.

Select a pair of markers to identify those words or short phrases you wish to include in the index, one to go before the word(s) and the other to go after. A good pair are [ and ], as long as you have not used square brackets in their conventional role anywhere in the text. If you have, a different pair of markers must be chosen.

Work through the text inserting the "fore and aft" markers around the selected words, e.g.:

". . . in the case of [Macbeth], the dramatic tension is . . ."

". . . performed at the [Vaudeville Theatre] in 1891 . . ."

". . . the [world premiere] took place on . . ."

If you know in advance that certain concepts, names, or phrases appear repeatedly in the text, you can mark these items with a multiple search/replace command: e.g., globally replace Nietzsche with [Nietzsche] throughout the text with a single key stroke.

Because of the various processing stages which follow, it is best to limit the number of characters between markers to a maximum of 55.
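For readers working in a present-day language, the marking stage can be sketched in a few lines of Python. This is an illustration only, not part of the Peachtext procedure: the function names mark_terms and overlong_items are hypothetical, and the sketch assumes the text is held as a plain string with [ and ] free for use as markers.

```python
import re

MAX_ITEM_LENGTH = 55  # the limit recommended in the text


def mark_terms(text, terms):
    """Wrap every occurrence of each term in [ and ] markers."""
    # Longest terms first, so a phrase is matched before a shorter term inside it.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    # The lookarounds skip occurrences that already sit directly inside markers.
    pattern = r"(?<!\[)(?:" + alternatives + r")(?!\])"
    return re.sub(pattern, lambda match: "[" + match.group(0) + "]", text)


def overlong_items(text, limit=MAX_ITEM_LENGTH):
    """Return the marked items that exceed the character limit."""
    items = re.findall(r"\[([^\[\]]*)\]", text)
    return [item for item in items if len(item) > limit]


marked = mark_terms("Nietzsche wrote; later Nietzsche recanted.", ["Nietzsche"])
print(marked)
print(overlong_items(marked))
```

Running the length check before moving on catches any item that would later be cut off by a fixed-width sort key.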

When this editorial work has been completed, you will use the global search/replace capacity of the word processing program to replace the "fore-and-aft" markers with the appropriate codes to turn the "print to disk" ability on and off; it is also at this stage that you call upon the ability of the word processing package to keep track of the page number as it reads through the text.

Because you are also striving to produce a disk file which can be processed by MBasic, it is necessary to ensure that there are no extraneous commas in the items enclosed between the markers to confound the later Basic programs, and to mark off each index item by a comma from its page reference.

Using the Peachtext conventions, therefore, the following substitutions are made:

[ = NL,DISK ON
] = , %PAGE,NL,DISK OFF

In this instance, the NL commands are necessary because the DISK ON/DISK OFF commands require a preceding carriage return to operate. Note also the inserted comma before the command to print the page number.

Step Three

Then, END and PRINT to disk the edited text. The embedded commands will ensure that the selected index items together with their page references are printed to disk in a file which the machine will automatically call INDEX1.PRN.
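The combined effect of the marking, the comma clean-up, and the print-to-disk pass can be imitated in Python. The sketch below is not how Peachtext works internally; it assumes, purely for illustration, that the text is a single string in which a form feed character separates the pages and that the first page is numbered 1. The function name extract_index_items is hypothetical.

```python
import re


def extract_index_items(text, first_page=1):
    """Return 'item, page' lines for every [marked] item, in page order."""
    lines = []
    # Assumption: a form feed ("\f") marks each page boundary in the text.
    for page_number, page in enumerate(text.split("\f"), start=first_page):
        for item in re.findall(r"\[([^\[\]]+)\]", page):
            # Remove commas inside the item so that the only comma left
            # is the one separating the item from its page reference.
            clean = " ".join(item.replace(",", " ").split())
            lines.append(f"{clean}, {page_number}")
    return lines


sample = (
    "in the case of [Macbeth], the dramatic tension\f"
    "performed at the [Vaudeville Theatre] in 1891"
)
for line in extract_index_items(sample):
    print(line)
```

Each output line has the same shape as a record in INDEX1.PRN: the index item, a comma, and the page reference.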

This file will, of course, be sorted only in page order. If that is adequate for your needs, the index can be edited, formatted, and printed out.

It will probably be necessary, however, to edit this file to remove the formatting codes the machine automatically builds into a .PRN file of this nature (e.g. form feeds), which might otherwise interfere with later Basic processing.

Step Four

Normally, one expects an index to be arranged alphabetically. To re-order INDEX1.PRN with this in mind, you must either use a sort/merge package or devise an MBasic sorting program. The latter can usefully serve if the dimensions of the sort are not excessive, but a program like SuperSort, which is both quicker and more capacious, is most useful.

There is, however, at this stage a further possible snag. Some of the selected items on INDEX1.PRN will have initial capital letters and others will begin with lowercase, so that when the sorting program gets to work (using ASCII values) it groups the capitals first and the small initial letters separately. It may be that your sorting package has a built-in device for coping with this; if not, you must run the file through an intermediate Basic program which will change the initial lowercase letters into caps.
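The snag and its cure are easy to demonstrate in Python, whose default string sort also compares character codes. The sketch is an illustration only and is not the article's Basic listing; the records and page numbers are invented sample data, and capitalize_initial is a hypothetical helper name.

```python
def capitalize_initial(record):
    """Upper-case the first character of a record, leaving the rest untouched."""
    return record[:1].upper() + record[1:]


# Invented sample records in "item, page" form.
records = [
    "world premiere, 12",
    "Macbeth, 7",
    "Vaudeville Theatre, 9",
    "audience, 3",
]

# A plain sort compares character codes, so capitals come before lowercase.
plain = sorted(records)

# Capitalizing the initial letters first restores alphabetical order.
fixed = sorted(capitalize_initial(record) for record in records)

print(plain)
print(fixed)
```

In the plain sort "audience" lands after "Vaudeville Theatre"; once the initials are capitalized it moves to the head of the list.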

Listing 1 shows a Basic program which will process INDEX1.PRN to deal with this snag; the processed file, in which all the index items have now been awarded initial caps, is called INDEX2.PRN.

Step Five

This latter file then needs to be sorted alphabetically. If you use SuperSort, the entries are:

INPUT=60,CR-DEL
SORT-FILE=INDEX2.PRN
OUTPUT-FILE=INDEX3.PRN
KEY=#1,55,ASCEND
GO

If you want to build this into a more comprehensive SUBMIT file, which CP/M allows, a one-line .COM entry can be entered:

SORT INP=60,CR;SOR=INDEX2.PRN;OUT=INDEX3.PRN;K=#1,55,ASC;G

Step Six

The resulting file INDEX3.PRN will then have all the index items, with their relevant page numbers, arranged alphabetically, and the main slog of the work of indexing will have been accomplished. There still remains, however, some final editing and polishing to be done, and the degree of sophistication of the final index is in direct proportion to the amount of editing you are prepared to devote to this raw list.

For example, to create a first-rate index, you should replace a crude entry such as "Shaw, 86,97" with "SHAW, George Bernard (1856-1950): 86,97." This can be done using the global search/replace function of your word processor. Or where there are multiple entries for one topic, you may want to tidy them up as follows:

London 43
London 57
London 102
London 152

might be edited to read

London 43, 57, 102, 152

Or, if you want to build in more detail, it might become:

London
   theatres 43, 152
   society 57
   planning 102
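The simpler of these tidying jobs, collapsing repeated headings into one line, is mechanical enough to hand to the machine. The Python sketch below is one possible approach, not the method used in the article; it assumes records of the form "heading, page" that are already sorted, and the name merge_entries is hypothetical. Building subheadings such as "theatres" or "society" remains an editorial task.

```python
def merge_entries(records):
    """Combine 'heading, page' records that share a heading into one line each."""
    pages_by_heading = {}
    for record in records:
        if not record.strip():
            continue  # ignore blank lines
        # The last comma separates the heading from its page reference.
        heading, _, page = record.rpartition(",")
        pages = pages_by_heading.setdefault(heading.strip(), [])
        page_number = int(page)
        if page_number not in pages:
            pages.append(page_number)
    # Dictionaries keep insertion order, so the sorted heading order survives.
    return [
        f"{heading} {', '.join(str(p) for p in sorted(pages))}"
        for heading, pages in pages_by_heading.items()
    ]


raw = ["London, 43", "London, 57", "London, 102", "London, 152", "Macbeth, 7"]
for line in merge_entries(raw):
    print(line)
```

The page numbers are sorted numerically rather than as text, so that 102 follows 57 instead of preceding it.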

The essential thing in all this is to remember that it is your job to program the machine to do what you believe to be necessary, rather than be content to accept what the machine does well mechanically but what by professional indexing standards is less than adequate.