Classic Computer Magazine Archive COMPUTE! ISSUE 50 / JULY 1984 / PAGE 40

The ABC's Of Data Bases

Charles Brannon, Program Editor

There are several "generic" applications for microcomputers. Electronic spreadsheets such as VisiCalc helped to spark the microcomputer revolution. Word processing has made many a computer purchase easily justified. And data base programs are now one of the hottest items on the market. The best-selling software, such as Lotus 1-2-3, incorporates all three of these applications. Although packages such as Lotus 1-2-3 (first sold for the IBM PC) have more sophistication and scope (as well as a much higher price tag) than similar programs on home computers, there's still a lot you can do with even a bargain-basement data base.

Just as spreadsheets have made financial analysis easier, faster, and more flexible, just as word processors have blurred the distinction between rough and final drafts, data bases can make all your record keeping simple, streamlined, and fast. But first you have to translate the concepts of manual record keeping to the computer's way of doing things. Since you're bound to encounter new vocabulary and principles as you convert to electronic record keeping, it's useful to have a background in these things as you're searching for the right package for your needs. First, a clarification: Sometimes the program that manages the data is called a data base. The set of all your data is also referred to as the data base. Context usually makes the meaning clear.

Files, Records, Fields

To understand some of the features of data base management, think of how records are organized if no computer is involved. Short items are usually stored on index cards, then shoved in a box. This whole box of cards would be called a file on a data base. Each card is called a record. Records are further subdivided into fields. Before you can enter any information, you have to set up or define the data base by entering the name, type, and length of each field. To illustrate these subdivisions of a file, we can look at a common type of data base, the mailing list. Each record in such a file would be an individual mailing label. And, within each record, the fields would probably be: name, address, city, state, and zip code.

Many data bases ask you to estimate the length and type of each field. This way, the data base program can tally up a total for the length of each record. The memory (computer RAM or disk drive) is then subdivided into records of that length. By contrast, other data bases will assume a fixed record length (usually the size of one disk sector), then let you divide the record up into fields. You still have to estimate the length and type of each field. Some data bases can modify the fields at any time, but many data bases can't. It's often wise to set aside some extra fields in case you later want to start including additional information in each record, like telephone numbers.

Field length is based on what kind of information it will be expected to contain. People's names will generally be less than 20 characters. An address can be longer, say 30 characters. Most cities can be spelled in under 15 characters. Using the official abbreviations, the state field takes only two characters.

Another aspect of fields is their type—Name, address, city, and state are all string or alphanumeric fields. The zip code, however, is always a number and so it could be assigned to a numeric field. Numeric fields store their numbers in a fixed number of bytes, in the computer's internal floating point format. What this means to you is that no matter how long or short the number is when written out (5.2 or 5,200,000), it will only require, say, five bytes to store. Advanced data bases have several other types. A field requiring a yes/no answer can be stored with only 1 bit (1 = yes, 0=no). You could pack eight yes/no answers in one byte. This is often called a boolean, bit, or binary field. Some data bases might support a byte field, which can hold only numbers in the range 0-255.

The key field is the primary field you use when accessing records. If the key field is name, you can look up any mailing label by the addressee's name. Many data bases will let you sort the entire file. You choose a key field to sort by. If you sort by zip code, the file will be ordered according to the zip code when printed out. Some data bases let you sort or search with multiple keys. For example, you could print out a list, alphabetized by name, of all addressees living in California. The sort would be keyed to the name field, and the printout would be selective by only printing and sorting those fields whose state field is CA.

A Range Of Features

Features vary from one data base to another. Their primary purpose is to let you store and retrieve records. But once you have a large data base, you should be able to manipulate and interrogate the data base, with all the speed and power that the computer can bring to bear. We've already mentioned sorting and printing. If you want to keep a mailing list, be sure the data base can print out mailing labels! You should also be able to remove (delete) a record once it is no longer needed. If you are manually keeping the list in a certain order, you want to be able to insert new records between existing ones. It's very handy to have a printout of just one field from every record. A directory function like this can usually print out the key field from each record for future reference. Some data bases permit you to perform math on numeric fields, even across the entire file. Others offer sophisticated report generation, where you can design a custom printout, complete with rows, columns, and calculations like totals and percentages.

You may want to be able to access your data base file from within other programs. To do this, a word processor, for example, must be compatible with the data base, so that information can be retrieved and inserted into the text held by the word processor. Or, if your data base can create files that are compatible with the word processor, you're in business. If this is important to you, try out both the word processor and the data base together before you buy either.

If you have a cassette system, you'll probably want to buy a memory-storage data base. These store the records in RAM. The advantage here is speed. You can sort and search for records much faster in memory than you can with a tape drive. At the end of the session, you save out the entire data base to tape. The disadvantage here is that this limits the amount of data which can be stored to the amount of RAM you have in your computer. Also, memory-based data bases often slow to a crawl when their memory is full.

If you have a lot of data to store, you'll probably need a disk-based data base. The records are stored directly on disk, and any record can be called up without reading through the entire file. You can usually use the whole disk for a single data base, or even link the data base to a second disk or disk drive. The disadvantage with a disk-based data base is the speed of disk access time, which is generally much slower than a memory data base.

Be critical of the data-entry mode. You'll be using that part of the data base more than any other as you type in all the data. Grade a program's entry mode in terms of how easy it is to learn, how easily you can edit and make changes, and how it reacts to errors you make. Does it check to make sure the field you've entered is of the correct type for that field? Does it warn you if you've typed too much for one field, or does it just chop off the extra characters?

Some data bases, like dBASE II, are so sophisticated and flexible that they are practically a programming language for data base management. Many people buy templates for them. Templates are like programs for the data base. The template sets up all the fields, and includes the search and calculation descriptions. A template can also control the kinds of printouts allowed. You don't need to be a programmer to use an advanced data base, but you do have to learn the commands and protocols of that particular program. But once you've set up a certain kind of file, a template could create replications of that file type automatically the next time you want to build a similar data base.