Classic Computer Magazine Archive COMPUTE! ISSUE 146 / NOVEMBER 1992 / PAGE 6

Easy databases. (data base management systems) (includes related articles and product listing)
by Tom Campbell

Every day, we're engulfed with important details to sort and file away. Phone numbers, appointments, seminars, records, and new ideas flood us. It doesn't matter whether we're small business owners, PTA leaders, club and co-op members, or coin collectors; the Information Age has touched us all. There's so much to keep up with that it can turn the most energetic, organized, dedicated newcomer into a tired, frustrated cynic in no time--that is, unless you have a database management system (DBMS) looking after things for you.

No Reason for Fear

Don't let the word database scare you away. Database managers are easier to use than ever, and if you can draw a picture onscreen, you can use just about any of the popular DBMSs on the market today. In fact, some of them are downright fun to play with. True, a few years ago database managers were forbidding and dull, but now they're one of the most competitive arenas in the software world. Software developers have learned that it takes more than horsepower to bring new users into the fold.

In fact, you may be using a DBMS already. If you run Windows and use Cardfile as a phone book/dialer, you're using a simple DBMS. Consider the list of BBS phone numbers and modem settings in your communications program--that's a DBMS. Spreadsheet jockeys who know their way around @HLOOKUP and @VLOOKUP are using the database portion of their spreadsheets (remember that Lotus 1-2-3 originally was advertised as a combination plate: spreadsheet, database, graphics, Coke, and fries). And if you've ever battled with your word processor's mail merge, you've been dealing with a DBMS, too. (Do you end a secondary merge field with {END MERGE} or OR...?)

Database Details

Broadly speaking, a DBMS is any program that stores information by category and lets you get at that information in an orderly manner. Parts inventories, mailing lists, accounting systems, and order retrieval systems are all databases and require a DBMS. The categories are normally called fields or columns. The group of fields (for example, last name, first name, street address, ZIP code, and so on) is called a database or a table, as illustrated in the accompanying figure.

Ever notice how Microsoft registration cards are divided into boxes, one per letter, like this?

Last name:!C!A!M!P!E!L!L!!!!!!!

Microsoft asks you to print and to employ a couple of odd conventions (the tail of the Q points up instead of down) for its optical character recognition system, which automatically translates your printing into computer-readable format. In fact, systems like this can also link directly to a database manager. You're limited to a certain number of character boxes because of the limitations of database managers; most of them restrict the amount of information that can be stored in records or fields. Each customer record is given only 200 bytes (or 2000), and the fields within are also stored in fixed sizes (30 characters for the last name, 25 for the city name, and so on). Of course, the person designing the database determines those sizes.

The reason a DBMS imposes this rather severe limitation is speed--a recurrent theme in database management systems. Without being able to assume that records are of equal size, a DBMS would have no choice but to search through each record until it found the one it was looking for. It's much, much faster for a DBMS to know that, in a database with 120-byte records, record 100 is at position 12000 in the file. Both hard and floppy drives are programmed to be able to position anywhere on the disk in a very short time.

This speed obviously comes at the cost of disk space. Most American surnames fit into 10 or 15 characters, but foreign and hyphenated names can be much longer. Do you want to alienate your customers by starting letters to Mrs. Adamkiewicz-Stanislaw with "Dear Mrs. Adamkiewic" because you can't afford the 50-percent slack such generosity would cost with your 20MB database, or do you want to eat the disk space and shell out $600 for a new drive? Most databases for small businesses, clubs, or home use turn out to be of an easily manageable size. If you have 100 people in your community band and each record uses 200 bytes, the database will be less than 20K in size--easily small enough to fit on even the humblest floppy disk. If you run a stationery store with 6000 customers on your mailing list for offices in the surrounding urban areas and each customer record fits in 220 bytes, your database will weigh in at only 1.3 megs and will still back up to a single high-density 3 1/2-inch disk.

Many DBMSs that use fixed-length records store only one record in RAM at a time, leaving the rest on disk, so the database capacity is limited by disk space instead of RAM. This contrasts to word processors, which are often limited to available RAM or, worse, the 64K-segment limitation of the 15-year-old 8088 microprocessor that came with the orginal IBM PC.

Most database managers allow the information they store to be restricted. For example, a numeric field allows for storage only of digits, plus signs, minus signs, and decimal points. A character field can store any information that can be printed--letters, numbers, punctuation marks, whatever. A date field contains only allowable numbers for months (1-12), days (1-31), and years (typically 1900 and up). A logical field can only store T for Ture and F for False (or sometimes Y for Yes and N for No).

Restricting the kind of information that can be entered into a field confers several advantages to the DBMS user. First, most databases automatically force the user to input only that type of data, so an absent-minded data entry person won't be able to enter ZIP code (numeric) into the state field (character). Second, it speeds indexing.

An index is a copy of one or more key fields (fields you want to sort by, for example, last name or ZIP code) that's designed for very fast access. If you index a file by its ZIP code, the database manager can create a separate index file with a copy of the ZIP code for each record stored in a quick-access format. Computers can sort numbers faster than letters, so the indexing mechanism can take advantage of that trait and convert its copy of the ZIP codes, which are stored as printable characters in your record, into binary values for the index file. That way, printing your customer list by ZIP code will be sped up enormously, perhaps a hundred or a thousand times.

The designers of dBASE III Plus recognized that some databases would do well with the ability to store free-form information, at the cost of limiting how that information could be searched. For example, if you want to store a log of a client's dealings with your customer service department, you'll quickly realize that most records leave this field empty, but the ones for which it is used can easily spill over your database's maximum record size of 4000 characters. The answer to this problem is memo fields, pioneered by dBASE.

Memo fields take up ten bytes per record no matter what. That's the bad news. The good news is that those ten bytes are used to refer to a location in a separate memo file that can store up to 64K of information per memo field. (FoxPro and some other dBASE-compatible databases manage to allot even larger memo fields.) Memos can't be sorted or used in indexes, because indexes copy the contents of each indexed field and they can't be searched easily--but when you need 'em, you need 'em bad. Many database managers that claim to be dBASE compatible don't support memo fields, whereas others, such as FoxPro and Clipper, go dBASE one better by offering beefed-up support for memo fields. Fox for the Mac, for example, uses memo fields to store MacPaint images!

Family Relations

There are two kinds of fixed-length DBMSs (that term includes database managers that use memo fields): flatfile and relational. A relational DBMS is one that lets you use several databases at once, connected by common fields. An example is a customer list that's related to accounts receivable by customer ID. Scroll through the customer list, and you'll see how much each customer owes you because the DBMS knows to look up each account as you scroll. (The inventor of relational database design theory, Edward F. Codd, rightly disputes this simplified example and wrote an entire book explaining what relational databases really are, but we're using the popular definition here for the sake of discussion.) Database managers without this ability are known as flat-file managers, because they use only one database at a time.

In general, relational database managers are much more capable than flat-file managers and by definition can do everything flat-file managers can, but they're usually more expensive and more difficult to learn.

Indy Car or Family Sedan?

A DBMS that uses fixed-length records isn't the only game in town, of course. You might think of it as the family sedan of databases, which can do just about any task you can throw at it with varying degrees of success--big capacity, reasonably fast sorting and indexing, pretty flexible, and nicely paired with the performance characteristics of a disk drive. Three other varieties of DBMSs exist--network, hierarchical, and object-oriented, but they aren't popular on PCs. A fifth variety, free-firm databases, has been around for some time and occupies a small but solid niche in the PC world. Lotus Agenda and AskSam are well-known examples of free-form databases.

While you can make free-form databases look like traditional row-and-column database managers, why bother? Their true power is precisely in not forcing you to such a narrow organizational paradigm. Instead, you're encouraged to make up freestyle associations; Agenda knows that the phrase next Wednesday is the same as December 9 and can spot other such connections without your explicitly asking. Free-form DBMSs usually store all information in RAM, so they can be sharply limited in the amount of information they store.

Two hybrid DBMSs are HyperPad, a sort of character-mode HyperCard for DOS, and ToolBook, a Windows application that also looks like HyperCard. Fields aren't typed, essentially doing the job of both character and memo fields, and they may contain up to 32,000 characters. But nonetheless, they can be sorted and searched, and they don't waste disk space. Each comes with a wonderfully rich programming language and terrific sample applications that you're encouraged to modify for your own use. HyperPad boasts much snappier operation and works well even on the lowliest 8088 DOS machine. ToolBook is unpleasant on anything less than a 386 with four megs running at 33 MHz. If you plan to sort a 10,000-name mailing list in ToolBook, plan to leave your computer running overnight, but that's not really what ToolBook is for. If you want to prototype a Windows application or put together a visual database fast, ToolBook is without peer. Both HyperPad and ToolBook are stable, mature products.

Gone Fishing

With all the options available, you're sure to find a DBMS that suits your style and needs. The most difficult challenge database managers pose may well be choosing the best one for your needs. If you find this to be the case, consider setting up an appointment with a computer consultant, who will evaluate your habits and system capabilities. But don't delay. Information keeps pouring into your life; the sooner you get it managed, the sooner you can take that vacation you've been putting off.