How To Create A Data Filing System
Part I. Choosing The Right File Type
Jim Fowler
It's always a good idea to analyze your data storage problems and plan your solution carefully before you start programming. This article begins a four-part series on writing a data file/retrieval system for any computer.
Remember how your disk drive was going to solve your data storage problems? All those address cards, recipe files, inventories, and accounts were somehow going to become organized and never frustrate you again. It can happen, but you will have to do some thinking about the problem before you solve it satisfactorily.
Of course, commercial data base systems can serve you very well, but you ought to know something about such systems before you spend money on features you may never use. It is not impossibly hard to write a program that does everything you want. I have lived through a couple of such projects, so maybe I can point out areas to think about and places where you might take a wrong turn.
Planning is the name of the game. I can recommend writing your own system. If you like programming, you can easily develop a system that fits your needs, and you will know it well enough to alter it when you need to make changes. One thing to keep in mind as you plan is that once you begin to put the data on a disk you are, in a sense, a hostage to your own work. The more time you have spent typing in the data, the more reluctant you will be to start over. So plan ahead.
Another bit of advice: automation is not automatically a good thing. If you have a recipe card file with the cards filed under a few headings ("salads," "desserts," "meats," etc.) and there are only 30 or so cards in each section, you can probably find the one you want faster by flipping through the cards by hand. I remind you of that eternal verity: "If it works, don't fix it."
Pick Your Goals
The first step is to draw up a list of what you want. Actually write down what you hope a session with the file would be like: you turn on the computer, insert the disk, sit down to the keyboard, then what? Do you want a long list printed out (address labels?) or are you going to look for a needle in a haystack, such as the one record with exactly the right data to match your needs? It is well worth writing such scenarios several times on different days.
Another important consideration is flexibility. Whenever you are faced with a choice, always pick the one that gives you the greatest future flexibility. Of course, most of your choices will be made for you by the necessities of your data, your hardware, its operating system, etc. But keep flexibility in mind. This applies to every feature of your system – the number of records you expect to store, the amount of information in each, the "keys" you might use to retrieve records, and so on.
In an address file organized alphabetically by last name, the key would be each entry's last name. The key allows quick searches, and it lets you sort the file and enter new items in their proper order.
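To make the idea of a key concrete, here is a minimal modern sketch in Python (not part of the original series) of an address list kept in order by last name. The record layout and the function names `insert_in_order` and `find_by_key` are hypothetical; the standard `bisect` module supplies the binary search that makes a keyed lookup quick.

```python
import bisect

# Hypothetical record layout: (last_name, first_name, city).
def key_of(record):
    """The key is the last name, folded to upper case so 'smith' matches 'Smith'."""
    return record[0].upper()

def insert_in_order(records, keys, record):
    """Insert a record at its proper place so both lists stay sorted by key."""
    k = key_of(record)
    i = bisect.bisect_right(keys, k)  # position just after any equal keys
    keys.insert(i, k)
    records.insert(i, record)

def find_by_key(records, keys, last_name):
    """Binary-search the sorted key list; return every record with that key."""
    k = last_name.upper()
    lo = bisect.bisect_left(keys, k)
    hi = bisect.bisect_right(keys, k)
    return records[lo:hi]

records, keys = [], []
for r in [("Smith", "Ann", "Reno"), ("Adams", "Bob", "Erie"), ("Jones", "Cal", "Troy")]:
    insert_in_order(records, keys, r)

print([r[0] for r in records])              # ['Adams', 'Jones', 'Smith']
print(find_by_key(records, keys, "jones"))  # [('Jones', 'Cal', 'Troy')]
```

Keeping a separate list of keys is one design choice among several; it lets the search compare short keys instead of whole records.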
Finally, go to great lengths to make your system easy to use. It is so tempting to short-cut some tedious programming by saying to yourself, "Oh well, I can always remember that hitting RETURN without any input will drop me out of the program. After all, I've been running this machine for a while, and I don't make that mistake any more."
The important thing about data file systems is that you enter and retrieve records hundreds of times. A small stone in your shoe is no big deal if you are sitting down, but walk a few miles and see how important it gets! A small annoyance in a program is tolerable if you only encounter it once in a while, but in a data entry or retrieval operation it can doom the whole system. Many a card file has been restored to active duty because, for reasons like these, its owner got fed up with automation. So, be prepared to go to great lengths to make life easy for the user.
[Figure: The three types of files. Record 1 is shown as if it is only half as long as Record 2.]
The Three Kinds Of Files
There are three kinds of disk files. The first is one you probably already know, a Sequential File. All the data is strung together head to tail and put on the disk that way. Your programs are recorded on tape or disk in a sequential file. If you use a sequential file, you will need to put separators (called delimiters) of some kind between items of data so that you know where one ends and the next begins.
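A short Python sketch of a delimited sequential file may help. It assumes `|` separates fields and a newline separates records (both choices are hypothetical; any character that never appears in your data will do), and it uses an in-memory `io.StringIO` to stand in for the disk file.

```python
import io

FIELD_SEP = "|"    # assumed delimiter between fields; must never occur in the data
RECORD_SEP = "\n"  # assumed delimiter between records

def write_sequential(f, records):
    """Write each record's fields head to tail, separated by delimiters."""
    for fields in records:
        for field in fields:
            if FIELD_SEP in field or RECORD_SEP in field:
                raise ValueError(f"delimiter inside data: {field!r}")
        f.write(FIELD_SEP.join(fields) + RECORD_SEP)

def read_sequential(f):
    """Read the whole file from the front; the delimiters show where items end."""
    chunks = f.read().split(RECORD_SEP)
    if chunks and chunks[-1] == "":
        chunks.pop()  # the final RECORD_SEP leaves an empty tail
    return [chunk.split(FIELD_SEP) for chunk in chunks]

buf = io.StringIO()  # stands in for a disk file
write_sequential(buf, [["Smith", "Ann", "Reno"], ["Adams", "Bob", "Erie"]])
buf.seek(0)
print(read_sequential(buf))  # [['Smith', 'Ann', 'Reno'], ['Adams', 'Bob', 'Erie']]
```

Note that there is no way to jump to the tenth record here: the program must read past the first nine to find where it begins.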
One problem with sequential files arises when you want to change a record and the new one is of a different length. It is like putting books on a shelf: take out a thin one and put a fat one in its place – you'll have to move all the rest to make room. If you rarely make any changes, it might be worthwhile just erasing the old record by filling it with blanks and adding the new version at the end.
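The blank-and-append trick can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: records end with a newline, `io.BytesIO` stands in for the disk file, and the helper names `replace_record` and `live_records` are hypothetical. Because the blanks are exactly as long as the old record, nothing after it has to move.

```python
import io

RECORD_SEP = b"\n"  # assumed record delimiter, as in a plain sequential file

def replace_record(f, index, new_record):
    """Blank out record `index` (0-based, blanked records included) and append
    the new version at the end, so no other record has to move."""
    f.seek(0)
    data = f.read()
    start = 0
    for _ in range(index):                    # skip over the earlier records
        start = data.index(RECORD_SEP, start) + 1
    end = data.index(RECORD_SEP, start)       # raises ValueError if there is no such record
    f.seek(start)
    f.write(b" " * (end - start))             # same length, so nothing shifts
    f.seek(0, io.SEEK_END)
    f.write(new_record + RECORD_SEP)

def live_records(f):
    """Return the records that have not been blanked out."""
    f.seek(0)
    return [r for r in f.read().split(RECORD_SEP) if r.strip()]

f = io.BytesIO(b"Smith|Ann|Reno\nAdams|Bob|Erie\n")  # stands in for a disk file
replace_record(f, 0, b"Smith|Ann|Carson City")         # the new version is longer
print(live_records(f))  # [b'Adams|Bob|Erie', b'Smith|Ann|Carson City']
```

The cost is wasted space and a changed record order; sooner or later you would copy the file to squeeze out the blanks.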
The second kind is a Relative File. This is like a series of pigeonholes. One may be full and another partly empty, but when you enlarge a record you do not have to move the others to make room. As long as each hole is big enough to take the biggest record, you have no problem. This is the kind I use for my most complex data file.
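The pigeonhole idea reduces to one multiplication: record n starts at n times the slot size. Here is a minimal Python sketch, assuming a hypothetical 64-byte slot padded with zero bytes (so a record's own data must not end in zero bytes) and using `io.BytesIO` in place of a disk file.

```python
import io

RECORD_SIZE = 64  # assumed slot size: must be big enough for the biggest record

def write_record(f, n, data):
    """Store `data` in pigeonhole n (0-based), padding the unused part of the slot."""
    if len(data) > RECORD_SIZE:
        raise ValueError("record is bigger than its slot")
    f.seek(n * RECORD_SIZE)                   # go straight to the slot
    f.write(data.ljust(RECORD_SIZE, b"\0"))

def read_record(f, n):
    """Fetch pigeonhole n; an empty or never-written slot comes back as b''."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0")

f = io.BytesIO()                              # stands in for a disk file
write_record(f, 0, b"Smith|Ann|Reno")
write_record(f, 2, b"Jones|Cal|Troy")
write_record(f, 0, b"Smith|Ann|Carson City")  # grows in place; record 2 does not move
print(read_record(f, 0))                      # b'Smith|Ann|Carson City'
print(read_record(f, 2))                      # b'Jones|Cal|Troy'
```

Slot 1 was never written, yet record 2 is still found directly; that is the price (unused space) and the payoff (no searching, no shuffling) of a relative file.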
The third kind is a sequential file, but with a "Table of Contents" like the directory on a disk. Call it a Hybrid File. To use this kind takes a lot of programming. I cannot recommend it unless the space it saves is much greater than the space taken up by the extra programming and the table. Only big professional systems are likely to go this route.
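For comparison, here is a rough Python sketch of the hybrid idea: records packed end to end with no padding, plus a table mapping each key to an offset and a length. The names `build_hybrid` and `fetch` are hypothetical, keys are assumed unique, and the table lives only in memory here, whereas a real system would also have to save it to disk.

```python
import io

def build_hybrid(records):
    """Pack (key, body) records end to end with no padding or delimiters and
    build a table of contents mapping each key to (offset, length)."""
    data = io.BytesIO()
    toc = {}
    for key, body in records:
        if key in toc:
            raise ValueError(f"duplicate key: {key!r}")  # sketch assumes unique keys
        toc[key] = (data.tell(), len(body))
        data.write(body)
    return data, toc

def fetch(data, toc, key):
    """Look the key up in the table, then read exactly that many bytes."""
    offset, length = toc[key]
    data.seek(offset)
    return data.read(length)

data, toc = build_hybrid([("SMITH", b"Smith|Ann|Reno"), ("ADAMS", b"Adams|Bob|Erie")])
print(toc["ADAMS"])               # (14, 14)
print(fetch(data, toc, "ADAMS"))  # b'Adams|Bob|Erie'
```

Reading is easy; the extra programming shows up when a record changes length, because the data and every later table entry must then be rewritten.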
The figure diagrams the three file types. If your disk operating system supports relative files (also called random-access files), you will probably want to use that kind unless you are going to be very short of space on the disk. If your system doesn't automatically support relative files, you can make your program do it. Keep a table or use a formula which turns a record number into its "address" on the disk – its track and sector. Then you read or write a record directly by track and sector. This is a bit complicated, but worth doing.
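The "formula" approach might look like this in Python. The disk geometry is an assumption for illustration only: 256-byte sectors, the same number of sectors on every track, a hypothetical first data track, and fixed-length records that fit evenly into a sector. Substitute the figures for your own drive.

```python
SECTOR_SIZE = 256        # bytes per sector (assumed)
SECTORS_PER_TRACK = 16   # assumed to be the same on every track
FIRST_DATA_TRACK = 3     # hypothetical: lower tracks left to the operating system
RECORD_SIZE = 64         # fixed record length; must divide SECTOR_SIZE in this sketch

assert SECTOR_SIZE % RECORD_SIZE == 0, "records must not straddle sectors here"
RECORDS_PER_SECTOR = SECTOR_SIZE // RECORD_SIZE

def record_address(n):
    """Turn record number n (0-based) into (track, sector, byte offset in sector)."""
    sector_index, slot = divmod(n, RECORDS_PER_SECTOR)
    track_offset, sector = divmod(sector_index, SECTORS_PER_TRACK)
    return FIRST_DATA_TRACK + track_offset, sector, slot * RECORD_SIZE

print(record_address(0))   # (3, 0, 0)
print(record_address(5))   # (3, 1, 64)
print(record_address(70))  # (4, 1, 128)
```

To read a record, you would read the whole sector at that track and sector, then take `RECORD_SIZE` bytes starting at the offset; to write one, change those bytes and write the sector back. If your drive puts different numbers of sectors on different tracks, a lookup table is the natural replacement for the second `divmod`.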
Next month, we will look at methods of retrieval and how they can affect the way you keep records.