Classic Computer Magazine Archive ST-Log ISSUE 36 / OCTOBER 1989 / PAGE 14

FEATURE

TOOLS FOR YOUR C CHEST

BY KARL E. WIEGERS


When you begin programming in a new language, you look for commands and functions to perform the kinds of operations you've become accustomed to in other languages. This helps ease the pain of transition into the new language. However, not all high-level languages are created equal. The absence of familiar and useful commands is highly frustrating to someone struggling to work with a new language.

I encountered this frustration when I began programming in C not long ago. If you come from a BASIC background, you're used to all sorts of functions for performing character-string manipulations, like LEFT$, RIGHT$ and MID$. I do a lot of programming in REXX, a nice interpreted language available on IBM mainframe computers. REXX also has a wide variety of string-handling functions, including many not available in even the finest BASICs. Face it: I was spoiled by the versatility of REXX when I began coding in C. What's a boy to do?

The obvious solution is to build some tools by writing C functions to duplicate the REXX and BASIC string-handling functions I know and love. Actually, the process of tool-building is a good way to learn a new language, although it is not without pain. To help you avoid the same kind of pain (although most of you have never heard of REXX), in this article I present C versions of seven of my favorite REXX-type functions (COPIES, LEFT, RIGHT, SUBSTR, WORDS, WORD and OVERLAY), plus another function that's used by my new RIGHT function (STRREV).

The New Functions

The names of these functions are pretty much self-explanatory, but I'll give you brief descriptions here. The source listings are extensively commented in case you want to see how it's all done.

The function copies( ) takes two arguments, a pointer to a character string and an integer. It returns a pointer to a string of concatenated copies of the supplied input string, with the integer argument specifying the number of copies to make. For example, if you called copies( ) with arguments of "Now" and 4, the string created would be "NowNowNowNow". The original string is not changed.
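The CHARFUNC.C listing itself is not reproduced here, so what follows is only a minimal ANSI C sketch of how copies( ) might be written, not the author's code. It assumes the caller frees the returned string, and it returns NULL for a negative count or a failed malloc( ), in keeping with the null-on-error convention these functions use.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of copies(): returns a newly allocated string holding n
   concatenated copies of s, or NULL on bad input or allocation failure.
   The input string is not modified; the caller should free() the result. */
char *copies(const char *s, int n)
{
    size_t len, i;
    char *result, *p;

    if (s == NULL || n < 0)
        return NULL;
    len = strlen(s);
    result = malloc(len * (size_t) n + 1);   /* +1 for the terminating '\0' */
    if (result == NULL)
        return NULL;
    p = result;
    for (i = 0; i < (size_t) n; i++) {
        memcpy(p, s, len);                   /* append one copy of s */
        p += len;
    }
    *p = '\0';
    return result;
}
```

A count of zero yields an empty (but valid) string rather than NULL, which keeps "no copies" distinct from "something went wrong".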

The function left( ) returns a pointer to a character string containing the leftmost part of a supplied input string. Its arguments are a pointer to the input string and the number of characters from the left to keep. If you ask for more characters than are in the original string, the output string is padded on the right with blanks to the desired length. The call left("Now hear this", 6) will return "Now he". The original string is not changed.
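A sketch of left( ) along the same lines shows the blank padding at work: it copies whatever characters are available and fills the rest of the requested length with blanks. This is an illustration assuming ANSI C and a caller that frees the result, not the original listing.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of left(): returns a new string holding the leftmost n characters
   of s, padded on the right with blanks if s is shorter than n.
   Returns NULL on bad input or allocation failure; s is not modified. */
char *left(const char *s, int n)
{
    size_t len, keep, want;
    char *result;

    if (s == NULL || n < 0)
        return NULL;
    want = (size_t) n;
    len = strlen(s);
    keep = len < want ? len : want;          /* characters actually copied */
    result = malloc(want + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, s, keep);
    memset(result + keep, ' ', want - keep); /* blank padding, if any */
    result[want] = '\0';
    return result;
}
```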

The function right( ) does the same thing, but for the right part of a string. The call right("Now hear this", 6) returns "r this". Because padding blanks are placed on the left, this function can also be used to right-justify a string in a field longer than the original string. The original string is not changed.
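The author's right( ) is built on strrev( ); the sketch below takes a more direct route instead (copy the tail of the string, then put any padding blanks in front) so that it stands on its own. It assumes ANSI C, returns a newly allocated string the caller must free( ), and returns NULL on a negative length or a failed allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of right(): returns a new string holding the rightmost n
   characters of s, padded on the LEFT with blanks if s is shorter than n
   (which right-justifies s in an n-character field).
   Returns NULL on bad input or allocation failure; s is not modified.
   Unlike the article's version, this does not call strrev(). */
char *right(const char *s, int n)
{
    size_t len, keep, want, pad;
    char *result;

    if (s == NULL || n < 0)
        return NULL;
    want = (size_t) n;
    len = strlen(s);
    keep = len < want ? len : want;
    pad = want - keep;                            /* leading blanks needed */
    result = malloc(want + 1);
    if (result == NULL)
        return NULL;
    memset(result, ' ', pad);
    memcpy(result + pad, s + (len - keep), keep); /* tail end of s */
    result[want] = '\0';
    return result;
}
```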

The function strrev( ) reverses the characters in a supplied input string; the original string is lost in the process (the function could easily be changed so this doesn't happen, if you like). This function is supplied as a library function with some C compilers.
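Reversing a string in place needs only two pointers that start at opposite ends and swap characters as they move toward the middle. The sketch below is one way to do it in ANSI C, not the author's listing. The argument must be writable storage (an array or malloc( )ed memory), because modifying a string literal is undefined in ANSI C. If your library already supplies a strrev( ), rename this one to avoid a clash.

```c
#include <string.h>

/* Sketch of strrev(): reverses s in place and returns s, so the original
   contents are lost. Returns NULL if s is NULL. s must be writable
   (an array or malloc()ed storage), not a string literal. */
char *strrev(char *s)
{
    char *front, *back, tmp;

    if (s == NULL)
        return NULL;
    if (*s == '\0')
        return s;                    /* empty string: nothing to do */
    front = s;
    back = s + strlen(s) - 1;        /* last character before the '\0' */
    while (front < back) {
        tmp = *front;                /* swap the two ends, then move inward */
        *front++ = *back;
        *back-- = tmp;
    }
    return s;
}
```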

The function substr( ) performs like the MID$ function in BASIC, returning a pointer to a substring of a supplied input string, beginning at some character position and extending for a specified number of characters. The call substr("Now hear this", 3, 5) returns "w hea". The original string is not changed.
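Here is a sketch of substr( ) using 1-based character positions, as in the example call (position 3 of "Now hear this" is the w). The article doesn't say what happens when the request runs past the end of the string; this version assumes REXX-style blank padding, and returns NULL for a starting position below 1 or a negative length. It is an illustration, not the original code.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of substr(): returns a new string of n characters taken from s
   starting at 1-based position start (BASIC MID$ / REXX SUBSTR style).
   Characters requested past the end of s are filled with blanks
   (an assumption borrowed from REXX). Returns NULL on bad arguments or
   allocation failure; s is not modified. */
char *substr(const char *s, int start, int n)
{
    size_t len, from, avail, keep, want;
    char *result;

    if (s == NULL || start < 1 || n < 0)
        return NULL;
    len = strlen(s);
    from = (size_t) start - 1;               /* convert to 0-based offset */
    want = (size_t) n;
    avail = from < len ? len - from : 0;     /* characters left in s */
    keep = avail < want ? avail : want;
    result = malloc(want + 1);
    if (result == NULL)
        return NULL;
    if (keep > 0)
        memcpy(result, s + from, keep);
    memset(result + keep, ' ', want - keep); /* blank fill past the end */
    result[want] = '\0';
    return result;
}
```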

The function words( ) counts the number of blank-delimited words in a supplied input string and returns a short integer. The call words ("Now hear this") will return the value three.
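Counting words without strtoken-style help comes down to counting the places where a nonblank character follows a blank (or the start of the string). A minimal sketch, assuming only the blank character separates words, as the description says, and returning a short as the article's version does:

```c
#include <stddef.h>

/* Sketch of words(): counts blank-delimited words in s without strtok().
   A word starts wherever a nonblank follows a blank or the start of s.
   Returns 0 for NULL, empty, or all-blank input. */
short words(const char *s)
{
    short count = 0;
    int in_word = 0;                 /* 1 while scanning inside a word */

    if (s == NULL)
        return 0;
    for (; *s != '\0'; s++) {
        if (*s == ' ')
            in_word = 0;
        else if (!in_word) {
            in_word = 1;             /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

Runs of several blanks, or blanks at either end, don't create extra words, since only the blank-to-nonblank transition is counted.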

The function word( ) returns a pointer to a character string containing a specified word in a supplied input string. The call word("Now hear this", 2) will return "hear". The original string is not changed.
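word( ) can use the same kind of scan: skip blanks, step over words until the requested one is reached, then copy it into new storage. This sketch assumes 1-based word numbers (as in the example call) and returns NULL when the string has fewer words than requested; it is not the author's listing.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of word(): returns a new string holding the n-th (1-based)
   blank-delimited word of s, without using strtok(). Returns NULL if
   there is no such word or allocation fails; s is not modified. */
char *word(const char *s, int n)
{
    const char *start = NULL;
    size_t len;
    char *result;

    if (s == NULL || n < 1)
        return NULL;
    for (;;) {
        while (*s == ' ')            /* skip blanks before the next word */
            s++;
        if (*s == '\0')
            return NULL;             /* ran out of words */
        start = s;
        while (*s != ' ' && *s != '\0')
            s++;                     /* s now points just past this word */
        if (--n == 0)
            break;                   /* this is the word we wanted */
    }
    len = (size_t) (s - start);
    result = malloc(len + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, start, len);
    result[len] = '\0';
    return result;
}
```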

Finally, the function overlay( ) is a bit more complex. It overlays one character string (the guest) upon another (the host), beginning at a specified character position in the host string and extending for a specified number of characters of the guest string. If the guest string is shorter than the requested length, it is padded with a supplied pad character. The call overlay("see", "Now hear this", 5, 4, "K") would return "Now seeK this". Again, the original strings are not changed.
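A sketch of overlay( ) follows. The parameter order matches the example call (guest, host, starting position, length, pad), but here the pad is taken as a single char such as 'K' rather than a string, which is an assumption. So is the handling of an overlay that runs past the end of the host: the result is lengthened and any gap is filled with the pad character, modelled on REXX's OVERLAY. The article does not specify either detail.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of overlay(): returns a copy of host in which n characters,
   starting at 1-based position start, are replaced by guest. If guest is
   shorter than n it is padded with pad; if longer, only its first n
   characters are used. If the overlay runs past the end of host, the
   result is lengthened and any gap is filled with pad (assumed, REXX-like).
   Returns NULL on bad arguments or allocation failure. */
char *overlay(const char *guest, const char *host, int start, int n, char pad)
{
    size_t glen, hlen, from, want, total, i;
    char *result;

    if (guest == NULL || host == NULL || start < 1 || n < 0)
        return NULL;
    glen = strlen(guest);
    hlen = strlen(host);
    from = (size_t) start - 1;               /* 0-based offset into host */
    want = (size_t) n;
    total = from + want > hlen ? from + want : hlen;
    result = malloc(total + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, host, hlen);              /* begin with a copy of host */
    for (i = hlen; i < from; i++)
        result[i] = pad;                     /* gap between host and guest */
    for (i = 0; i < want; i++)
        result[from + i] = i < glen ? guest[i] : pad;
    result[total] = '\0';
    return result;
}
```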

Now let's look at some aspects of the C language that influenced why and how these functions came into existence.

Some Pointers on C Strings

C does not have a character string data type as such. Instead, a string is represented by a one-dimensional array of single-byte characters terminated with a null character (shown as \0). Array manipulations in C often are most easily handled by using pointers; this is certainly true of the character-string operations used in these functions. A pointer to a character string can be initialized by assigning some string constant (any sequence of characters enclosed in double quotes) to the pointer variable. Here's a sample variable definition:

char *charptr = "Some literal string";

You can also point a character pointer at fresh storage by allocating the desired number of bytes with a function like malloc( ) or calloc( ). Be sure to leave room for the extra null character at the end of the string, and be sure to convert the value returned from malloc( ) to a pointer to char. Here's an example of how to allocate storage for a ten-character string using a pointer variable called charptr:

char *charptr;
charptr = (char *) malloc (11);

Note that I've performed a cast to make sure that the value returned from malloc( ) is of type pointer to char. This cast may not be required with all C compilers, but it is with Laser C.
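Putting the allocation together with a test for failure, a hypothetical helper called make_ten_chars( ) might look like the sketch below. Under an ANSI compiler with <stdlib.h> included, malloc( ) returns void *, so the cast is optional there; it is kept only to match the Laser C example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates room for a ten-character string plus its
   terminating '\0', fills it, and returns it. Returns NULL if malloc()
   fails. The caller is responsible for calling free() on the result. */
char *make_ten_chars(void)
{
    char *charptr;

    charptr = (char *) malloc(11);   /* 10 characters + 1 for the '\0' */
    if (charptr == NULL)
        return NULL;                 /* allocation failed */
    strcpy(charptr, "0123456789");   /* exactly 10 chars; strcpy adds '\0' */
    return charptr;
}
```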

Another aspect of C is that the function libraries supplied with different C compilers are not all created equal. Some libraries contain functions not included in others. For example, a common C function is strtok( ), which tokenizes (breaks apart) a character string based on specified delimiter characters. If the delimiter is a blank character, strtok( ) provides an easy way to separate a string into its individual words. The function strrev( ), which reverses the characters in a string, is also found in some, but not all, compiler libraries. I used both strtok( ) and strrev( ) in my initial versions of some of these string-handling functions.
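Where it is available, strtok( ) is part of the ANSI C standard library. The sketch below uses it to split a line into blank-delimited words; split_words( ) is a hypothetical helper, not a library function. Keep in mind that strtok( ) overwrites each delimiter with a null character and remembers its position between calls, so the input must be writable and two tokenizing loops can't be interleaved.

```c
#include <string.h>

/* Hypothetical helper: splits buf into blank-delimited words with the
   standard strtok(), storing a pointer to each word in out[] (at most max
   entries) and returning the number of words stored. strtok() writes '\0'
   over the delimiters, so buf must be writable and is changed. */
int split_words(char *buf, char *out[], int max)
{
    int count = 0;
    char *tok = strtok(buf, " ");            /* first call: pass the string */

    while (tok != NULL && count < max) {
        out[count++] = tok;
        tok = strtok(NULL, " ");             /* later calls: pass NULL */
    }
    return count;
}
```

Consecutive blanks are treated as a single separator, because strtok( ) skips leading delimiters before each token.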

The listings included in this article were written for the Laser C compiler. Unfortunately, the Laser C library contains neither strtok( ) nor strrev( ). Hence, I wound up writing my own strrev( ). I also wrote new word( ) and words( ) functions that didn't require strtok( ) at all. So much for the vaunted portability of C programs. If you do end up programming for multiple systems, check the function libraries carefully beforehand to avoid compiler-specific code.

Notes on Null

In C programs, null often is returned by a function to indicate that an error has taken place. For example, in these functions I'm returning a null if an attempt to allocate memory fails or if some other error condition is encountered during processing. The calling program is responsible for testing the value returned by a function and handling nulls appropriately.

The symbol "NULL" usually is defined in a header file with a value of zero, like this: #define NULL 0. If a numeric result is returned, there's no problem. But most of these functions return a value of type pointer to character. What happens when you try to print something at address zero? Well, the printf() function in some C compilers is smart enough to print the literal value "NULL", or just a backslash character, which clearly shows that something evil has happened. Not so with Laser C. Instead, you get an addressing (or bus) exception, which shows up as two bombs. Not fatal (as in reboot the computer), but definitely unhealthy for the program.

The examples in listing CHARTEST.C are carefully crafted to avoid returning a null. However, it is safer to test the value returned from each function for null before using it, something like this:

char string[] = "Sample string";   /* array, because strrev() reverses in place */
char *answer;
answer = strrev ( string );
if ( answer == NULL )
   printf ( "Something horrible has happened.\n" );
else printf ( "Reversed string is: %s\n", answer );

Sample Programs

Two source code listings accompany this article. CHARTEST.C (Listing 1) is a demo program that exercises the eight string-manipulation functions. It illustrates the syntax and the results obtained, with several examples for each function.

The second program listing contains the source code for all eight of our new functions, all strung together. This file exists as CHARFUNC.C on your ST-LOG disk. In practice, you don't want to simply compile this entire file and link in the resulting object file with your own program. That way you might be including object code for unused functions, making your .PRG file larger to no useful purpose. You're better off including just the object files you need.

Laser C provides a convenient way to manage a collection of your own functions like this through its archiver/librarian utility called AR.TTP. AR.TTP lets you combine several object files (or any other sort of files) into a single archive file, which has an extension of .A. You can execute the AR.TTP program from within the Laser C shell by simply typing the command you want into an empty window, highlighting the entire command with the mouse and pressing Control-Return. Here's a typical command line:

F:\UTILS\AR.TTP RV \REXXOID\CHARFUNC.A \REXXOID\STRREV.O \REXXOID\RIGHT.O

This sequence assumes that the AR.TTP file is in folder UTILS on drive F:. The R option says that I want to replace or add a file to the archive file, and the V (verbose) option tells me what is going on as the operation progresses. The name of the archive file is CHARFUNC.A, and it's in folder \REXXOID. I'm adding files STRREV.O and RIGHT.O to this archive file, both of which also are in folder \REXXOID.

It takes a bit of getting used to, but the archive function can really cut down on file clutter. It also simplifies the linking process, as you can specify archive files in the list of object files to link together. Only the required members of the archive file will be extracted and used during linking.

One Last Thing

While testing these functions from within the Laser C shell environment, sometimes I encountered execution errors, like the dreaded exception 02 (equal to two bombs from the GEM desktop). Fortunately, Laser C provides a graceful exit from most such failures. I'd wrack my brain trying to find out what was wrong, but to no avail.

Imagine my relief when the very same program ran just fine from the desktop, although it failed in the shell. I guess the shell can't completely simulate the real-world GEM environment your program may encounter. The moral is to test all programs from the desktop as well as the shell before concluding that they are either right or wrong.

Karl Wiegers, Ph.D., spent the 70s learning how to be an organic chemist, then spent the 80s wrestling with computers. He is now a software engineer in the Eastman Kodak Company Photographic Research Labs. He hasn't selected a career for the 90s yet.