C FOR SPEED!
Techniques To Transform Your Turtle Code
BY JOHN ALLEN
The C programming language is famous for its ability to produce speedy and efficient code, but it still can't beat assembly, right? Wrong! If you optimize your code using these techniques, you can get assembly-fast results without losing the power of C.
I have three categories for measuring the speed of a computer program: the first one is the programming time, the time it takes you to write the code; the second is the time it takes to modify the program if the user's requirements change; and the third is the actual time required for the program to execute. Many of the programmer's decisions speed up one category at the expense of another. For example, you can save time by not commenting the program, but that slows down subsequent modifications. If, on the other hand, you spend enormous quantities of time documenting a program and squeezing every cycle out of it, you may not finish before your competitor does. Obviously, you must balances your efforts.
Competing With Assembly
Because they dictate every detail of execution, the very best assembly-language
programmers will always produce faster-than-C code. However, good programming
techniques combined with exploitation of the power of the C language enables
you to write code that is easy to understand and modify, yet runs as fast
as stuff written by all but the most experienced assembly programmers.
They may have the edge on execution time, but you'll probably spend much
less time in the development stage than they do.
There are two phases of code optimization. The first is to eliminate unnecessary run-time cycles by using efficient algorithms. The second is to accomplish each of the necessary operations in as little time as possible without sacrificing too much development time or code clarity.
Read your favorite C
reference book and
familiarize yourself
with
the #define macros.
Watch Your Algorithms
The algorithmic methods of code optimization can be used in any language,
and you probably already use them to some extent. Here's an example:
for (i=0; i<(1000; i++){
a=b+c;
x[i]= a*i;
}
The values stored in the variables b and c do not change, therefore
the value stored in the variable a does not change after the first iteration.
Consequently, the variable a only needs to be set once:
a = b+c;
for (i=0; i<1000; i+ +){
x[i]=a*i;
}
Be careful not to use unnecessary variables. For example:
j=0;
for (i=0; <1000; i++){
x[i]=i;
y[i]=j++;
}
Why increment two variables each loop iteration when one will do? Try
this instead:
for (i=0;i<1000; i++)
x[i]=i;
y[i]=i
}
Another technique is to eliminate common sub-expressions, for example:
a =(x*y)+1;
b = 2*(x*y)+3;
c = a+b+(x*y);
See how x and y are multiplied each time. Looking up the value of a
variable is faster than re-multiplying!
Temp1 = x*y;
a = templ+1;
b = 2*temp1+3;
c = a+b+temp1;
Also, beware of using variables which are really constants.
a = 1;
b = c+a;
d = a+1;
Use a constant instead.
a = 1;
b = c+l;
d=2;
This rule also applies for the following:
a = 1+2;
which, of course should he
a = 3;
Most compilers will catch this simple case, but you might as well do it yourself, just in case the compiler doesn't.
Another thing to look for is code like this:
#define MAXSIZE 5
a = MAXSIZE+b-1;
It would be faster to do:
#define MAXSIZE
5
#define MAXSIZEMINUSONE 4
a = MAXSIZEMINUSONE+b;
However, this makes it harder to modify later, because if MAXSIZE is changed you also must change MAXSIZEMINUSONE. It would probably not he a good trade-off if that line is executed only once, but if it is inside a loop that exeutes 10,000 times, it can really make a difference.
Unrolling Loops
If you are trying to squeeze every last ounce of speed, you can unroll
your loops, but this may cost you development or modification time if the
number of times the loop is executed has to change. Also, it is more difficult
to do this when the loop is controlled by user input. Here is an example
of unrolling:
for (i=0; i<3; i++)
a[i] =i
becomes
a[0]=0;
a[1]= 1;
a[2] = 2;
This eliminates the increment for each loop iteration, and substitutes an explicit operation for the memory accesses on the variable i.
If unrolling of loops is not practical, perhaps you can combine two
or more loops like this:
for (i=0; i<l000; i++)
a[i] =i
for (i=0; i<1000; i++)
b[i]=i;
into a single loop:
for (i=0; <1000; i++)
a[i] = b[i]=i;
You just eliminated one increment and one test instruction for each loop iteration,
All of the optimizations I have shown so far are programming techniques that, for the most part, work in any language. Now we will explore two C-specific techniques, which may also he adapted, with some effort, to other languages.
Using Macros
In many cases a piece of code is broken up into functions to eliminate
typing duplicate code and to make things easily modifiable and readable.
However, the use of macros will do the same thing, and, since they arc
resolved at compile time, they require absolutely no run-time overhead!
There is one thing to remember about using #define macros. When passing
arguments to macros, whatever you give as an argument will be literally
replaced when the macro is expanded. For example:
#define squared(x) (x*x)
a = 2;
b = squared(a++);
is replaced with
a = 2;
b = a++*a++;
This may not be what you intended!
The most common use of macros is single-line macros, but C does not
limit you to that. Lines that end with the backslash character (\) are
combined with the next line, then both the backslash character and the
following line feed are deleted during pre-processing:
#define test_macro(a,b,c,array) \
inti;
\
for (i=0; i<c; i++)
\
a+ = array[i,b];
\
will become
#define test_macro(a,b,c,array) int i for (i=0; i<c; i++) = array[i,b];
Remember that macros are substituted directly into the source file by the pre-processor. You cannot call a macro from itself, or from another macro that it calls, because recursion is not possible. Read your favorite C reference book a couple of times to familiarize yourself with #define macros. Once you understand how macro substitution works, you can avoid the run-time overhead of some function calls.
Shifty Bits
Another way to speed up execution is to take a trick from assembly
programmers and fake your powers-of-two division and multiplication:
5*2 = 10 is, in binary, 0101* 0010 = 1010;
12/2 = 6 is, in binary, 1100 / 0010 = 0110.
Notice that 10 in binary (1010) is just five shifted to the left by
one bit (010l). Also, six is just 12 shifted to the right by one bit. A
shift operation is very fast compared to mathematical operations such as
multiplication and division. of course, this only works for powers of two,
but it :an still help considerably. Here is a straightforward way to divide
or multiply integers by two:
#define divide_2(a) (a>>1)
#define mult_2(a) (a<<1)
The first technique is
to
use efficient algorithms.
For division or multiplication by four it would be:
#define divide_4(a) (a>>2)
#define mult_4(a)
(a<<2)
In a more general fashion, you can write macros to divide or multiply
by powers of two, where the first argument is the number and the second
is the divisor (2. 4, 8, etc.):
#define divide_2p(a,p) (a>>p)
#define mulL_2p(a,p) (a<<p)
If you divide an odd integer such as five by two using mathematical division, you will get the real number 2.5. If you use the shifting method you will get the integer two; the remainder is lost. This is the same as the DIV operation in Pascal. Remember that shifting numbers in lieu of multiplication or division only works for integers, not for floating point numbers.
Power Tools
These tools can he very powerful when used properly but like all tools,
they are only as efficient as the person using them. Being efficient takes
some skill and a lot of practice - so practice! This is certainly not a
complete discussion on how to optimize programs; the art of programming
is too complex to allow such a thing in one article. But that same complexity
leaves a lot of room for improvement, and I hope that the presentation
of these techniques encourages you to further develop your own skills.
John Allen started programming in C when he got an original 520ST with TOS-on-disk back in 1985.