American Science Institute of Technology
Organizing Control Structures
Although the control structures found in most modern languages trace their
roots back to Algol-60, there is a surprising number of subtle variations
among the control structures found in the programming languages in common use
today. This paper describes a mechanism to unify the control structures of the
various programming languages, so that a Visual BASIC programmer can easily
understand code written in Pascal or C++, C++ programmers can read BASIC and
Pascal programs, and so on. Consider where Pascal programmers place the begin
of a compound statement after an "if":
if (expression) then begin
    { Statement 1 }
    { Statement 2 }
    .
    .
    .
    { Statement n }
end;
if (expression) then
begin
    { Statement 1 }
    { Statement 2 }
    .
    .
    .
    { Statement n }
end;
C/C++ programmers are even worse: there are no fewer than four common ways of
putting the opening and closing braces around a compound statement after an
"if".
if (expression) then
    if (expression) then
        (* Statement *)
    else (* Statement *);
To which "if" does the "else" belong? Of course, you've
always been taught that the else goes with the nearest un-elsed "if"
looking back in the file (i.e., the second "if" statement above). What
happens if you want it to go with the first one? What happens if there is a long
compound statement after the second "if" above and the else is far
removed from these two ifs? How easy is it to tell which if the else belongs to?
if (expression) then
    if (expression) then
        { Statement list}
    endif;
else
    { Statement list};
endif;
Now there is no question that the else belongs to the first if above, not the
second. Note that this form of the if statement allows you to attach a list of
statements (between the if and else or if and endif) rather than a single or
compound statement. Furthermore, it totally eliminates the religious argument
concerning where to put the braces or the begin..end pair on the if.
if..then..elseif..else..endif
select..case..default..endselect (typical case/switch statement)
while..endwhile
repeat..until
loop..endloop
for..endfor
break
breakif
continue
Those who have had the opportunity to use these control structures for a
considerable amount of time generally recognize their superiority over the
Pascal/C/C++ variants. The biggest fault Pascal/C/C++ programmers tend to find
with these structures (other than that they are different) is that "Ada uses
these structures and Ada is a 'yucky' language." That is hardly a scientific
assessment of the quality of these control constructs.
Most compilers implement multi-way selection statements (case/switch) using a jump table. This means that the order of the cases within the selection statement is usually irrelevant, and placing the cases in a particular order rarely improves performance. Since the order is usually irrelevant to the compiler, you should organize the cases so that they are easy to read. Two common organizations make sense: sorted (numerically or alphabetically) or by frequency (the most common cases first). Either organization is readable; sorting by frequency has the advantage of being faster if your compiler uses a brain-dead if..then..elseif..elseif... implementation of multi-way selection. One drawback of the frequency approach is that it is often difficult to predict which cases the program will execute most often.
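As a rough illustration of the sorted organization, here is a minimal C sketch. The function name days_in_month and its month numbering are assumptions made for this example; they do not come from the text above. The cases appear in ascending numeric order so a reader can locate any one of them at a glance.

```c
/* Hypothetical example: number of days in a month (1..12) of a
   non-leap year, or 0 for an invalid month number.
   The cases are listed in ascending numeric order for readability. */
int days_in_month(int month)
{
    switch (month) {
    case 1:  return 31;
    case 2:  return 28;
    case 3:  return 31;
    case 4:  return 30;
    case 5:  return 31;
    case 6:  return 30;
    case 7:  return 31;
    case 8:  return 31;
    case 9:  return 30;
    case 10: return 31;
    case 11: return 30;
    case 12: return 31;
    default: return 0;   /* not a valid month */
    }
}
```

Grouping the cases by result (all the 31-day months together, say) would be shorter, but it would give up the sorted order that makes a single case easy to find.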
There are three general categories of looping constructs available in common high-level languages: loops that test for termination at the beginning of the loop (e.g., while), loops that test for termination at the bottom of the loop (e.g., repeat..until), and loops that test for termination in the middle of the loop (e.g., loop..endloop). It is possible to simulate any one of these loops using any of the others. This is particularly trivial with the loop..endloop construct:
/* Test for loop termination at beginning of LOOP..ENDLOOP */
loop
    breakif (x==y);
    .
    .
    .
endloop;
/* Test for loop termination in the middle of LOOP..ENDLOOP */
loop
    .
    .
    .
    breakif (x==y);
    .
    .
    .
endloop;
/* Test for loop termination at the end of LOOP..ENDLOOP */
loop
    .
    .
    .
    breakif (x==y);
endloop;
Given the flexibility of the loop..endloop control structure, you might question why one would even burden a compiler with the other loop statements. However, using the appropriate looping structure makes a program far more readable; therefore, you should never use one type of loop when the situation demands another. If someone reading your code sees a loop..endloop construct, they may think it's okay to insert statements before or after the exit statement in the loop. If your algorithm truly depends on while..do or repeat..until semantics, the program may now malfunction.
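C has no loop..endloop or repeat..until, so the following sketch maps the three categories onto what C does offer: while for the test at the top, do..while for the test at the bottom, and for(;;) with if..break for the test in the middle. The three functions and their names are hypothetical; each was picked because its algorithm naturally calls for that form of loop.

```c
/* Test at the top (while): the body may run zero times.
   Counts the spaces at the start of a string. */
int count_leading_spaces(const char *s)
{
    int count = 0;
    while (*s == ' ') {
        ++s;
        ++count;
    }
    return count;
}

/* Test at the bottom (do..while, the counterpart of repeat..until
   with the condition inverted): the body always runs at least once.
   Counts decimal digits, so 0 is reported as having one digit. */
int count_digits(unsigned n)
{
    int digits = 0;
    do {
        ++digits;
        n /= 10;
    } while (n != 0);
    return digits;
}

/* Test in the middle (for(;;) with if..break, the counterpart of
   loop..breakif..endloop): fetch a value, test it, then use it.
   Sums the values up to, but not including, the first negative one.
   The array must contain a negative terminator. */
int sum_until_negative(const int *values)
{
    int sum = 0;
    for (;;) {
        int v = *values++;
        if (v < 0) break;
        sum += v;
    }
    return sum;
}
```

Writing count_digits with a while loop would need a special case for zero, which is the kind of distortion that appears when the loop form does not match the algorithm.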
Many languages provide a special case of the while loop that executes some number of times specified upon first encountering the loop (a definite loop rather than an indefinite loop). This is the "for" loop in most languages. Unfortunately, this iterative loop ranges from very simple (e.g., in Pascal) to extremely complex (e.g., Algol-68 and PL/I). The vast majority of the time, a for loop sequences through a fixed range of values, incrementing or decrementing the loop control variable by one. Therefore, most programmers automatically assume this is the way a for loop will operate until they take a closer look at the code. Since most programmers immediately expect this behavior, it makes sense to limit for loops to these semantics. If some other looping mechanism is desirable, you should use a while loop to implement it (since the for loop is just a special case of the while loop). There are other reasons behind this decision as well. Most compilers generate especially efficient code for standard for loops, while they tend to generate less than optimal code for "funny" versions of for loops. Hence there are efficiency considerations as well as readability reasons behind this choice.
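The distinction might look like this in C. This is only a sketch: both functions and their names are invented for the example. The first loop steps through a fixed range by one, so it is written as a for loop; the second halves its control variable on each pass, so it is written as a while loop even though C's for statement could express it.

```c
/* Standard "for" semantics: a fixed range, stepping by one.
   Sums the integers first, first+1, ..., end-1. */
int sum_range(int first, int end)
{
    int sum = 0;
    for (int i = first; i < end; ++i) {
        sum += i;
    }
    return sum;
}

/* Non-standard stepping (halving) written as a while loop instead of
   a "funny" for loop. Returns floor(log2(n)) for n >= 1, and 0 for
   n == 0 by convention. */
int floor_log2(unsigned n)
{
    int count = 0;
    while (n > 1) {
        n /= 2;
        ++count;
    }
    return count;
}
```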
Most people expect the execution of a loop to begin with the first statement at the top of the loop; therefore, a loop should always be entered at the top.
Likewise, most people expect a loop to have a single exit point, especially if it's a while or repeat..until loop. They will rarely look closely inside a loop body to determine if there are "break" statements within the loop once they find one exit point. Therefore, a loop should have a single exit point wherever possible.
Whenever a programmer sees an empty loop, the first thought is that something is missing. Worse yet, in languages like Pascal or C/C++ where you don't have a terminating endloop statement, it's easy to think that the next statement in the program is the body of the loop (and it's just as easy to forget the semicolon that marks the end of the loop and actually make the next statement in the program the loop's body). Therefore, avoid empty loops.
Even if the loop body is not empty, you should avoid side effects in a loop termination expression. When someone else reads your code and sees a loop body, they may skim right over the loop termination expression and start reading the code in the body of the loop. If the (correct) execution of the loop body depends upon the side effect, the reader may become confused since s/he did not notice the side effect earlier. The presence of side effects (that is, having the loop termination expression compute some other value beyond whether the loop should terminate or repeat) indicates that you're probably using the wrong control structure. Consider the following while loop from C, which is easily corrected:
while ( ( ch = getc(stdin)) != 'A')
{
    << statements >>
}
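The corrected version moves the assignment out of the termination expression:
for(;;) /* C/C++'s infinite loop statement */
{
    ch = getc(stdin);
    if (ch == 'A') break; /* same exit condition as the while loop above */
    << statements >>
}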
Like functions, loops should exhibit functional cohesion. That is, the loop should accomplish exactly one thing. It's very tempting to initialize two separate arrays in the same loop. You have to ask yourself, though, "what do you really accomplish by this?" You save about four machine instructions on each loop iteration, that's what. That rarely accounts for much. Furthermore, the operations on those two arrays are now tied together: you cannot change the size of one without changing the size of the other. Finally, someone reading your code has to remember two things the loop is doing rather than one.
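A small C sketch of the cohesive alternative follows. The array names, sizes, and initial values are all hypothetical. Each loop initializes exactly one array, so either size can change without touching the other loop.

```c
/* Hypothetical tables with independently chosen sizes. */
enum { WIDTH_COUNT = 8, HEIGHT_COUNT = 12 };

static int widths[WIDTH_COUNT];
static int heights[HEIGHT_COUNT];

void init_tables(void)
{
    /* Loop 1: initializes widths and nothing else. */
    for (int i = 0; i < WIDTH_COUNT; ++i) {
        widths[i] = 1;
    }

    /* Loop 2: initializes heights and nothing else. */
    for (int i = 0; i < HEIGHT_COUNT; ++i) {
        heights[i] = -1;
    }
}
```

Had both arrays been filled in one loop, changing HEIGHT_COUNT alone would have meant restructuring that loop.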
Programs are much easier to read if you read them from left to right, top to bottom (beginning to end). Programs that jump around quite a bit are much harder to read. Of course, the goto statement is well-known for its ability to scramble the logical flow of a program, but you can produce equally hard-to-read code using other, structured, statements in a language. For example, a deeply nested set of if statements, some with and some without else clauses, can be very difficult to follow because of the number of possible places the code can transfer to, depending upon the result of several different boolean expressions.
Expressions
Few things look so similar between different languages yet act so different
as arithmetic expressions. Between various languages the precedence of operators
is different, the associativity of operators is different, even the operation
computed is often different. It goes without saying that different languages
often use the same symbol for different operations and, likewise, use different
symbols for the same operation. This creates a problem with a coding standard if
the intent is to allow a Visual BASIC programmer to easily read a program
written in C/C++ or Pascal. Although there are many issues that a coding
standard cannot practically resolve, some standards can improve the situation.
As long as two adjacent operators in an expression belong to two different classes above, you can skip using parentheses. You can assume that addition, subtraction, multiplication, remainder, and division are left associative. Therefore, if two adjacent operators are addition and subtraction, or are multiplication, remainder, or division, you can also skip the parentheses. In all other cases, you must supply parentheses to explicitly state the precedence.
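The C sketch below shows how this guideline might play out. It assumes that the shift and bitwise operators fall outside the operator classes the paragraph refers to, since that list does not appear in this text. The function names are hypothetical.

```c
/* Multiplication next to addition: different, well-known classes,
   so no parentheses are needed. */
int linear(int a, int x, int b)
{
    return a * x + b;
}

/* Subtraction next to addition: same class, left associative,
   so this means (income - tax) + bonus. */
int net(int income, int tax, int bonus)
{
    return income - tax + bonus;
}

/* Shift next to addition: state the precedence explicitly.
   Without parentheses, C would parse a << 2 + b as a << (2 + b). */
unsigned scaled_offset(unsigned a, unsigned b)
{
    return (a << 2) + b;
}

/* Bitwise AND next to a comparison: state the precedence explicitly.
   Without parentheses, C would parse x & mask == 0 as x & (mask == 0). */
int bits_clear(unsigned x, unsigned mask)
{
    return (x & mask) == 0;
}
```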
Some languages use short-circuit evaluation; others use full evaluation of expressions. If your program uses and depends upon short-circuit evaluation, you must note this fact in a comment next to each expression that requires short-circuit evaluation.
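In C, where && and || always short-circuit, such a comment might look like the following sketch. The struct node type and the function name are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Hypothetical list node used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

int has_positive_value(const struct node *p)
{
    /* Depends on short-circuit evaluation: p->value must not be
       evaluated when p is NULL. */
    return p != NULL && p->value > 0;
}
```

A reader porting this to a language that fully evaluates both operands is warned by the comment that the test must be split into two nested tests.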
In most languages it is possible to produce side effects within an expression. You can accomplish this, for example, by passing a parameter by reference to a function or by calling a function that modifies global variables. Since most languages give the compiler writer leeway with respect to the order of evaluation of expressions, you should never use a variable whose value is modified as a side effect of a function or operator within that expression (e.g., in C/C++ consider the statement "Y = X + Y + ++X;"). Even if you're sure the result will be correct, such code would be very difficult to understand.
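One way to untangle a statement like "Y = X + Y + ++X;" is to give the side effect a statement of its own. The sketch below assumes the intended meaning was "increment X first, then use the new value in both places"; the original statement does not say, and in C its behavior is undefined because X is read and modified without sequencing. The function name is hypothetical.

```c
/* Assumed reading of "Y = X + Y + ++X": X is incremented first and
   the new value is used for both occurrences of X. */
void bump_and_sum(int *x, int *y)
{
    *x += 1;                /* the side effect, in its own statement */
    *y = *x + *y + *x;      /* nothing on the right side is modified
                               while it is being evaluated */
}
```

If the other reading was intended (the old X for the first term), that too can be written as two statements, and the choice is then visible in the code rather than left to the compiler.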
There are some obvious exceptions to the rule above. The whole purpose of some operators and functions is to produce a side effect. Examples include the "++" and "--" operators in C/C++ and any of the various assignment operators. A stronger rule to allow for this might be
Never execute an expression solely for the side effects it produces. Programmers generally expect the value of an expression to carry some significance; they feel there would be no need to compute the value of an expression if that value were of no importance. If all you need are the side effects, find some other way to achieve those side effects. Example: What does the following C statement do? (This came out of a real program on the net.)
*s++ || *s++ || *s++ || *s++ || s++;
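Read according to C's rules, each *s++ tests one byte and advances s, and || stops evaluating at the first non-zero byte. The statement therefore leaves s just past the first non-zero byte among the next four, or five positions further along if all four are zero. The sketch below spells that out with ordinary control flow. The function name skip_prefix is hypothetical, and it returns the new pointer instead of modifying s in place.

```c
/* Hypothetical rewrite of:  *s++ || *s++ || *s++ || *s++ || s++;
   Returns the value s would hold after that statement.
   The caller must guarantee that the bytes examined are readable. */
const char *skip_prefix(const char *s)
{
    for (int i = 0; i < 4; ++i) {
        if (*s++ != '\0') {
            return s;       /* stopped just past the first non-zero byte */
        }
    }
    return s + 1;           /* four zero bytes seen: the final s++ runs */
}
```

This version is longer, but the bound of four and the extra increment are now explicit and no longer hidden in a chain of operators whose value is thrown away.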
There are some mechanical issues regarding expressions that can make them easier to read. The following rules and guidelines document these issues:
-x *p !b /* from C/C++ */
x = *p + a / b;
recname.field recptr->field ary[ i ]
proc( parm1, parm2, parm3); procedure PascalProc( i:integer; b:boolean );
x := f( x + 2 * a[ i, j ] );
Some languages (C/C++ and Algol-68 come to mind) have a tremendous number of
operators. Some of them are quite arcane and have no counterpart in other
languages (when was the last time you used ">>=" or
"->*"?). If an alternative is available, you should avoid using
assignments within expressions and other lesser-used operators.