American Science Institute of Technology  

 

   Control Structures

Organizing Control Structures

Although the control structures found in most modern languages trace their roots back to Algol-60, there are a surprising number of subtle variations among the control structures of the programming languages in common use today. This paper describes a mechanism to unify the control structures of the various programming languages, so that a Visual BASIC programmer can easily understand code written in Pascal or C++, C++ programmers can read BASIC and Pascal programs, and so on.

Typical programming languages contain eight flow-of-control statements: two conditional selection statements (if..then..else and case/switch), four loops (while, repeat..until/do..while, for, and loop), a program unit invocation (i.e., procedure call), and a sequence. Other, less common control structures include processes/coroutines, foreach loops (iterators), and generators, but this paper will focus only on the more common control mechanisms.

Control structures typically come in two forms: those that act on a single statement as an operand and those that act on a sequence of statements. For example, the if..then statement in Pascal operates on a single statement:

if (expression) then Single_Statement;

Of course it is possible to apply Pascal's if statement to a list of statements, but that involves creating a compound statement using a begin..end pair. There are two problems with this type of statement. First of all, it introduces the problem of where you are supposed to put the begin and end in a well-formatted program. This is a very controversial issue with large numbers of programmers in different camps. Some feel an if with a compound statement should look like this:

        if (expression) then begin

                { Statement 1 }
                { Statement 2 }
                        .
                        .
                        .
                { Statement n }

        end;

Others feel it should look like this:

        if (expression) then 
        begin

                { Statement 1 }
                { Statement 2 }
                        .
                        .
                        .
                { Statement n }

        end;

C/C++ programmers are even worse: there are no fewer than four common ways of putting the opening and closing braces around a compound statement after an "if".

The second problem with C/C++'s and Pascal's "if" statements is the ambiguity involved. Consider the following Pascal code:

        if (expression) then
            if (expression) then
                (* Statement *)

           else (* Statement *);



To which "if" does the "else" belong? Of course, you've always been taught that the else goes with the first un-elsed "if" looking back in the file (i.e., the second "if" statement above). What happens if you want it to go with the first one? What happens if there is a long compound statement after the second "if" above and the else is far removed from these two ifs? How easy is it to tell which if the else belongs to?

Modern programming languages (Modula-2, Ada, Visual BASIC, FORTRAN 90, etc.) avoid this problem altogether by using control structures that begin and end with a reserved word, for example, IF and ENDIF. The code above, written in one of these languages, would look something like:

        if (expression) then

                if (expression) then
                        { Statement list}
                endif;

        else 
                { Statement list};
        endif;

Now there is no question that the else belongs to the first if above, not the second. Note that this form of the if statement allows you to attach a list of statements (between the if and else or if and endif) rather than a single or compound statement. Furthermore, it totally eliminates the religious argument concerning where to put the braces or the begin..end pair on the if.
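C has no endif, but bracing every branch gives the same unambiguous binding. The following is only a minimal sketch: the function name classify and its return codes are hypothetical, and this is not necessarily the simulation that the C/C++ language-specific section prescribes.

```c
/* Hypothetical example: the braces play the role of then..endif, so the
   else can only belong to the outer if. */
int classify(int outer, int inner)
{
    int result = 0;              /* value when no branch assigns anything */

    if (outer) {
        if (inner) {
            result = 1;          /* both conditions are true */
        }
    } else {
        result = 2;              /* the outer condition is false */
    }
    return result;
}
```

With the braces in place, classify(1, 0) leaves result at 0; the else runs only when outer is false.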

The complete set of modern programming language constructs includes:

if..then..elseif..else..endif
select..case..default..endselect  (typical case/switch statement).
while..endwhile
repeat..until
loop..endloop
for..endfor
break
breakif
continue

Those who have had the opportunity to use these control structures for a considerable amount of time generally recognize their superiority over the Pascal/C/C++ variants. The biggest fault Pascal/C/C++ programmers tend to find with these structures (other than that they are different) is that "Ada uses these structures and Ada is a 'yucky' language." That is hardly a scientific assessment of the quality of these control constructs.

All programs should use these control structures where available and simulate them if they are not available. The exact simulation details will appear in language-specific sections of this document.

Rule: Programs written in a standard imperative language (e.g., C/C++, Pascal, Ada, Visual BASIC, Delphi, etc.) will use the modern versions of the standard control constructs. If the language does not directly support these control structures, the programmer will simulate them using rules appearing elsewhere in this document.

Rule:
If your code contains a chain of if..elseif..elseif..elseif statements, do not use the final else clause to handle a remaining case. Only use the final else to catch an error condition. If you need to test for some value in an if..elseif..elseif chain, always test the value in an if or elseif statement.
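The rule above might look like this in C. This is a sketch under assumptions: the function name direction_name, the 0..3 encoding, and the "invalid" result are all hypothetical and chosen only for illustration.

```c
#include <stdio.h>

/* Hypothetical example: every legal value is tested explicitly; the
   final else does nothing except report an unexpected value. */
const char *direction_name(int dir)
{
    const char *name;

    if (dir == 0) {
        name = "north";
    } else if (dir == 1) {
        name = "east";
    } else if (dir == 2) {
        name = "south";
    } else if (dir == 3) {       /* tested here, not left to the else */
        name = "west";
    } else {
        fprintf(stderr, "direction_name: bad direction %d\n", dir);
        name = "invalid";
    }
    return name;
}
```

Had "west" been handled by the final else, a corrupted value such as 7 would silently be treated as west.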

Most compilers implement multi-way selection statements (case/switch) using a jump table. This means that the order of the cases within the selection statement is usually irrelevant. Placing the statements in a particular order rarely improves performance. Since the order is usually irrelevant to the compiler, you should organize the cases so that they are easy to read. There are two common organizations that make sense: sorted (numerically or alphabetically) or by frequency (the most common cases first). Either organization is readable; sorting by frequency has the advantage of being faster if your compiler uses a brain-dead if..then..elseif..elseif implementation of multi-way selection. One drawback to the frequency-based approach is that it is often difficult to predict which cases the program will execute most often.

Guideline:
When using multi-way selection statements (case/switch) sort the cases numerically (alphabetically) or by frequency of expected occurrence.
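As a small illustration of the numerically sorted organization, here is a C switch. The function name days_in_month is hypothetical, and the table assumes a non-leap year.

```c
/* Hypothetical example: cases appear in numeric order so a reader can
   find a value quickly; the order does not affect the result. */
int days_in_month(int month)     /* 1..12, non-leap year assumed */
{
    int days;

    switch (month) {
    case 1:  days = 31; break;
    case 2:  days = 28; break;
    case 3:  days = 31; break;
    case 4:  days = 30; break;
    case 5:  days = 31; break;
    case 6:  days = 30; break;
    case 7:  days = 31; break;
    case 8:  days = 31; break;
    case 9:  days = 30; break;
    case 10: days = 31; break;
    case 11: days = 30; break;
    case 12: days = 31; break;
    default: days = 0;  break;   /* not a valid month */
    }
    return days;
}
```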

There are three general categories of looping constructs available in common high-level languages: loops that test for termination at the beginning of the loop (e.g., while), loops that test for loop termination at the bottom of the loop (e.g., repeat..until), and those that test for loop termination in the middle of the loop (e.g., loop..endloop). It is possible to simulate any one of these loops using any of the others. This is particularly trivial with the loop..endloop construct:

/* Test for loop termination at beginning of LOOP..ENDLOOP */

    loop
        breakif (x==y);
         .
         .
         .
    endloop;


/* Test for loop termination in the middle of LOOP..ENDLOOP */

    loop
         .
         .
         .
        breakif (x==y);
         .
         .
         .
    endloop;

/* Test for loop termination at the end of LOOP..ENDLOOP */

    loop
         .
         .
         .
        breakif (x==y);
    endloop;

Given the flexibility of the loop..endloop control structure, you might question why one would even burden a compiler with the other loop statements. However, using the appropriate looping structure makes a program far more readable; therefore, you should never use one type of loop when the situation demands another. If someone reading your code sees a loop..endloop construct, they may think it's okay to insert statements before or after the exit statement in the loop. If your algorithm truly depends on while..do or repeat..until semantics, the program may then malfunction.

Rule:
Always use the most appropriate type of loop (categorized by termination test position). Never force one type of loop to behave like another.
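In C, the three categories roughly correspond to while, do..while, and for(;;) with a single break. The functions below are hypothetical examples chosen so that each one naturally needs the test position it uses; they are a sketch, not the simulation rules referred to elsewhere in this document.

```c
/* Test at the top (while): the body runs zero or more times. */
int halvings_while(int n)
{
    int steps = 0;

    while (n > 1) {
        n /= 2;
        steps++;
    }
    return steps;
}

/* Test at the bottom (do..while, C's counterpart of repeat..until):
   the body runs at least once, so 0 is counted as one digit. */
int digits_do_while(unsigned n)
{
    int digits = 0;

    do {
        n /= 10;
        digits++;
    } while (n != 0);
    return digits;
}

/* Test in the middle (for(;;) with one break, C's counterpart of
   loop..endloop): work is done both before and after the test.
   Assumes the array ends with a negative sentinel value. */
int sum_until_negative(const int *values)
{
    int sum = 0;
    int i = 0;

    for (;;) {
        int v = values[i];       /* runs before the test */
        if (v < 0) break;
        sum += v;                /* runs after the test */
        i++;
    }
    return sum;
}
```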

Many languages provide a special case of the while loop that executes some number of times specified upon first encountering the loop (a definite loop rather than an indefinite loop). This is the "for" loop in most languages. Unfortunately, this iterative loop ranges from very simple (e.g., in Pascal) to extremely complex (e.g., Algol-68 and PL/I). The vast majority of the time, a for loop sequences through a fixed range of values, incrementing or decrementing the loop control variable by one. Therefore, most programmers automatically assume this is the way a for loop will operate until they take a closer look at the code. Since most programmers immediately expect this behavior, it makes sense to limit for loops to these semantics. If some other looping mechanism is desirable, you should use a while loop to implement it (since the for loop is just a special case of the while loop). There are other reasons behind this decision as well. Most compilers generate especially efficient code for standard for loops, while they tend to generate less than optimal code for "funny" versions of for loops. Hence there are efficiency considerations as well as readability reasons behind this choice.

Rule:
"FOR" loops should always use an ordinal loop control variable (e.g., integer, char, boolean, enumerated type) and should always increment or decrement the loop control variable by one.
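A minimal C sketch of this rule follows; both function names are hypothetical. The first loop is a standard for loop. The second steps by two, so under the rule it is written as a while loop instead.

```c
/* Standard for loop: integer control variable, incremented by one. */
int sum_first(const int *a, int n)
{
    int sum = 0;
    int i;

    for (i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Stepping by two is not a standard for loop under the rule, so the
   loop is written as a while loop. */
int sum_even_indices(const int *a, int n)
{
    int sum = 0;
    int i = 0;

    while (i < n) {
        sum += a[i];
        i += 2;
    }
    return sum;
}
```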

Most people expect the execution of a loop to begin with the first statement at the top of the loop, therefore,

Rule:
All loops should have one entry point. The program should enter the loop with the instruction at the top of the loop.

Likewise, most people expect a loop to have a single exit point, especially if it's a while or repeat..until loop. They will rarely look closely inside a loop body to determine if there are "break" statements within the loop once they find one exit point. Therefore,

Guideline:
Loops with a single exit point are more easily understood.

Whenever a programmer sees an empty loop, the first thought is that something is missing. Worse yet, in languages like Pascal or C/C++, which have no terminating endloop statement, it's easy to think that the next statement in the program is the body of the loop. It's just as easy to forget the semicolon that marks the end of the loop, which actually makes the next statement in the program the loop's body. Therefore,

Guideline:
Avoid empty loops. If testing the loop termination condition produces some side effect that is the whole purpose of the loop, move that side effect into the body of the loop. If a loop truly has an empty body, place a comment like "/* nothing */" or "{null statement}" within your code.
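A familiar C case is the string copy idiom, where the copy itself happens inside the termination test. The sketch below (the function name copy_string is hypothetical) moves that work into the loop body, as the guideline suggests.

```c
/* Discouraged form, shown only in this comment:
       while ((*dst++ = *src++) != '\0')
           ;
   The copy is a side effect of the test and the body is empty. */

/* The same copy with the work moved into the body. Assumes dst has
   room for the string and its terminating zero byte. */
void copy_string(char *dst, const char *src)
{
    while (*src != '\0') {
        *dst = *src;
        dst++;
        src++;
    }
    *dst = '\0';                 /* terminate the copy */
}
```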

Even if the loop body is not empty, you should avoid side effects in a loop termination expression. When someone else reads your code and sees a loop body, they may skim right over the loop termination expression and start reading the code in the body of the loop. If the (correct) execution of the loop body depends upon the side effect, the reader may become confused since s/he did not notice the side effect earlier. The presence of side effects (that is, having the loop termination expression compute some other value beyond whether the loop should terminate or repeat) indicates that you're probably using the wrong control structure. Consider the following while loop from "C" that is easily corrected:

    while ( ( ch = getc(stdin)) != 'A')
    {
        << statements >>
    }

A better implementation of this code fragment would be to use a loop..endloop construct:

    for(;;) /* C/C++'s infinite loop statement */
    {
        ch = getc(stdin);
        if (ch == 'A') break;

        << statements >>
    }

An even better solution to the above would be to use the newer high level language constructs. See the C/C++ language-specific section for more details.

Rule:
Avoid side-effects in the computation of the loop termination expression (others may not be expecting such side effects). Also see the guideline about empty loops.

Like functions, loops should exhibit functional cohesion. That is, the loop should accomplish exactly one thing. It's very tempting to initialize two separate arrays in the same loop. You have to ask yourself, though, "what do you really accomplish by this?" You save about four machine instructions on each loop iteration, that's what. That rarely accounts for much. Furthermore, the operations on those two arrays are now tied together: you cannot change the size of one without changing the size of the other. Finally, someone reading your code has to remember two things the loop is doing rather than one.

Guideline:
Make each loop perform only one function.
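For the two-array case discussed above, a sketch in C might look like this. The array names, sizes, and initial values are hypothetical.

```c
#define NAME_COUNT   8           /* hypothetical sizes; each can change */
#define SCORE_COUNT 16           /* without touching the other loop     */

static int name_ids[NAME_COUNT];
static int scores[SCORE_COUNT];

void init_tables(void)
{
    int i;

    /* Loop 1: mark every name slot as unused. */
    for (i = 0; i < NAME_COUNT; i++) {
        name_ids[i] = -1;
    }

    /* Loop 2: clear every score. */
    for (i = 0; i < SCORE_COUNT; i++) {
        scores[i] = 0;
    }
}
```

Because the loops are separate, the two arrays are free to have different sizes, which a single combined loop would not allow without extra tests.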

Programs are much easier to read if you read them from left to right, top to bottom (beginning to end). Programs that jump around quite a bit are much harder to read. Of course, the goto statement is well-known for its ability to scramble the logical flow of a program, but you can produce equally hard to read code using other, structured, statements in a language. For example, a deeply nested set of if statements, some with and some without else clauses, can be very difficult to follow because of the number of possible places the code can transfer depending upon the result of several different boolean expressions.

Rule:
Code, as much as possible, should read from top to bottom.
Rule:
Related statements should be grouped together and separated from unrelated statements with whitespace or comments.
Enforced Rule:
GOTOs, if they appear at all in a program, must be okayed by a peer review of at least two peers, both of whom agree the resulting code with a GOTO is easier to understand than equivalent code without a GOTO. GOTOs should only be used in exception processing statements or after exhausting several other attempts at writing clear code without the GOTO. Of course some code is actually easier to read with a GOTO statement than without, but it is easy to develop a mental block that would suggest the use of a GOTO when a clearer solution exists, hence the peer review.

 

Expressions

Few things look so similar between different languages yet act so differently as arithmetic expressions. Between various languages the precedence of operators is different, the associativity of operators is different, and even the operation computed is often different. It goes without saying that different languages often use the same symbol for different operations and, likewise, use different symbols for the same operation. This creates a problem with a coding standard if the intent is to allow a Visual BASIC programmer to easily read a program written in C/C++ or Pascal. Although there are many issues that a coding standard cannot practically resolve, some standards can improve the situation.

One of the big areas where programming languages differ is how they handle operator precedence. For example, in C/C++ the "<<" and ">>" (shift left and shift right) operators have lower precedence than addition and subtraction. In Borland Turbo Pascal and Delphi, the "SHL" and "SHR" operators have higher precedence than addition and subtraction. Likewise, in many languages the relational operators all have the same precedence while in others they do not. The overly simplistic solution is to take the "Beginning Programmer Textbook" attitude of accepting the (almost) universal precedence relationship between addition, subtraction, multiplication, and division and requiring parentheses everywhere else. While this is, perhaps, a good starting point, it often falls short in practice because some expressions wind up with too many parentheses (impairing the readability) when the intent would have been clear without them.

As a general rule, the reader of a program should be able to make the following assumptions about the operator precedence within a program:

  • Operands have the highest precedence. This includes functions, variables (scalar, array element, and record field), constants, dereferenced pointers, etc.
  • Unary operators
  • Multiplication, division, and remainder (mod)
  • Addition and subtraction
  • Relational operators (may not all be the same precedence)
  • Logical operators (and, or, may not be the same precedence)

As long as two adjacent operators in an expression belong to two different classes above, you can skip using parentheses. You can assume that addition, subtraction, multiplication, remainder, and division are left associative. Therefore, if two adjacent operators are both additive (addition or subtraction) or both multiplicative (multiplication, remainder, or division), you can also skip the parentheses. In all other cases, you must supply parentheses to explicitly state the precedence.

Rule:
The assumable precedences are: [highest]: {operands} {unary operators} {*,/,mod} {+,-} {<, <=, =, <>, >, >=} {and, or}. Note that you can only assume left associativity for {*,/,mod} and {+,-}. Assume all other operators are non-associative and that you must use parentheses if they are next to one another in an expression. If you cannot assume the precedence according to the rule above, use parentheses to explicitly state the precedence.
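The C sketch below applies this rule to two cases that are not in the assumable list: a shift next to an addition, and "and" next to "or". The function names are hypothetical.

```c
/* C gives << lower precedence than +, so "base << shift + offset"
   would mean base << (shift + offset). Shifts are not in the assumable
   list, so the parentheses state which meaning is intended. */
unsigned scaled_plus_offset(unsigned base, unsigned shift, unsigned offset)
{
    return (base << shift) + offset;
}

unsigned shifted_by_sum(unsigned base, unsigned shift, unsigned extra)
{
    return base << (shift + extra);
}

/* Relational and logical operators are in different classes, so
   "lo <= x && x <= hi" needs no inner parentheses. The && and ||
   are in the same class, so their grouping is spelled out. */
int in_range_or_forced(int x, int lo, int hi, int force)
{
    return (lo <= x && x <= hi) || force;
}
```

With base 1, shift 2, and a third argument of 3, the first function yields 7 and the second yields 32, which is the difference a reader from a Pascal background could easily miss.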

Some languages use short-circuit evaluation; others use full evaluation of expressions. If your program uses and depends upon short-circuit evaluation, you will comment this fact next to each expression that requires short-circuit evaluation.

Rule:
If an expression depends upon short-circuit evaluation to produce a correct answer, you must explicitly state this in a comment nearby.
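In C, where && always short-circuits, the required comment might look like the following sketch. The struct node type and the function name are hypothetical.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

int is_positive_node(const struct node *n)
{
    /* Depends on short-circuit evaluation: n->value must not be
       evaluated when n is NULL. */
    return n != NULL && n->value > 0;
}
```

A reader porting this test to a language with full evaluation is warned by the comment that the two operands cannot simply be evaluated in either order.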

In most languages it is possible to produce side effects within an expression. You can accomplish this, for example, by passing a parameter by reference to a function or if the function modifies global variables. Since most languages give the compiler writer leeway with respect to the order of evaluation of expressions, you should never use a variable whose value is modified as a side effect of a function or operator within that expression (e.g., in C/C++ consider the statement "Y = X + Y + ++X;"). Even if you're sure the result will be correct, such code would be very difficult to understand.

Guideline:
An expression should not produce any side effects.

There are some obvious exceptions to the rule above. The whole purpose of some operators and functions is to produce a side effect. Examples include the "++" and "--" operators in C/C++ and any of the various assignment operators. A stronger rule to allow for this might be

Rule:
A program should never use the value of a variable modified as a result of a side effect within that same expression.
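The statement "Y = X + Y + ++X;" mentioned earlier can be rewritten so that the modification and the uses are in separate statements. The sketch below assumes, purely for illustration, that the intent was to increment X before either use; the original statement does not say, which is exactly the problem. The function name is hypothetical.

```c
/* One possible reading of "Y = X + Y + ++X;", assuming X is meant to
   be incremented before it is used. Returns the new value for Y. */
int add_twice_incremented(int *x, int y)
{
    *x = *x + 1;                 /* the side effect, in its own statement */
    return *x + y + *x;          /* no variable is modified in this expression */
}
```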

Never execute an expression solely for the side effects it produces. Programmers generally expect the value of an expression to carry some significance; they feel there would be no need to compute the value of an expression if that value were of no importance. If all you need are the side effects, find some other way to achieve those side effects. Example: What does the following C statement do? (This came out of a real program on the net.)

	*s++ || *s++ || *s++ || *s++ || s++;
Rule:
Never execute an expression solely for the side effects it produces.
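Reading the statement above operator by operator, it appears to advance s just past the first nonzero byte among the next four, or by five bytes if all four are zero. The following sketch spells that out with ordinary control structures; the function name advance_past_marker is hypothetical, and the purpose of the original statement in its program is not known.

```c
/* Returns the position s would have after the statement
       *s++ || *s++ || *s++ || *s++ || s++;
   Assumes at least four readable bytes when the first ones are zero. */
const char *advance_past_marker(const char *s)
{
    int i;

    for (i = 0; i < 4; i++) {
        char c = *s;
        s++;
        if (c != 0) {
            return s;            /* stopped just past a nonzero byte */
        }
    }
    s++;                         /* all four were zero: the final s++ */
    return s;
}
```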

 

There are some mechanical issues regarding expressions that can make them easier to read. The following rules and guidelines document these issues:

Guideline:
There should be no spaces between a unary operator (e.g., "-") and the object on which it operates.
	-x	*p	!b	/* from C/C++ */
Guideline:
There should be at least one space on either side of a binary operator.
	x = *p + a / b;
Guideline:
Operators that select a component of a larger object (e.g., "." for records/structures and "[ ]" for arrays) should be adjacent to the object(s) they operate upon.
	recname.field			recptr->field  ary[ i ]
Guideline:
Objects that separate items (e.g., "," and ";") should immediately follow the previous object. If a second object follows the separator, there should be a space between the separator and the second object.
	proc( parm1, parm2, parm3 );
	procedure PascalProc( i:integer; b:boolean );
Guideline:
Bracketing symbols (e.g., "(" and ")", "[" and "]", and "{" and "}" ) should have one space on the "open" end of the symbol, that is, to the right of "(", "[", and "{" and to the left of ")", "]", and "}".
	x := f( x + 2 * a[ i, j ] );

Some languages (C/C++ and Algol-68 come to mind) have a tremendous number of operators. Some of them are quite arcane and have no counterpart in other languages (when was the last time you used ">>=" or "->*" ?). If an alternative is available, you should avoid using assignments within expressions and other lesser-used operators.

 

Copyright © 1997 - 2006 American Science Institute of Technology