Program Layout
After naming conventions and where to put braces (or begin..end), the other
major argument programmers engage in is how to lay out a program, i.e., what
indentation one should use in a well-written program. Unfortunately, the
ideal program layout varies by language. The layout of an easy-to-read
C/C++ program is considerably different from that of an assembly
language, Prolog, or Bison/YACC program. As usual, this section will describe
those conventions that generally apply to all programs. It will also discuss
layouts of the standard control structures described earlier.
According to McConnell (Code Complete), research has shown that there is a
strong correlation between program indentation and comprehensibility. Miara et
al. ("Program Indentation and Comprehensibility") concluded that
indentation in the two to four character range was optimal even though many
subjects felt that six-space indentation looked better. These results are
probably due to the fact that the eye has to travel less distance to read
indented code and therefore the reader's eyes suffer from less fatigue.
- Guideline:
- Indentation should be three to four spaces in an indented control
structure with four spaces probably being the optimal value.
- Enforced Rule:
- If you use tabs to indent your code, insert a comment at the very
beginning of the program that states the number of positions for each tab
stop. E.g., "/* This program is formatted using four character position
tabstops. */"
Steve McConnell, in Code Complete, mentions several objectives of good
program layout:
- The layout should accurately reflect the logical structure of the code.
Code Complete refers to this as the "Fundamental Theorem of
Formatting." White space (blank lines and indentation) is the primary
tool one can use to show the logical structure of a program.
- Consistently represent the logical structure of the code. Some common
formatting conventions (e.g., those used by many C/C++ programmers) are full
of inconsistencies. For example, why does the "{" go on the same
line as an "if" but below "int main()" (or any other
function declaration)? A good style applies consistently.
- Improve readability. If the indentation scheme makes a program harder to
read, why waste time with it? As pointed out earlier, some schemes make the
program look pretty but, in fact, make it harder to read (see the example
about 2-4 vs. 6 position indentation, above).
- Withstand modifications. A good indentation scheme shouldn't force a
programmer to modify several lines of code in order to effect a small change
to one line. For example, many programmers put a begin..end block (or
"{".."}" block) after an if statement even if there is
only one statement associated with the if. This allows the programmer to
easily add new statements to the then-clause of the if statement
without having to add additional syntactical elements later.
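The "withstand modifications" objective can be sketched in C as follows. This is only an illustration: `ClampToLimit` is a hypothetical function invented here, and the brace placement is an assumption, not a layout this standard prescribes.

```c
#include <assert.h>

/* Hypothetical helper: returns Value limited to the range 0..Limit. */

int ClampToLimit( int Value, int Limit )
{
    assert( Limit >= 0 );

    /*
     * Each "if" has a braced body even though it holds a single
     * statement, so adding a second statement later touches only
     * the newly added line.
     */

    if( Value < 0 )
    {
        Value = 0;
    }

    if( Value > Limit )
    {
        Value = Limit;
    }

    return Value;
}
```

Adding, say, a logging call to either branch would be a one-line change; no braces need to be introduced and no existing line needs to move.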
The principal tool for creating good layout is whitespace (or the lack
thereof, that is, grouping objects). The following paragraphs summarize
McConnell's findings on the subject:
- Grouping: Related statements should be grouped together. Statements that
logically belong together should contain no arbitrary interleaving
whitespace (blank lines or unnecessary indentation).
- Blank lines: Blank lines should separate declarations from the start of
code, logically related statements from unrelated statements, and blocks of
comments from blocks of code.
- Alignment: Align objects that belong together. Examples include type names
in a variable declaration section, assignment operators in a sequence of
related assignment statements, and columns of initialized data.
- Indentation: Indenting statements inside block statements improves
readability; see the comments and rules earlier in this section.
- Rule:
- At least one blank line must separate a comment on a line by itself from a
line of code following or preceding the comment.
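The grouping, blank-line, and alignment suggestions above might look like this in C. The `Summary` type and `Summarize` function are hypothetical names made up for the sketch; the layout is one plausible reading of the guidelines, not the only one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */

typedef struct
{
    int     Count;
    double  Total;
    double  Average;
} Summary;

/* Summarize the Count values stored in Data (Count may be zero). */

Summary Summarize( const double *Data, int Count )
{
    Summary Result;
    int     i;

    assert( Count == 0 || Data != NULL );

    /* Related initializations are grouped, with "=" aligned. */

    Result.Count   = Count;
    Result.Total   = 0.0;
    Result.Average = 0.0;

    /* A blank line separates accumulation from initialization. */

    for( i = 0; i < Count; ++i )
    {
        Result.Total += Data[ i ];
    }

    if( Count > 0 )
    {
        Result.Average = Result.Total / Count;
    }

    return Result;
}
```

Note that every comment on a line by itself is set off from the code by a blank line, as the rule above requires.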
This style guide uses the "Pure Blocks" layout form suggested by
McConnell. This is the obvious layout scheme to use when your language supports
modern structured statements like if..then..elseif..else..endif. Since this
standard requires the emulation of the modern block structured statements, the
Pure Blocks layout is appropriate.
- Rule:
- The standard layout scheme for this coding standard is the Pure Block
format. For languages that do not support modern structured control
statements, this coding standard specifies an emulation of these statements
that allows the use of the Pure Block layout format.
In theory, a line of source code can be arbitrarily long. In practice,
several limitations apply. Paramount is the amount of text that will fit on a
given terminal display device (we don't all have 21-inch high resolution
monitors!) and what can be printed on a typical sheet of paper. If this isn't
enough to suggest an 80 character limit on source lines, McConnell suggests
that longer lines are harder to read (remember, people tend to look at only
the left side of the page while skimming through a listing).
- Enforced Rule:
- Source code lines will not exceed 80 characters in length.
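Because this rule is meant to be enforced, it lends itself to a mechanical check. The following is a minimal sketch in C, not a tool defined by this standard: `CountLongLines` and `MAX_LINE_LENGTH` are hypothetical names, and the sketch assumes each tab counts as one character, which understates the width of tab-indented lines.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LINE_LENGTH 80

/*
 * Count the lines in Text (a NUL-terminated string) that are longer
 * than MAX_LINE_LENGTH characters, not counting the newline itself.
 */

size_t CountLongLines( const char *Text )
{
    size_t LongLines  = 0;
    size_t LineLength = 0;

    assert( Text != NULL );

    for( ; *Text != '\0'; ++Text )
    {
        if( *Text == '\n' )
        {
            if( LineLength > MAX_LINE_LENGTH )
            {
                ++LongLines;
            }
            LineLength = 0;
        }
        else
        {
            ++LineLength;
        }
    }

    /* The final line may lack a trailing newline. */

    if( LineLength > MAX_LINE_LENGTH )
    {
        ++LongLines;
    }

    return LongLines;
}
```

A caller would read a source file into memory and pass its contents to `CountLongLines`; a nonzero result means the file violates the rule.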
If a statement approaches the maximum limit of 80 characters, it should be
broken up at a reasonable point and split across two lines. If the line is a
control statement that involves a particularly long logical expression, the
expression should be broken up at a logical point (e.g., at the point of a
low-precedence operator outside any parentheses) and the remainder of the
expression placed underneath the first part of the expression. E.g.,
    if
    (
        ( ( x + y * z) < ( ComputeProfits(1980,1990) / 1.0775 ) ) &&
        ( ValueOfStock[ ThisYear ] >= ValueOfStock[ LastYear ] )
    )

        << statements >>

    endif;
Many statements (e.g., IF, WHILE, FOR, and function or procedure calls)
contain a keyword followed by a parenthesis. If the expression appearing between
the parentheses is too long to fit on one line, consider putting the opening and
closing parentheses in the same column as the first character of the start of
the statement and indenting the remaining expression elements. The example above
demonstrates this for the "IF" statement. The following examples
demonstrate this technique for other statements:
    while
    (
        ( NumberOfIterations < MaxCount ) &&
        ( i <= NumberOfIterations )
    )

        << Statements to execute >>

    endwhile;

    fprintf
    (
        stderr,
        "Error in module %s at line #%d, encountered illegal value\n",
        ModuleName,
        LineNumber
    );
- Guideline:
- For statements that are too long to fit on one physical 80-column line,
you should break the statement into two (or more) lines at points in the
statement that will have the least impact on the readability of the
statement. Such points usually occur immediately after low-precedence
operators or after commas.
For block statements there should always be a blank line between the line
containing an if, elseif, else, endif, while, endwhile, repeat, until, etc.,
and the lines they enclose. This clearly differentiates statements within a
block from a possible continuation of the expression associated with the
enclosing statement. It also helps clearly show the logical format of the code.
Example:
    if ( ( x = y ) and PassingValue( x, y ) ) then

        Output( 'This is done' );

    endif;
- Rule:
- Always put a blank line between any block statement and the statement(s)
it encloses.
If a procedure, function, or other program unit has a particularly long
actual or formal parameter list, each parameter should be placed on a separate
line. The following (C/C++) examples demonstrate a function declaration and call
using this technique:
    int
    MyFunction
    (
        int    NumberOfDataPoints,
        float  X1Root,
        float  X2Root,
        float  &YIntercept
    );

    x = MyFunction
        (
            GetNumberOfPoints(RootArray),
            RootArray[ 0 ],
            RootArray[ 1 ],
            Solution
        );
- Rule:
- If an actual or formal parameter list is too long to fit a function call
or definition on a single line, then place each parameter on a separate line
and align them so they are easy to read.
Comments and (program) Documentation
Almost everyone agrees that a program should have good comments.
Unfortunately, few people agree on the definition of a good comment. Some
people, in frustration, feel that minimal comments are the best. Others feel
that every line should have two or three comments attached to it. Everyone else
wishes they had good comments in their programs but never seems to find the
time to put them in.
It is rather difficult to characterize a "good comment." In fact,
it's much easier to give examples of bad comments than it is to discuss good
comments. The following list describes some of the worst possible comments you
can put in a program (from worst up to barely tolerable):
- The absolute worst comment you can put into a program is an incorrect
comment. Consider the following Pascal statement:
A := 10; { Set 'A' to 11 }
- It is amazing how many programmers will automatically assume the comment
is correct and try to figure out how this code manages to set the variable
"A" to the value 11 when the code so obviously sets it to 10.
- The second worst comment you can place in a program is a comment that
explains what a statement is doing. The typical example is something like
"A := 10; { Set 'A' to 10 }". Unlike the previous example, this
comment is correct. But it is still worse than no comment at all because
it is redundant and forces the reader to spend additional time reading the
code (reading time is directly proportional to reading difficulty). This
also makes it harder to maintain since slight changes to the code (e.g.,
"A := 9") require modifications to the comment that would not
otherwise be required.
- The third worst comment in a program is an irrelevant one. Telling a
joke, for example, may seem cute, but it does little to improve the
readability of a program; indeed, it offers a distraction that breaks
concentration.
- The fourth worst comment is no comment at all.
- The fifth worst comment is a comment that is obsolete or out of date
(though not incorrect). For example, comments at the beginning of the file
may describe the current version of a module and who last worked on it. If
the last programmer to modify the file did not update the comments, the
comments are now out of date.
Steve McConnell provides a long list of suggestions for writing high-quality
comments. These suggestions include:
- Use commenting styles that don't break down or discourage modification.
Essentially, he's saying pick a commenting style that isn't so much work
people refuse to use it. He gives an example of a block of comments
surrounded by asterisks as being hard to maintain. This is a poor example
since modern text editors will automatically "outline" the
comments for you. Nevertheless, the basic idea is sound.
- Comment as you go along. If you put commenting off until the last
moment, then it seems like another task in the software development
process and management is likely to discourage the completion of the
commenting task in hopes of meeting new deadlines.
- Avoid self-indulgent comments. Also, you should avoid sexist, profane,
or other insulting remarks in your comments. Always remember, someone else
will eventually read your code.
- Avoid putting comments on the same physical line as the statement they
describe. Such comments are very hard to maintain since there is very
little room. McConnell suggests that endline comments are okay for
variable declarations. For some this might be true but many variable
declarations may require considerable explanation that simply won't fit at
the end of a line. One exception to this rule is "maintenance
notes." Comments that refer to a defect tracking entry in the defect
database are okay (note that the CodeWright text editor provides a much
better solution for this -- buttons that can bring up an external file).
Endline comments are also useful for marking the end of a control
structure (e.g., "end{if};").
- Write comments that describe blocks of statements rather than individual
statements. Comments covering single statements tend to discuss the
mechanics of that statement rather than discussing what the program is
doing.
- Focus paragraph comments on the why rather than the how. Comments should
explain what the program is doing and why the programmer chose to do it
that way rather than explain what each individual statement is doing.
- Use comments to prepare the reader for what is to follow. Someone
reading the comments should be able to have a good idea of what the
following code does without actually looking at the code. Note that this
rule also suggests that comments should always precede the code to which
they apply.
- Make every comment count. If the reader wastes time reading a comment of
little value, the program is harder to read; period.
- Document surprises and tricky code. Of course, the best solution is not
to have any tricky code. In practice, you can't always achieve this goal.
When you do need to resort to some tricky code, make sure you fully
document what you've done.
- Avoid abbreviations. While there may be an argument for abbreviating
identifiers that appear in a program, no such argument applies to comments.
- Keep comments close to the code they describe. The prologue to a program
unit should give its name, describe the parameters, and provide a short
description of the program. It should not go into details about the
operation of the module itself. Internal comments should do that.
- Comments should explain the parameters to a function, assertions about
these parameters, and whether they are input, output, or in/out parameters.
- Comments should describe a routine's limitations, assumptions, and any
side effects.
- Rule:
- All comments will be high-quality comments that describe the actions of
the surrounding code in a concise manner.
- Enforced Rule:
- All comments will be up to date. If a programmer makes changes to the
code, that programmer is responsible for updating the internal comments and
any external documentation affected by those changes.
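As one possible illustration of these suggestions, the C sketch below pairs a routine with a prologue that names the parameters, their direction, and the routine's limitations and side effects, plus one internal comment that explains why rather than how. `FindLargest` and the prologue layout are hypothetical; this standard does not prescribe this exact template.

```c
#include <assert.h>
#include <stddef.h>

/*
 * FindLargest
 *
 * Locates the largest value in an array.
 *
 * Parameters:
 *   Values  (input)   Array to search; must not be NULL.
 *   Count   (input)   Number of elements in Values; must be > 0.
 *   Index   (output)  Receives the position of the largest value.
 *
 * Returns the largest value found.
 *
 * Limitations: when the largest value occurs more than once, only
 * the first position is reported.
 *
 * Side effects: none beyond the write to *Index.
 */

int FindLargest( const int *Values, size_t Count, size_t *Index )
{
    size_t i;
    size_t Best = 0;

    assert( Values != NULL );
    assert( Count > 0 );
    assert( Index != NULL );

    /*
     * Use a strict comparison so that ties keep the earliest
     * position, which is the behavior the prologue promises.
     */

    for( i = 1; i < Count; ++i )
    {
        if( Values[ i ] > Values[ Best ] )
        {
            Best = i;
        }
    }

    *Index = Best;
    return Values[ Best ];
}
```

The prologue stays silent about how the search works; the single internal comment sits directly above the code it explains and records a design decision a maintainer could otherwise undo by accident.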
Unfinished Code
Often it is the case that a programmer will write a section of code that
(partially) accomplishes some task but needs further work to complete a feature
set, make it more robust, or remove some known defect in the code. It is common
for such programmers to place comments into the code like "This needs more
work," "Kludge ahead," etc. The problem with these comments is
that they are often forgotten. It isn't until the code fails in the field that
the section of code associated with these comments is found and its problems
are corrected.
Ideally, one should never have to put such code into a program. Of course,
ideally, programs never have any defects in them, either. Since such code
inevitably finds its way into a program, it's best to have a policy in place to
deal with it, hence this section.
Unfinished code comes in five general categories: non-functional code,
partially functioning code, suspect code, code in need of enhancement, and
code changes that affect documentation. Non-functional code might be a stub or
driver that needs to be replaced in the future with actual code, or some code
that has severe enough defects that it is useless except for some small
special cases. This code is really bad; fortunately, its severity prevents you
from ignoring it. It is unlikely anyone would miss such a poorly constructed
piece of code in early testing prior to release.
Partially functioning code is, perhaps, the biggest problem. This code works
well enough to pass some simple tests yet contains serious defects that should
be corrected. Moreover, these defects are known. Software often contains a large
number of unknown defects; it's a shame to let some (prior) known defects ship
with the product simply because a programmer forgot about a defect or couldn't
find the defect later.
Suspect code is exactly that: code that is suspicious. The programmer may not be
aware of a quantifiable problem but may suspect that a problem exists. Such code
will need a later review in order to verify whether it is correct.
The fourth category, code in need of enhancement, is the least serious. For
example, to expedite a release, a programmer might choose to use a simple
algorithm rather than a complex, faster algorithm. S/he could make a comment in
the code like "This linear search should be replaced by a hash table lookup
in a future version of the software." Although it might not be absolutely
necessary to correct such a problem, it would be nice to know about such
problems so they can be dealt with in the future.
The fifth category, documentation, refers to changes made to software that will
affect the corresponding documentation (user guide, design document, etc.). The
documentation department can search for these defects to bring existing
documentation in line with the current code.
This standard defines a mechanism for dealing with these five classes of
problems. Any occurrence of unfinished code will be preceded by a comment that
takes one of the following forms (where "@" denotes the standard
comment delimiters in a given language and "_" denotes a single
space):
@_#defect#severe_@
@_#defect#functional_@
@_#defect#suspect_@
@_#defect#enhancement_@
@_#defect#documentation_@
It is important to use all lower case and verify the correct spelling so it
is easy to find these comments using a text editor search or a tool like grep.
Obviously, a separate comment explaining the situation must follow these
comments in the source code.
Examples in various languages:
Pascal/Delphi:
    (* #defect#severe *)
    { #defect#enhancement }
    (* #defect#functional *)
    { #defect#suspect }
    { #defect#documentation }
C:
    /* #defect#severe */
    /* #defect#suspect */
    /* #defect#documentation */
C++:
    /* #defect#functional */
    // #defect#enhancement //
BASIC:
    ' #defect#functional '
Assembly (80x86):
    ; #defect#suspect ;
Ada:
    -- #defect#enhancement --
    -- #defect#documentation --
Notice the use of delimiters on both sides even if the language,
technically, doesn't require them (C++, BASIC, assembly, and Ada).
- Enforced Rule:
- If a module contains some defects that cannot be immediately removed
because of time or other constraints, the programmer will insert a standardized
comment before the code so that it is easy to locate such problems in the
future. The five standardized comments are "@_#defect#severe_@",
"@_#defect#functional_@", "@_#defect#suspect_@",
"@_#defect#enhancement_@", and
"@_#defect#documentation_@" where "@" denotes the
comment delimiter and "_" denotes a single space. The spelling and
spacing should be exact so it is easy to search for these strings in the
source tree.
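Exact spelling matters because the tags are meant to be found by plain substring search. A text editor or grep is normally enough; the C sketch below only illustrates that a fixed-string scan suffices. `CountDefectTags` is a hypothetical helper, not something this standard supplies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count occurrences of a defect tag such as "#defect#severe" in
 * Source. Passing "#defect#" counts tags of every category.
 */

size_t CountDefectTags( const char *Source, const char *Tag )
{
    size_t      Count = 0;
    size_t      Length;
    const char *Match;

    assert( Source != NULL );
    assert( Tag != NULL );

    Length = strlen( Tag );
    assert( Length > 0 );

    Match = strstr( Source, Tag );
    while( Match != NULL )
    {
        ++Count;
        Match = strstr( Match + Length, Tag );
    }

    return Count;
}
```

Because the search is case-sensitive and literal, a tag written as "#Defect#Severe" or "# defect # severe" would be missed, which is why the rule insists on lower case and exact spacing.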
Cross References in Code to Other Documents
In many instances a section of code might be intrinsically tied to some other
document. For example, you might refer the reader to the user document or the
design document within your comments in a program. This document proposes a
standard way to do this so that it is relatively easy to locate cross references
appearing in source code. The technique is similar to that for defect reporting,
except the comments take the form:
@ text #link#location text @
The "@" represents the comment delimiters. "Text" is
optional and represents arbitrary text (although it is really intended for
embedding html commands to provide hyperlinks to the specified document).
"Location" describes the document and section where the associated
information can be found.
Examples:
C/C++:
    /* #link#User's Guide Section 3.1 */
    // #link#Program Design Document, Page 5 //
Pascal:
    (* #link#Funcs.pas module, "xyz" function *)
    { <A HREF="DesignDoc.html#xyzfunc"> #link#xyzfunc </a> }
- Guideline:
- If a module contains some cross references to other documents, there
should be a comment that takes the form "@ text #link#location text
@" that provides the reference to that other document. In this comment,
the "@" represents the language's comment delimiter(s),
"text" represents some optional text (typically reserved for html
tags), and "location" is some descriptive text that describes the
document (and a position in that document) related to the current section of
code in the program.
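As with defect tags, the fixed "#link#" marker makes cross references easy to locate mechanically. The C sketch below finds where the location text begins within a comment. `FindLinkLocation` is a hypothetical helper; it deliberately makes no attempt to decide where the location ends, since the trailing optional text and closing delimiter vary by language.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the text that follows the first "#link#"
 * marker in Comment, or NULL if the comment has no such marker.
 */

const char *FindLinkLocation( const char *Comment )
{
    static const char LinkTag[] = "#link#";
    const char       *Match;

    assert( Comment != NULL );

    Match = strstr( Comment, LinkTag );
    if( Match == NULL )
    {
        return NULL;
    }

    return Match + strlen( LinkTag );
}
```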