American Science Institute of Technology
Data Typing, Declarations, Variables, and other Objects
Most languages' built-in data types are abstractions of the underlying machine organization, and the language rarely defines the types in terms of exact machine representations. For example, an integer variable may be a 16-bit two's complement value on one machine, a 32-bit value on another, or even a 64-bit value. Clearly, a program written to expect 32- or 64-bit integers will malfunction on a machine (or compiler) that supports only 16-bit integers. The reverse can also be true. When a program depends on the size of a type, define your own size-specific type names and map them onto the native types in one place for each machine or compiler. For example, on a machine whose int is 16 bits you might write:
typedef int int16;
typedef long int32;
while on a machine whose int is 32 bits you would write:
typedef short int16;
typedef int int32;
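The choice between these two sets of typedefs can also be made automatically at compile time. The C++ sketch below is one way to do it, assuming that a machine whose int is not 16 bits has a 16-bit short and a 32-bit int; the static_assert checks stop compilation on any machine where that assumption fails. Compilers that support C++11 also provide exact-width types such as std::int16_t and std::int32_t in <cstdint>, which serve the same purpose where they are available.

```cpp
#include <climits>  // INT_MAX, CHAR_BIT

// Map the program's own size-specific names onto native types in ONE place.
// Assumption: if int is not 16 bits, then short is 16 bits and int is 32 bits.
#if INT_MAX == 32767
typedef int   int16;  // int is 16 bits on this machine...
typedef long  int32;  // ...so long supplies the 32-bit type.
#else
typedef short int16;  // short is 16 bits on this machine...
typedef int   int32;  // ...and int itself is 32 bits.
#endif

// Refuse to compile if the assumption above does not hold.
static_assert(sizeof(int16) * CHAR_BIT == 16, "int16 must be exactly 16 bits");
static_assert(sizeof(int32) * CHAR_BIT == 32, "int32 must be exactly 32 bits");
```

The rest of the program then uses int16 and int32 wherever the size matters and never mentions short, int, or long directly for that purpose.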
Don't redefine existing types. This may seem like a contradiction to the guideline above, but it really isn't. This statement says that if you have an existing type that uses the name "integer", you should not create a new type named "integer". Doing so would only create confusion: another programmer reading your code may confuse your new type with the old "integer" type every time he or she sees a variable of type integer. This applies to existing user types as well as predefined types.
Declare all variables, even if the language processor allows implicit declarations. At one time there was a controversy as to whether it was better to have implicitly declared variables or to force the user to declare all variables explicitly (e.g., the FORTRAN vs. ALGOL/Pascal crowd). When, according to an often-repeated (though disputed) account, NASA and JPL lost a Venus probe due to an implicitly declared variable (that just happened to have the wrong type), the "explicitly declare" crowd won the argument. Fortunately, most modern languages require explicit declarations.
Some languages force you to declare all your variables at a given point in a program unit (e.g., Pascal); some languages are more flexible and let you declare variables anywhere in your program as long as you declare them before their first use; other languages do not require that you declare variables at all (see the above rule). Since it is possible to declare symbols at different points in a program, different programmers have developed different conventions concerning the position of their declarations. The two most popular conventions are the following:
1. Declare all variables at the beginning of the program unit.
2. Declare each variable just prior to its first use.
Logically, the second scheme above would seem to be the best. However, it has one major drawback: although names typically have only a single definition, the program may use them in several different locations. So although you can easily define a variable just prior to its first use, other uses may be hundreds of lines away. The advantage of declaring variables at the beginning of the program unit is that, no matter how far away it is, the programmer always knows where to look to find the variable declarations. If you embed the definition in the middle of the code nearest the first usage, someone reading the program may have to resort to a "linear search" in order to find the declaration.
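To make the two conventions concrete, here is the same small C++ function written both ways. The function names SumOfSquaresTop and SumOfSquaresNearUse are hypothetical illustrations, not code from any particular program.

```cpp
#include <cstddef>
#include <vector>

// Convention 1: every variable is declared at the top of the program unit,
// so a reader always knows where to look for the declarations.
long SumOfSquaresTop(const std::vector<int>& Values)
{
    long        Total;  // running sum of the squared values
    std::size_t Index;  // position of the current element

    Total = 0;
    for (Index = 0; Index < Values.size(); ++Index)
        Total += static_cast<long>(Values[Index]) * Values[Index];
    return Total;
}

// Convention 2: each variable is declared just before its first use.
long SumOfSquaresNearUse(const std::vector<int>& Values)
{
    // ... other statements could appear here, before Total is needed ...
    long Total = 0;  // running sum of the squared values
    for (std::size_t Index = 0; Index < Values.size(); ++Index)
        Total += static_cast<long>(Values[Index]) * Values[Index];
    return Total;
}
```

In a function this short the difference is cosmetic; the drawback described above appears only when the uses of a variable are far from its declaration.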
Unfortunately, not all name definitions are passive; some actually execute code. An instance of a class object in C++ is a good example: the definition of a class object calls the constructor for that class. The constructor may require the computation of some parameter values prior to the object's definition, which would prevent the placement of the definition at the beginning of the module. The solution is rather simple and stays within the rules of this guide:
Compute the parameter values first, then begin a new program unit (such as a compound statement) and place the definition at the beginning of that unit.
Some might argue that certain languages, like C++, provide excellent facilities for declaring otherwise anonymous variables with certain language constructs. For example, the "for ( int i = 0; i < 10; ++i) ..." statement limits the scope of "i" to this for loop. However, the goal of these guidelines is to produce a standard that applies to all languages; making special exceptions for C++ (or some feature-laden language) will only lead to confusion. Besides, C++ lets you create new program units by using "{" and "}" (e.g., the compound statement). Those who absolutely desire to put their definitions as close to the for-loop as possible can always do something like the following:
// Previous statements in this code...
.
.
.
{
int i;
for (i = start; i <= end; ++i) ...
}
.
.
.
// Additional statements in this code.
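The same new-program-unit technique handles the C++ class-object case described earlier, where a constructor needs values that must be computed first. In this sketch, ReportPage and BuildReport are hypothetical names; the point is only that the object's definition still appears at the beginning of a program unit (the inner compound statement), after the code that computes its constructor arguments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class whose constructor needs values computed at run time.
class ReportPage
{
public:
    ReportPage(const std::string& Heading, int LinesNeeded)
        : PageHeading(Heading), Lines(static_cast<std::size_t>(LinesNeeded))
    {
    }

    const std::string& Title() const { return PageHeading; }
    int LineCount() const { return static_cast<int>(Lines.size()); }

private:
    std::string              PageHeading;  // title printed on the page
    std::vector<std::string> Lines;        // one entry per printable line
};

// Hypothetical function: returns the number of lines on the report page.
int BuildReport(const std::vector<std::string>& Records)
{
    std::string Heading;      // computed title for the page
    int         LinesNeeded;  // computed constructor argument

    // Compute the constructor's parameters first.
    Heading     = "Records: " + std::to_string(Records.size());
    LinesNeeded = static_cast<int>(Records.size()) + 1;  // +1 for the heading

    // New program unit: the object's definition is its first statement.
    {
        ReportPage Page(Heading, LinesNeeded);
        return Page.LineCount();
    }
}
```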
Descriptive comments should always accompany a set of variable declarations. These comments should describe the purpose of the variables, provide complete English names for the variables if the names use any abbreviations (see the next section), and describe any constraints or assumptions on the use of these variables. The position of these comments should be immediately before the block or program unit that declares the variables (e.g., in the block of comments preceding a function definition). To improve readability and make it easy for a programmer to locate a particular name while manually scanning through a listing, you should place only one variable declaration per line so the reader can easily find the variable's name while scanning the left-hand side of the list. In languages where the type name precedes the variable name, it's a good idea to put the type name on one line and the variable name (indented) on the next line.
Examples:
(* Pascal *)
var
    LineCnt,   { Number of lines, words, }
    WordCnt,   { and characters in a file. }
    CharCnt: integer;
(* Also Reasonable *)
var
    LineCnt: integer;   { Number of lines, words, }
    WordCnt: integer;   { and characters in a file. }
    CharCnt: integer;
/* C/C++ */
int
    LineCnt,   /* Number of lines, words, */
    WordCnt,   /* and characters in a file. */
    CharCnt;
/* Another C/C++ Version */
int LineCnt;   /* Number of lines, words, */
int WordCnt;   /* and characters in a file. */
int CharCnt;
Names
According to studies done at IBM, the use of high-quality identifiers in a program contributes more to the readability of that program than any other single factor, including high-quality comments. The quality of your identifiers can make or break your program: programs with high-quality identifiers can be very easy to read, while programs with poor-quality identifiers will be very difficult to read. There are very few "tricks" to developing high-quality names; most of the rules are nothing more than plain old-fashioned common sense. Unfortunately, programmers (especially C/C++ programmers) have developed many arcane naming conventions that ignore common sense. The biggest obstacle most programmers face in learning how to create good names is an unwillingness to abandon existing conventions. Yet their only defense when quizzed on why they adhere to (existing) bad conventions seems to be "because that's the way I've always done it and that's the way everybody else does it."
Alphabetic Case Considerations
A case-neutral identifier will work properly whether you compile it with a compiler that has case-sensitive identifiers or one that has case-insensitive identifiers. In practice, this means that all uses of an identifier must be spelled exactly the same way (including case) and that no other identifier may differ from it only in the case of its letters. For example, if you declare an identifier "ProfitsThisYear" in Pascal (a case-insensitive language), you could legally refer to this variable as "profitsThisYear" and "PROFITSTHISYEAR". However, this is not a case-neutral usage, since a case-sensitive language would treat these three identifiers as different names. Conversely, in case-sensitive languages like C/C++, it is possible to create two different identifiers with names like "PROFITS" and "profits" in the program. This is not case-neutral either: a case-insensitive language (like Pascal) would treat the two as the same name and produce an error.
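One way to apply the case-neutral test mechanically is to fold every spelling of every identifier in a program to a single case and look for collisions. The sketch below assumes the spellings (declarations and uses) have already been extracted from the source by some scanner not shown here; FoldCase and IsCaseNeutral are hypothetical names, and std::tolower handles only single-byte characters.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: lower-case every character of a name.
std::string FoldCase(const std::string& Name)
{
    std::string Folded;
    for (unsigned char Ch : Name)
        Folded += static_cast<char>(std::tolower(Ch));
    return Folded;
}

// Hypothetical check: Spellings holds every spelling of every identifier
// found in a program. The program is case-neutral only if no two DIFFERENT
// spellings become identical once case is ignored.
bool IsCaseNeutral(const std::vector<std::string>& Spellings)
{
    std::map<std::string, std::string> FirstSpelling;  // folded -> original
    for (const std::string& Spelling : Spellings)
    {
        auto Result = FirstSpelling.insert({FoldCase(Spelling), Spelling});
        // An existing entry with a different exact spelling means two
        // spellings differ only by case.
        if (!Result.second && Result.first->second != Spelling)
            return false;
    }
    return true;
}
```

Both failure modes from the paragraph above are caught: one variable referred to with different capitalizations, and two variables whose names differ only by case.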
Different programmers (especially in different languages) use alphabetic case to denote different objects. For example, a common C/C++ coding convention is to use all upper case to denote a constant, macro, or type definition and to use all lower case to denote variable names or reserved words. Prolog goes further and makes case part of its syntax: an identifier beginning with an upper case letter (or an underscore) is a variable. Other comparable coding conventions exist. Unfortunately, there are so many different conventions that make use of alphabetic case that they are nearly worthless, hence the following rule:
Never use alphabetic case to denote the type, classification, or any other program-related attribute of an identifier (unless the language's syntax requires it).
There are going to be some obvious exceptions to the above rule; this document will cover those exceptions a little later. Alphabetic case does have one very useful purpose in identifiers: separating the words in a multi-word identifier. More on that subject in a moment.
Note that the rule above does not specify whether the first letter of an
identifier is upper or lower case. Subject to the other rules governing case,
you can elect to use upper or lower case for the first symbol, although you
should be consistent throughout your program.
Abbreviations
The primary purpose of an identifier is to describe the use of, or value associated with, that identifier. The best way to create an identifier for an object is to describe that object in English and then create a variable name from that description. Variable names should be meaningful, concise, and unambiguous to an average programmer fluent in the English language. Avoid short names. Some research has shown that programs using identifiers whose average length is 10-20 characters are generally easier to debug than programs with substantially shorter or longer identifiers.
The variable names you create should be pronounceable. "NumFiles" is a much better identifier than "NmFls": the first can be spoken, while the second you must generally spell out. Avoid homonyms and long names that are identical except for a few syllables. If you choose good names for your identifiers, you should be able to read a program listing over the telephone to a peer without overly confusing that person.
The Position of Components Within an Identifier
When scanning through a listing, most programmers read only the first few characters of an identifier. It is important, therefore, to place the most important information (the part that defines the identifier and makes it unique) in the first few characters. Avoid creating several identifiers that all begin with the same phrase or sequence of characters; doing so forces the reader to process additional characters in each identifier to tell them apart, which slows the reader down and makes the program harder to read.
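A sketch of how this rule could be checked automatically: report every pair of identifiers whose first few characters are identical. FindSharedPrefixes is a hypothetical helper, and the prefix length passed to it is an assumption that should match how far your own readers actually read into a name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical check: list every pair of names whose first PrefixLength
// characters match. Names shorter than PrefixLength are compared in full.
std::vector<std::pair<std::string, std::string>>
FindSharedPrefixes(const std::vector<std::string>& Names, std::size_t PrefixLength)
{
    std::vector<std::pair<std::string, std::string>> Conflicts;
    for (std::size_t First = 0; First < Names.size(); ++First)
        for (std::size_t Second = First + 1; Second < Names.size(); ++Second)
            if (Names[First].compare(0, PrefixLength,
                                     Names[Second], 0, PrefixLength) == 0)
                Conflicts.emplace_back(Names[First], Names[Second]);
    return Conflicts;
}
```

With a prefix length of 8, "CustomerName" and "CustomerAddress" are flagged, while "NameOfCustomer" and "AddressOfCustomer" are not, because their distinguishing words come first.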
Many C/C++ programmers, especially Microsoft Windows programmers, have
adopted a formal naming convention known as "Hungarian Notation." To
quote Steve McConnell from Code Complete: "The term 'Hungarian' refers both
to the fact that names that follow the convention look like words in a foreign
language and to the fact that the creator of the convention, Charles Simonyi, is
originally from Hungary." One of the first rules given concerning
identifiers stated that all identifiers are to be English names. Do we really
want to create "artificially foreign" identifiers? Hungarian notation
actually violates another rule as well: names using the Hungarian notation
generally have very common prefixes, thus making them harder to read.
Although attaching machine type information to an identifier is generally a
bad idea, a well thought-out name can successfully associate some high-level
type information with the identifier, especially if the name implies the type or
the type information appears as a suffix. For example, names like "PencilCount"
and "BytesAvailable" suggest integer values. Likewise, names like
"IsReady" and "Busy" indicate boolean values. "KeyCode"
and "MiddleInitial" suggest character variables. A name like "StopWatchTime"
probably indicates a real value. Likewise, "CustomerName" is probably
a string variable. Unfortunately, it isn't always possible to choose a great
name that describes both the content and type of an object; this is particularly
true when the object is an instance (or definition of) some abstract data type.
In such instances, some additional text can improve the identifier. Hungarian
notation is a crude attempt at this that, unfortunately, fails for a variety of
reasons.
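The declarations below gather the example names from the paragraph above into one hypothetical C++ record, to show how each name can imply a high-level type without a Hungarian-style prefix. The record name and the concrete C++ types chosen (for instance, double for StopWatchTime) are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical record: each field name hints at the kind of value it holds;
// no prefix encodes the machine type.
struct ExampleRecord
{
    int         PencilCount;     // a count: integer
    long        BytesAvailable;  // an amount of bytes: integer
    bool        IsReady;         // reads as a yes/no question: boolean
    bool        Busy;            // an adjective describing state: boolean
    char        KeyCode;         // code for a single key: character
    char        MiddleInitial;   // a single letter: character
    double      StopWatchTime;   // elapsed time: real value
    std::string CustomerName;    // a name: string
};
```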
Can we apply this suffix idea to variables and avoid the pitfalls? Sometimes. Consider a high-level data type "button" corresponding to a button on a Visual Basic or Delphi form. A variable name like "CancelButton" makes perfect sense. Likewise, labels appearing on a form could use names like "ETWWLabel" and "EditPageLabel". Note that these suffixes still suffer from the fact that a change in type will require that you change the variable's name. However, changes in high-level types are far less common than changes in low-level types, so this shouldn't present a big problem.
Names to Avoid
Avoid using symbols in an identifier that are easily mistaken for other symbols. This includes the sets {"1" (one), "I" (upper case "I"), and "l" (lower case "L")}, {"0" (zero) and "O" (upper case "O")}, {"2" (two) and "Z" (upper case "Z")}, {"5" (five) and "S" (upper case "S")}, and {"6" (six) and "G" (upper case "G")}.
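The look-alike sets above can also be checked mechanically. The sketch below, built on the hypothetical helpers CanonicalGlyph and AreVisuallyConfusable, maps each character in a set onto one representative, so two names that differ only by look-alike characters collapse to the same string. It compares names character by character and assumes plain ASCII identifiers.

```cpp
#include <string>

// Hypothetical helper: map each look-alike character onto one representative.
char CanonicalGlyph(char Ch)
{
    switch (Ch)
    {
        case 'I': case 'l': return '1';  // one, upper case I, lower case L
        case 'O':           return '0';  // zero, upper case O
        case 'Z':           return '2';  // two, upper case Z
        case 'S':           return '5';  // five, upper case S
        case 'G':           return '6';  // six, upper case G
        default:            return Ch;   // every other character stands alone
    }
}

// Hypothetical check: true when two DIFFERENT names look alike on a listing.
bool AreVisuallyConfusable(const std::string& First, const std::string& Second)
{
    if (First == Second || First.size() != Second.size())
        return false;
    for (std::string::size_type Index = 0; Index < First.size(); ++Index)
        if (CanonicalGlyph(First[Index]) != CanonicalGlyph(Second[Index]))
            return false;
    return true;
}
```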
Avoid misleading abbreviations and names. For example, FALSE shouldn't be an identifier that stands for "Failed As a Legitimate Software Engineer." Likewise, you shouldn't compute the amount of free memory available to a program and stuff it into the variable "Profits".
You should avoid names with similar meanings. For example, if you have two variables "InputLine" and "InputLn" that you use for two separate purposes, you will undoubtedly confuse the two when writing or reading the code. If you can swap the names of the two objects and the program still makes sense, you should rename those identifiers. Note that the names do not have to be similar, only their meanings. "InputLine" and "LineBuffer" are obviously different but you can still easily confuse them in a program.
In a similar vein, you should avoid using two or more variables that have different meanings but similar names. For example, if you are writing a teacher's grading program you probably wouldn't want to use the name "NumStudents" to indicate the number of students in the class along with the variable "StudentNum" to hold an individual student's ID number. "NumStudents" and "StudentNum" are too similar.
Avoid names that sound similar when read aloud, especially out of context. This would include names like "hard" and "heart", "Knew" and "new", etc. As noted in the discussion of abbreviations above, you should be able to discuss your program listing over the telephone with a peer. Names that sound alike make such discussions difficult.
Avoid misspelled words in names and avoid names that are commonly misspelled. Most programmers are notoriously bad spellers (look at some of the comments in our own code!). Spelling words correctly is hard enough; remembering how to spell an identifier incorrectly is even more difficult. Likewise, if a word is often spelled incorrectly, requiring a programmer to spell it correctly on each use is probably asking too much.
If you redefine the name of some library routine in your code, another programmer will surely confuse your name with the library's version. This is especially true when dealing with standard library routines and APIs.
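As a sketch of the alternative, suppose a program needs a variant of the C library's strlen that accepts a null pointer. Instead of writing a private function named strlen, which readers will assume is the standard routine, give the variant a descriptive name of its own. LengthOrZeroIfNull is a hypothetical name chosen for this illustration.

```cpp
#include <cstring>  // std::strlen, std::size_t

// Hypothetical helper: a null-tolerant string length. Its distinct name tells
// the reader that this is NOT the standard strlen and how it differs.
std::size_t LengthOrZeroIfNull(const char* Text)
{
    return Text == nullptr ? 0 : std::strlen(Text);
}
```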
Send mail to webmaster@amscitech.com with questions or comments about this web site.