In his essay Making Wrong Code Look Wrong, Joel Spolsky argues that programmers should use naming conventions to prevent errors. He reasons that these conventions should be constructed so that when the code is wrong, it looks wrong. That is to say that it is the programmer who should detect errors. I do not disagree that programmers are the ones who should take responsibility for errors, but I do disagree with Joel on how this should be achieved. I believe that the type system of your compiler should be brought to bear on the type of error detection that Joel discusses. This essay argues that Joel’s opinion is incorrect and gives a simple example of a better approach to preventing and detecting the class of error Joel discusses.
Joel takes the example of the cross-site scripting vulnerability, where a programmer may be dealing with strings that they know to be “safe” (i.e. free of potentially malicious content, as may be generated internally) and strings that they know may be “unsafe” (i.e. may contain malicious content). He essentially argues that one should prefix variable and function names that store or return safe strings with an “s” and those that store or return unsafe strings with “us”. If the prefices of an assignment expression do not match, he argues that the programmer will notice and identify the error. Joel likens this to the naming of variables in Microsoft Excel, which apparently are prefixed with “rw” and “col” to indicate row and column indices.
On the face of it, this seems sensible. However, it would be much better if the computer could identify this type of error. And it can. Instead of using the language’s built-in types and a naming convention, one can usually define one’s own types. Then, the compiler should be able to detect when the types on each side of an assignment do not match. Further, it is not sensible to tie type names to variable names, as the variable type may later change, requiring many variable names to be changed for the code to “look correct”.
Here’s an example (in C) of how type definition can let the compiler detect this class of error:
typedef struct
{
char* content;
} safeString;
typedef struct
{
char* content;
} unsafeString;
int main()
{
/* Make a couple of safe strings. */
safeString safe; safe.content = "This is safe.";
safeString alsosafe; alsosafe.content = "This is also safe.";
/* And an unsafe string. */
unsafeString unsafe; unsafe.content = "This is unsafe.";
safe = alsosafe; /* This compiles OK and works as expected. */
safe = unsafe; /* This gives a type error. */
/* Show us what we have done. */
printf("%s\n", safe.content);
printf("%s\n", unsafe.content);
return 0;
}
Initially I tried simply binding the char* type to the
two new type names, but this did not let the compiler protect
against type errors (it has been a while since I have used C in
anger, so if anyone can tell me exactly why this is, I’d
appreciate it; my guess is that the compiler treats such a simple
type definition differently to the case where the struct is used).
In object-oriented languages such as C++ or Java, one can create an abstract string base class and then derive safe and unsafe concrete classes from it. An example (in Java) is below.
abstract class MyString
{
public String string;
}
class SafeString extends MyString
{
public SafeString(String aString) {string = aString;}
}
class UnsafeString extends MyString
{
public UnsafeString(String aString) {string = aString;}
}
class MainFunction
{
public static void main(String[] args)
{
SafeString safe = new SafeString("This is safe\n");
UnsafeString unsafe = new UnsafeString("This is unsafe\n");
System.out.println(safe.string);
System.out.println(unsafe.string);
// Can we assign safe to unsafe?
safe = unsafe;
unsafe = safe;
// Nope, you get a type error.
// Try to go 'via' the base class.
MyString aString = safe;
UnsafeString yikes = aString;
// Nope, this is an error, too.
}
}
Of course, doing this properly is a bit more involved. For example,
you have to know that a naive programmer won’t think that the
new types are pointless and simply copy strings into plain old
char* variables (or String objects),
losing the ability for the compiler to detect errors. One way to
combat this problem is to develop a simple tool that will check all
code for the use of char*, and report their use as
potentially erroneous. Of course, there are apparent corner cases
where simple ints may be legally used in one situation
(e.g. as variables in for loops) but not in another
(row and column indices in a spreadsheet application). However, the
approach still above makes sense, as you simply have to ensure that
your interfaces use the appropriate types (e.g. a function to
update the display of a spreadsheet element called
updateElement() should take user-defined types
rowIndex and colIndex, rather than a pair
of ints). Clients of the interface are then encouraged to use the
correct types.
Just as with any professional software development, you still need code reviews to ensure code quality. A programmer using a built-in type might be sign that the code is incorrect, and the code would simply look wrong. Maybe Joel’s heart was in the right place after all, but I think he should trust the compiler if he’s going to use a typed language.