The best format for multiple-word identifiers

Darius J Chuck

2021-07-07

Let’s consider the examples of multiple-word identifier formats from Wikipedia:

Formatting   Name(s)
twowords     flat case
TWOWORDS     upper flat case
twoWords     (lower) camelCase, dromedaryCase
TwoWords     PascalCase, UpperCamelCase, StudlyCase
two_words    snake_case, pothole_case
TWO_WORDS    SCREAMING_SNAKE_CASE, MACRO_CASE, CONSTANT_CASE
two_Words    camel_Snake_Case
Two_Words    Pascal_Snake_Case
two-words    kebab-case, dash-case, lisp-case
TWO-WORDS    TRAIN-CASE, COBOL-CASE, SCREAMING-KEBAB-CASE
Two-Words    Train-Case, HTTP-Header-Case
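To make the table concrete, here is a minimal Python sketch of the mapping each format implies, from a list of natural-language words to a single identifier. The function name `format_identifier` is hypothetical, and the style keys are simply borrowed from the table's Name(s) column; this is not any library's API.

```python
# Hypothetical helper, for illustration only: joins natural-language words
# into a single identifier using one of the formats from the table.
def format_identifier(words: list[str], style: str) -> str:
    if not words:
        raise ValueError("need at least one word")
    lower = [w.lower() for w in words]
    upper = [w.upper() for w in words]
    # str.capitalize() uppercases the first character and lowercases the rest.
    capitalized = [w.capitalize() for w in words]
    styles = {
        "flat case": "".join(lower),
        "upper flat case": "".join(upper),
        "camelCase": lower[0] + "".join(capitalized[1:]),
        "PascalCase": "".join(capitalized),
        "snake_case": "_".join(lower),
        "SCREAMING_SNAKE_CASE": "_".join(upper),
        "camel_Snake_Case": "_".join([lower[0]] + capitalized[1:]),
        "Pascal_Snake_Case": "_".join(capitalized),
        "kebab-case": "-".join(lower),
        "TRAIN-CASE": "-".join(upper),
        "Train-Case": "-".join(capitalized),
    }
    if style not in styles:
        raise ValueError(f"unknown style: {style!r}")
    return styles[style]


# Print a few formats of the table's example words.
for style in ("flat case", "camelCase", "SCREAMING_SNAKE_CASE", "Train-Case"):
    print(f"{format_identifier(['two', 'words'], style):<10} {style}")
```

Note that the mapping only runs cleanly in one direction: `twowords` cannot be split back into its words without outside knowledge, and camelCase or Train-Case lose each word's original capitalization (`HTTP` becomes `Http`).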

The formats in this table are the common workarounds for the fact that virtually all modern programming languages forbid spaces in identifiers. This was, however, not always the case. As the same article notes:

Historically some early languages, notably FORTRAN (1955) and ALGOL (1958), allowed spaces within identifiers, determining the end of identifiers by context. This was abandoned in later languages due to the difficulty of tokenization.

What seems to remain obscure is that early Lisp (ca. 1956-1958) also allowed spaces within identifiers (called symbols there). A footnote added in 1995 to the foundational paper on Lisp (John McCarthy, April 1960) states:

Imbedded blanks could be allowed within symbols, because lists were then written with commas between elements.

Here the difficulty of tokenization certainly wasn’t an issue: since commas, not blanks, separated list elements, a blank never had to end a symbol. So why did Lisp move away from this? I can find no clear explanation, though I suspect it comes down to historical accident.
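The footnote’s point can be illustrated with a minimal Python sketch of a reader for comma-separated lists. It is a toy under stated assumptions, not McCarthy’s reader: the function name `parse`, its error handling, and the choice to trim and collapse runs of blanks inside a symbol are all hypothetical. Because only commas and parentheses end a symbol, blanks inside it need no special treatment.

```python
# Hypothetical toy reader: parses a parenthesized, comma-separated list into
# nested Python lists. Only ",", "(" and ")" delimit symbols, so blanks
# inside a symbol are kept and "two words" reads as a single symbol.
def parse(text: str):
    pos = 0

    def skip_blanks():
        nonlocal pos
        while pos < len(text) and text[pos].isspace():
            pos += 1

    def parse_element():
        nonlocal pos
        skip_blanks()
        if pos < len(text) and text[pos] == "(":
            return parse_list()
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        # Assumption: trim the ends and collapse runs of blanks to one space.
        symbol = " ".join(text[start:pos].split())
        if not symbol:
            raise ValueError(f"empty symbol at position {start}")
        return symbol

    def parse_list():
        nonlocal pos
        pos += 1  # consume "("
        items = []
        skip_blanks()
        if pos < len(text) and text[pos] == ")":
            pos += 1
            return items
        while True:
            items.append(parse_element())
            skip_blanks()
            if pos >= len(text):
                raise ValueError("unterminated list")
            if text[pos] == ",":
                pos += 1
            elif text[pos] == ")":
                pos += 1
                return items
            else:
                raise ValueError(f"unexpected {text[pos]!r} at position {pos}")

    skip_blanks()
    result = parse_element()
    skip_blanks()
    if pos != len(text):
        raise ValueError(f"trailing text at position {pos}")
    return result
```

Here `parse("(two words, (three more words, x), y)")` returns `['two words', ['three more words', 'x'], 'y']`. Collapsing blanks means `two  words` and `two words` name the same symbol, which seems the natural reading; a stricter reader could reject repeated blanks instead.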

Moreover, I’d argue that allowing spaces in identifiers was the right idea, and that it can be made to work very well with a simple syntax akin to Lisp’s.

This would truly be a better format for multiple-word identifiers than any of the existing ones, because it would remove the need to map between natural language and the programming language altogether, without compromise.