Worst abuse of the C preprocessor (IOCCC winner, 1986)
The IOCCC (International Obfuscated C Code Contest) is a long-running programming contest for the most creatively “obfuscated” C code.
Obfuscated code is source or machine code that is deliberately written to be hard for humans to understand. It can serve different purposes, but it is mainly done for security reasons: making the code obscure helps to avoid tampering, hide implicit values or conceal the logic used.
In 1986 the award for the category “worst abuse of the C preprocessor” was given to Jim Hague for writing the following piece of code:
Apart from the peculiar formatting, what stands out is the number of “unnecessary” macros and the repetitive use of identifiers built from DIT and DAH.
If we compile the code at this point, we see many warnings. Among them are two for the implicit declaration of the functions __DIT and _DAH. Once the code is compiled, we can run it: as we provide sequences of ASCII characters, it returns sequences of “.” and “-”.
$ ./a.out
hello, world
.... . .-.. .-.. --- --..-- .-- --- .-. .-.. -..
This is actually Morse code, the one used in telegraph communications back in the 19th century. When decoded, it reads HELLO, WORLD.
Let’s first do the preprocessor’s job by hand and replace the macros with their values. After a bit of reformatting, this is what we have:
main()
{
    char *_DIT, *DAH_, *DIT_, *_DIT_, *malloc(), *gets();
    for (_DIT = malloc(81), DIT_ = _DIT++; _DIT == gets(_DIT); __DIT('\n'))
        for (DAH_ = _DIT; *DAH_; __DIT(*_DIT_ ? _DAH(*DIT_) : '?'), __DIT(' '), DAH_++)
            for (*DIT_ = 2, _DIT_ = _DAH_; *_DIT_ && (*_DIT_ != (*DAH_ >= 'a' ? *DAH_ & 223 : *DAH_)); (*DIT_)++, _DIT_++)
                *DIT_ += (*_DIT_ >= 'a' ? *_DIT_ - 'a' : 0);
}
_DAH(DIT_)
{
    __DIT(DIT_ > 3 ? _DAH(DIT_ >> 1) : '\0');
    return DIT_ & 1 ? '-' : '.';
}
__DIT(DIT_) char DIT_;
{
    (void) write(1, &DIT_, 1);
}
We see the three functions we expected: main, _DAH and __DIT. We also see that main refers to an external variable, _DAH_, which holds a long string.
__DIT is like the putchar function from the standard library: it prints one char at a time.
And what about _DAH? As long as its argument is a number that takes more than 2 bits to write, it calls itself again with the last bit of the number stripped off. Each call prints whatever the nested call returned and then returns its own last bit, written as . for 0 and - for 1. Read in order, the output is the argument in binary format minus its leading 1, and the value finally returned is the rightmost digit. As an example, if we call _DAH(5), 5 being 101 in binary, it will call _DAH(2). That is the base case: it prints a NUL character (nothing visible) and, since 10 & 1 == 0, it returns “.”. Back in _DAH(5), that “.” is printed and, since 101 & 1 == 1, “-” is returned. To print this last character as well we have to call __DIT(_DAH(5)), which outputs .- and so corresponds to 01, the binary digits of 5 without the leading 1.
_DAH(n) is a rather obfuscated function: it does not print/return n in binary, but n in binary with its leading 1 removed. For n equal to 2 or 3, this is the same as n - 2 written in binary.
The main function
main()
{
    char *line, *letter, *value, *code_copy, *malloc(), *gets();
    for (line = malloc(81), value = line++; line == gets(line); putchar('\n'))
        for (letter = line; *letter; putchar(*code_copy ? _DAH(*value) : '?'), putchar(' '), letter++)
            for (*value = 2, code_copy = code; *code_copy && (*code_copy != (*letter >= 'a' ? *letter & 223 : *letter)); (*value)++, code_copy++)
                *value += (*code_copy >= 'a' ? *code_copy - 'a' : 0);
}
The outer loop: before its first iteration it allocates an 81-byte buffer, and on each iteration it reads a line from the standard input into that buffer with gets. gets returns either the buffer it takes as an argument or NULL, and the loop ends in the latter case, that is, when there is no more input. The initialization also assigns an address to value, namely the first byte of the buffer, which will be used in the inner loop, while line moves on to the second byte. The loop prints a new line after completion of the two inner loops.
The middle loop: it goes through each letter of the line obtained above. For each letter, it either prints its code using the _DAH function seen above or prints a ?, and then adds a space. When we looked at _DAH above, we used integers as arguments. Passing *value works just as well, because chars (ASCII characters, to be more precise) are small integers in C.
The inner loop: it sets *value to 2 and looks at the letter at this point. ((*letter >= 'a') ? *letter & 223 : *letter) means: if the letter is lower case, use its upper case version. The mask 223, which is 11011111 in binary, clears the one bit that differs between a lowercase letter and its uppercase counterpart. Of course, 223 is not the most obvious choice; the octal 137 (01011111 in binary) would more easily come to mind. Knowing this, we can see that the inner loop iterates as long as *letter has no match in the code string. At each iteration, it increases *value by 1 and moves on to the next letter in code. Interestingly, if at some point in this process the letter in code is lowercase, the loop increases *value by an extra amount: the distance of that letter from 'a'.
Going from a letter to its Morse code: overall, the inner loop starts with *value = 2 and increases it by 1 each time the iteration moves on to the next letter in code, until the letter in code is the letter from the input line. The middle loop then prints the result of _DAH(*value). Let’s see some examples:
If *letter = 'E', then *value = 2, and since E is the first letter in code, the inner loop exits immediately and control goes back to the middle loop, which calls putchar(*code_copy ? _DAH(*value) : '?'). Here *code_copy == 'E', so the expression above becomes putchar(_DAH(2)). As seen above, _DAH(2) drops the leading 1 of 10 and returns the remaining 0 as “.”. The output of this expression is therefore a single “.”, which is indeed the Morse code for E.
If *letter = 'I', then *value starts at 2 and, since I is the third letter in code, the inner loop exits with *value == 4. putchar(_DAH(4)) prints “..”, the Morse code for I.
We can observe a pattern: the order chosen for the letters in code is such that the Morse code of each letter is the binary value of its index in the code string plus 2, once the leading 1 is dropped. Of course, this is not perfect. Remember the inner loop's special cases? If *code_copy == 'a', the loop skips that *value; in other words, this *value, or index, does not map to any Morse code. If *code_copy == 'b', the loop skips that *value and shifts by 'b' - 'a' == 1, which means that from then on, the letters in code map to the value of their index in the string plus 3. And so on: next comes a d, which shifts that new mapping by 'd' - 'a' == 3.