Worst abuse of the C preprocessor (IOCCC winner, 1986)

The IOCCC (International Obfuscated C Code Contest) is an annual programming contest for “obfuscated” C code.

Obfuscated code is source or machine code that has been deliberately made hard for humans to understand. It can serve different purposes, but it is mainly done for security reasons: making the code obscure helps prevent tampering, hide embedded values or conceal the logic used.

In 1986 the award for the category “worst abuse of the C preprocessor” was given to Jim Hague for writing the following piece of code:

Apart from the peculiar formatting, what jumps out is the number of “unnecessary” macros and the repetitive use of the same few names and their variations.
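To see how macros like these can disguise ordinary syntax, here is a minimal sketch in the same spirit. The macro names DIT, DAH and TIMES are hypothetical, chosen for illustration: each expands to a single punctuation token, so after preprocessing the function below is plain C.

```c
/* Hypothetical macros, for illustration only: each one expands to a
   single punctuation token, so the "words" below turn back into
   ordinary C syntax once the preprocessor has run. */
#define DIT (
#define DAH )
#define TIMES *

/* After preprocessing: static int square(int x) { return x * x; } */
static int square DIT int x DAH
{
    return x TIMES x;
}
```

Running only the preprocessor (the -E option of gcc and clang) shows the expanded text, which is a quick way to start untangling code written this way.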

If we compile the code as is, we get many warnings, among them two about implicitly declared functions. Once it is compiled, we can run the program: as we type in lines of ASCII characters, it answers with sequences of “.” and “-”.

This is actually Morse code, the code used for telegraph communication back in the 19th century. When decoded, the output reads back as what we typed, for example HELLO, WORLD.

Let’s first do the preprocessor’s job ourselves and replace each macro with its value. After a bit of reformatting, this is what we get:

We see the three functions we expected: main, plus the two the compiler warned about. We also see an external variable holding a long string. One of the two functions works like putchar from the standard library, printing one character at a time.
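A putchar-like function can also be built directly on the POSIX write() call, one byte per call. The sketch below assumes a POSIX system; the name put_one is hypothetical, and this is an illustration of the idea rather than the entry’s own code.

```c
#include <unistd.h> /* POSIX write(), ssize_t, STDOUT_FILENO */

/* Hypothetical putchar-like helper: writes exactly one byte to
   standard output (file descriptor 1). Returns whatever write()
   reports: 1 on success, -1 on error. */
static ssize_t put_one(char c)
{
    return write(STDOUT_FILENO, &c, 1);
}
```

Because it bypasses stdio buffering, its output can appear before earlier printf() output unless stdout is flushed first.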

And what about the other function?

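As long as its argument takes more than 2 bits to write (that is, as long as it is greater than 3), the function calls itself again with the argument stripped of its last bit. Each call returns the last bit of its argument, as ‘.’ for 0 or ‘-’ for 1, and the calling level prints that character. The binary digits of the number are thus printed one by one, except the leading 1, which the base case skips, and the last digit, which is returned to the caller.

As an example, take 5, which is 101 in binary. The function calls itself with 2 (10 in binary). That is the base case: it prints a null character, which shows up as nothing, and returns ‘.’. Back in the first call, that ‘.’ is printed and ‘-’ is returned. To see that last character too, we have to print the return value of the outer call, and the full output is ‘.-’, that is 01: the binary digits of 5 after its leading 1. So this is a rather obfuscated function: it does not print or return its argument in binary, but prints every binary digit after the leading 1 except the last one, which it returns.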
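To check the recursion described above, here is a minimal sketch that collects the characters in a buffer instead of printing them, so the result can be inspected. The names emit, last_bit_symbol and to_dots_dashes are hypothetical, and the base case here adds nothing to the buffer.

```c
#include <stddef.h> /* size_t */

/* Hypothetical output helper: appends c to buf and keeps it
   NUL-terminated. Stands in for printing one character. */
static void emit(char *buf, size_t *len, char c)
{
    buf[(*len)++] = c;
    buf[*len] = '\0';
}

/* Same shape as the recursion described above: while n needs more
   than 2 bits (n > 3), recurse on n >> 1 and emit the symbol that
   call returns. The base case (n == 2 or 3) emits nothing.
   Returns the symbol for n's last bit: '-' for 1, '.' for 0. */
static char last_bit_symbol(unsigned n, char *buf, size_t *len)
{
    if (n > 3)
        emit(buf, len, last_bit_symbol(n >> 1, buf, len));
    return (n & 1) ? '-' : '.';
}

/* Writes the bits of n after its leading 1 into out as dots and
   dashes, including the last one. Requires n >= 2; out needs room
   for as many chars as n has bits. */
static const char *to_dots_dashes(unsigned n, char *out)
{
    size_t len = 0;

    out[0] = '\0';
    emit(out, &len, last_bit_symbol(n, out, &len));
    return out;
}
```

With n = 5 (101 in binary), to_dots_dashes produces ".-", the binary digits that follow the leading 1.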

The main function

The outer loop: it allocates a buffer, then, each time the user enters a new line, reads that line from standard input into the buffer. The reading function returns either the buffer it takes as an argument or a null pointer at the end of input, which ends the loop. This loop also assigns an address to a pointer that the inner loop will use. After the two inner loops have processed a line, it prints a newline.
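Returning either the buffer it was given or a null pointer is the contract of the standard gets(), which has no way to limit how much it writes and was removed from the language in C11. Here is a minimal sketch of a bounded replacement with the same contract, built on fgets(); the name read_line is hypothetical.

```c
#include <stdio.h>  /* fgets, FILE */
#include <string.h> /* strcspn */

/* Hypothetical bounded replacement for gets(): reads one line from
   `in` into buf (at most size - 1 characters), strips the trailing
   newline as gets() does, and returns buf, or NULL at end of input
   or on error. */
static char *read_line(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0'; /* fgets() keeps the newline */
    return buf;
}
```

Used in a loop such as while (read_line(buf, 80, stdin) == buf), it reproduces the outer loop’s test. One difference from gets(): a line longer than size - 1 characters comes back in several pieces instead of overflowing the buffer.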

The middle loop: it goes through each letter of the line read by the outer loop. For each letter, it prints either its Morse code, using the function seen above, or, if the letter is not found in the string, a fallback character, and then adds a space. Above, we used integers as arguments to that function. That works just as well with letters: characters (ASCII characters, to be more precise) are small integers in C.

The inner loop: it sets a counter to 2 and looks at the current letter of the line. If that letter is lowercase, its uppercase version is used instead. The mask constant clears the one bit in which a lowercase ASCII letter differs from its uppercase counterpart (the bit worth 32, or 0x20). Of course, the constant as written is not the most obvious choice; other notations for the same mask would come to mind more easily. Knowing this, we can see that the inner loop keeps iterating as long as the letter has no match in the string. At each iteration, it increases the counter by 1 and moves on to the next letter of the string. Interestingly, if at some point in this process the letter in the string is lowercase, the loop increases the counter by an extra, special amount.
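In ASCII, a lowercase letter differs from its uppercase counterpart only in bit 0x20 (decimal 32), so clearing that bit uppercases it. Here is a minimal sketch, assuming ASCII and using 223 (0xDF, octal 0337) as the mask; the name to_upper_ascii is hypothetical. It applies the mask only to characters at or above 'a', a simple and slightly loose test for lowercase.

```c
/* Uppercases an ASCII lowercase letter by clearing bit 0x20.
   223 == 0xDF == 0337, i.e. every bit of a byte except 0x20.
   Characters below 'a' are returned unchanged. Note that '{', '|',
   '}' and '~' are also >= 'a', so they map to '[', '\\', ']', '^'. */
static int to_upper_ascii(int c)
{
    return c >= 'a' ? (c & 223) : c;
}
```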

Going from a letter to its Morse code: overall, the inner loop starts the counter at 2 and increases it by 1 each time the iteration moves on to the next letter of the string, until the letter in the string is the letter in our line. The counter is then printed as dots and dashes by the recursive function seen above. Let’s see some examples:

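Take the letter E first. Since E is the first letter of the string, the inner loop stops immediately and we go back to the middle loop, which calls the recursive function with the counter, still equal to 2. As seen above, that function prints the binary value of its argument, minus the leading 1, as ‘.’ and ‘-’. Since 2 is 10 in binary, the output is a single ‘.’. This is indeed the Morse code for E.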

Next, take I. The counter starts at 2, and since I is the third letter of the string, the inner loop exits with the counter at 4. Since 4 is 100 in binary, the function prints ‘..’, the Morse code for I.

We can observe a pattern: the letters in the string are ordered so that each letter’s Morse code is the binary value of its index in the string plus 2, with the leading 1 dropped, dots for 0 and dashes for 1. Of course, this is not perfect, because not every number corresponds to a Morse code. Remember the special case of the inner loop, when the letter in the string is lowercase? The extra amount added to the counter is that letter’s distance from a. If the letter is a, the loop simply skips it: said otherwise, this character, or this index, does not map to any Morse code. If it is b, the loop skips it and shifts the counter by 1, which means that from the next character on, the letters in the string map to the value of their index plus 3. And so on: each further lowercase letter shifts that new mapping again.
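Putting the pieces together, here is a minimal sketch of the whole lookup and encoding, assuming that a lowercase entry adds its distance from 'a' to the counter (so a only uses up its own position, b skips one more, and so on). The table is our own, worked out by hand from the Morse tree for letters and digits under that scheme; it is not the entry’s string, and the function names are hypothetical. The bits are converted with a loop rather than recursion.

```c
#include <stddef.h> /* size_t */

/* Our own table, built for this sketch (not the entry's string): each
   uppercase letter or digit sits where (index + 2) is its position in
   the Morse tree; lowercase entries are fillers for unused positions. */
static const char TABLE[] = "ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3c2g16g7c8a90";

/* Inner loop: returns the code number of c, or 0 if c is not in TABLE.
   The counter starts at 2 and grows by 1 per entry; under the scheme
   assumed here, a lowercase filler also adds (filler - 'a'). */
static unsigned code_for(int c)
{
    const char *t;
    unsigned code = 2;

    if (c >= 'a')
        c &= 223;                     /* clear bit 0x20: lowercase -> uppercase */
    for (t = TABLE; *t != '\0' && *t != c; t++, code++)
        if (*t >= 'a')
            code += (unsigned)(*t - 'a');
    return *t != '\0' ? code : 0;
}

/* Writes the bits of code after its leading 1: '.' for 0, '-' for 1.
   Returns the number of symbols written (no terminator added). */
static size_t append_code(unsigned code, char *out)
{
    unsigned bit = 1;
    size_t len = 0;

    while (bit <= code / 2)           /* move bit up to code's leading 1 */
        bit <<= 1;
    for (bit >>= 1; bit != 0; bit >>= 1)
        out[len++] = (code & bit) ? '-' : '.';
    return len;
}

/* Middle loop: each character becomes its Morse code (or '?' if it is
   not in TABLE) followed by a space. out needs room for
   6 * strlen(line) + 1 chars: codes here are at most 5 symbols long. */
static char *encode_line(const char *line, char *out)
{
    size_t len = 0;

    for (; *line != '\0'; line++) {
        unsigned code = code_for((unsigned char)*line);
        if (code != 0)
            len += append_code(code, out + len);
        else
            out[len++] = '?';
        out[len++] = ' ';
    }
    out[len] = '\0';
    return out;
}
```

For example, encode_line("SOS", out) produces "... --- ... ", and characters missing from this table, such as punctuation, come out as ?.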