Worst abuse of the C preprocessor (IOCCC winner, 1986)

The IOCCC (International Obfuscated C Code Contest) is an annual programming contest for “obfuscated” C code.

Obfuscated code is source or machine code that is deliberately written to be hard for humans to understand. It can serve different purposes, but it is mainly done for security reasons: making the code obscure helps to deter tampering, hide values embedded in it, or conceal the logic used.

In 1986 the award for the category “worst abuse of the C preprocessor” was given to Jim Hague for writing the following piece of code:

Apart from the peculiar formatting, what stands out is the number of “unnecessary” macros and the repetitive use of DIT and DAH variations.
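To make the trick concrete, here is a minimal sketch of how such macros can stand in for ordinary C tokens. It is not Hague's code: the three macro definitions are illustrative, written in the DIT/DAH naming style, and the function sum_to is a hypothetical example.

```c
#include <assert.h>

/* Illustrative macros in the style of the entry; not the original source. */
#define DIT    (
#define DAH    )
#define DAHDIT for

/* Hypothetical function: returns 1 + 2 + ... + n.
 * After preprocessing, the header reads "int sum_to ( int n )"
 * and the loop reads "for ( i = 1; i <= n; i++ )". */
int sum_to DIT int n DAH
{
    int i, total = 0;
    assert(n >= 0);
    DAHDIT DIT i = 1; i <= n; i++ DAH
        total += i;
    return total;
}
```

Calling sum_to(4) returns 10, exactly as the plain version would. The compiler never sees DIT or DAH, only the tokens they expand to, which is why the whole entry can be written in what looks like Morse vocabulary.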

If we compile the entry as it stands, we see many warnings, among them two for the implicit declarations of __DIT and _DAH. We can then run the program: as we feed it sequences of ASCII characters, it prints sequences of “.” and “-”.

$ ./a.out hello, world

.... . .-.. .-.. --- --..-- .-- --- .-. .-.. -..

This is actually Morse code (the one used in telegraph communications in the 19th century). When decoded, it reads HELLO, WORLD.

Let’s first do the preprocessor’s job and replace the macros with their values. After a bit of reformatting, this is what we have:

We see the three functions we expected: main, _DAH, and __DIT. We also see an external variable __DAH__ , a long string. __DIT is like the putchar function from the standard library, printing a char at a time.

And what about _DAH ?

As long as the argument is greater than 3, i.e. a number that takes more than 2 bits to write, the function calls itself again with the last bit stripped off. On the way back, each level prints what the deeper call returned, then returns - or . for its own last bit, 1 or 0 respectively. As an example, if we call _DAH(5), 5 being 101 in binary, it calls _DAH(2). That is the base case: it prints nothing visible and returns . because 10 & 1 == 0. Back in _DAH(5), that . is printed and the function returns - because 101 & 1 == 1. This last symbol is only returned, not printed, so to see it we have to call __DIT(_DAH(5)), which outputs .- in total. Those two symbols are the bits 01, that is, 101 without its leading 1. _DAH(n) is a rather obfuscated function: it does not print n in binary, but the binary digits of n that follow the leading 1, with . standing for 0 and - for 1.
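The recursion is easier to follow in a de-obfuscated form. The sketch below is an assumption-laden rewrite, not the original function: morse_bits is a hypothetical name, and instead of printing as it goes and returning the last symbol, it collects all symbols in a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the original entry.
 * Writes the dots and dashes for n into out (NUL-terminated) and
 * returns how many symbols were written. out needs room for at
 * least 32 characters. */
size_t morse_bits(unsigned n, char *out)
{
    size_t len = 0;

    assert(n >= 2);                      /* 2 is the smallest meaningful value */
    if (n > 3)                           /* more than two bits: higher bits first */
        len = morse_bits(n >> 1, out);
    out[len++] = (n & 1) ? '-' : '.';    /* this level's lowest bit */
    out[len] = '\0';
    return len;
}
```

With this version, morse_bits(5, buf) leaves ".-" in buf and morse_bits(2, buf) leaves ".", matching the walk-through above. The leading 1 of n is never emitted; it only marks where the symbols start.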

The outer loop: it creates a buffer and then, each time the user enters a new line, reads that line from the standard input into the buffer. The reading function returns the buffer it takes as an argument on success and NULL otherwise, which is what ends the loop. This loop also assigns an address to value, which is used in the inner loop, and it prints a new line once the two inner loops have completed.

The middle loop: it loops through each letter of the string obtained above. For each letter, it either prints its Morse code using _DAH, seen above, or prints a ?, and then adds a space. When we looked at _DAH, we used integers as arguments. Passing *value works just as well, because characters are simply small integers in C.

The inner loop: it sets *value to 2 and looks at the current letter. ((*letter >= 'a') ? *letter & 223 : *letter) means: if the letter is lowercase, use its uppercase version. The decimal 223, or 11011111 in binary (0xDF), serves as a mask that clears the one bit (value 32) that differs between a lowercase letter and its uppercase counterpart. Of course 223 is not the most obvious way to write it; octal 0137 (0x5F) would more easily come to mind. Knowing this, we can see that the inner loop iterates as long as *letter has no match in the code string. On each iteration, it increases *value by 1 and moves on to the next letter in code. Interestingly, if the letter in code at that point is lowercase, the loop increases *value by an additional amount.
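The case-folding expression can be tried in isolation. This is a small sketch under the assumption of plain ASCII input; fold_case is a hypothetical name wrapping the expression quoted above.

```c
#include <assert.h>

/* Hypothetical helper mirroring the inner-loop expression.
 * 223 == 0xDF == 11011111 in binary, so the AND clears bit 5 (value 32). */
int fold_case(int c)
{
    assert(c >= 0 && c <= 127);          /* plain ASCII only */
    return (c >= 'a') ? (c & 223) : c;
}
```

fold_case('h') gives 'H', while 'Z' and ',' pass through unchanged. Note that the test is only c >= 'a', so the few characters above 'z' are altered too: '{' becomes '['. That is harmless here, since neither appears in the code string.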

Going from a letter to its Morse code: overall, the inner loop starts with *value = 2 and increases it by 1 each time the iteration moves on to the next letter in code, until the letter in code is the letter from the input line. The program then prints *value using _DAH. Let’s see some examples:

*letter = 'E': *value starts at 2 and, since E is the first letter in code, the inner loop exits immediately and control goes back to the middle loop, which calls putchar(*code_copy ? _DAH(*value) : '?'). Here *code_copy == 'E', so the expression becomes putchar(_DAH(2)). As seen above, _DAH(2) prints nothing visible and returns . so the output of this expression is a single . which is indeed the Morse code for E.

*letter = 'I': *value starts at 2 and, since I is the third letter in code, the inner loop exits with *value == 4. putchar(_DAH(4)) prints .. which is the Morse code for I.

We can observe a pattern: the order chosen for the letters in code is such that each letter’s Morse code equals the binary value of its index in the code string plus 2, written without the leading 1 and with . for 0 and - for 1. Of course, this is not perfect. Remember the inner loop’s special cases? If *code == 'a', the loop skips that *value; in other words, this *value, or index, does not map to any Morse code. If *code == 'b', the loop skips that *value and shifts by a further 'b' - 'a' == 1, which means that from then on, the Morse code for the letters in code maps to the value of their index in the string plus 3. And so on: next comes a d, which shifts that new mapping by a further 'd' - 'a' == 3.
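The whole lookup can be sketched as a standalone function. Several things here are assumptions: morse_value is a hypothetical name, the table string is a transcription of the one in the 1986 entry and should be checked against the original source, and the function returns the computed value instead of printing it.

```c
#include <assert.h>

/* Table as transcribed from the 1986 entry (assumption: verify against
 * the original). Uppercase letters, digits and punctuation are real
 * entries; lowercase letters are skip markers. */
static const char code[] =
    "ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";

/* Hypothetical helper: returns the value whose bits after the leading 1
 * spell the Morse code for c, or 0 if c is not in the table. */
int morse_value(int c)
{
    const char *p = code;
    int value = 2;

    assert(c >= 0 && c <= 127);          /* plain ASCII only */
    if (c >= 'a')
        c &= 223;                        /* fold lowercase to uppercase */
    for (; *p && *p != c; value++, p++)
        if (*p >= 'a')
            value += *p - 'a';           /* skip marker: extra shift */
    return *p ? value : 0;
}
```

With this table, morse_value('E') is 2 and morse_value('I') is 4, as in the two examples. morse_value('H') is 16, binary 10000, hence .... and morse_value(',') is 115, binary 1110011, hence --..-- as in the sample output. A character with no entry, such as a space, yields 0, which corresponds to the ? printed by the original.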
