I recently wrote a small, but cryptic “Hello World” program in Python:
Since /r/shittyprogramming seemed to like it, I thought I’d post a write-up on how it (actually) works.
First, let’s deobfuscate a little:
A lot easier to read now, though still not very easy to understand.
Essentially, we are ‘emulating’ a tiny custom CPU that has access to a very small amount of memory (the
data variable). Even if you’re not that familiar with how CPUs and memory work, the design should still be simple enough to understand.
Our memory is laid out just like it’s defined in the code:
You can think of each item in the list as holding a value at a specific address in memory.
data retrieves the value stored at memory address 0, for example. Each value in
data is exactly one byte in size.
The first two addresses act as memory-mapped IO ports (I’ll explain those later on).
The third address stores what is called the program counter register, or PC for short. The PC holds the address of the next instruction to execute. It is initialized to 5 here, because
data holds the address of the first instruction,
0x43. More on instructions in a bit.
Next, we have at
data the accumulator register, or
ACC for short. This is a general-purpose register, for storing whatever the programmer of this CPU wants. Lots of real CPUs have a register like this for storing the result of arithmetic operations. In our case, it’s just a free slot in memory that we can read and write to, with a useful side effect, covered in the next paragraph.
data lies the zero flag register, or
Z for short. In almost all CPUs, there will be a flag register where certain bits in that register being 1 or 0 will signify something has happened in the CPU. For example, if the result of an addition is 0, then the bit representing the
Z flag may be set to 1. This CPU is very minimal, so the
Z flag is the only flag. It is automatically set to 1 whenever the value in ACC becomes 0, and is reset when it changes again to something not 0.
Only the first 5 addresses are ‘special’. Values in
data and onwards hold instructions for the actual program the CPU will be executing.
Instructions are encoded in a special format. Every instruction is exactly one byte, which helps simplify implementing the CPU a bit.
The format of an instruction is as follows:
Each letter represents one bit. The
O bits represent the opcode, which determine which instruction the CPU is going to execute. The
Y bits represent the parameters for the instruction. In this CPU, every instruction has exactly 2 parameters, which we will call simply
There are only two
O bits, which means there can be at most 4 different instructions in this CPU. This is actually (more than) enough for our needs.
Here are the opcodes paired with their corresponding instructions:
00: Store the value x at the address y
mov x, [y]
01: Store the contents of address x at address y
mov [x], [y]
10: Add x to the contents of address y, storing the result at address y
add x, [y]
11: Add the contents of address x to the contents of address y, storing the result at address y
add [x], [y]
The mnemonics are just convenient names for the instructions, and should be familiar if you’ve written assembly before. They follow the format
<op_name> <src>, <dst>, meaning that the result of the operation is stored at the address specified by the second argument. In some assembly syntaxes,
<src> are the other way around, i.e.
<op_name> <dst>, <src>.
x (without brackets) means the literal value represented by
x is used as the argument, and
[x] means that the value stored at the address
x is used as the argument.
y is always surrounded by brackets since it always refers to an address that we end up storing the result into.
I actually didn’t end up using
add x, [y] in the Hello World program, but I kept it anyway.
With that, let’s try understanding the first instruction, which is stored at
0x43 in binary:
01 000 011
The first two bits signify that this is the
mov [x], [y] instruction. In particular,
x here is 0 and
y is 3 (binary
011). Since they both refer to addresses, this means that we’re taking the value stored at address 0 and storing (i.e. copying) it into address 3. In case you forgot, address 0 is the I/O input port (
IO_IN) and address 3 is the accumulator (
Reading from address 0 is handled differently by the CPU than reading other addresses. When the CPU sees that address 0 is read from, it will first write one byte of ‘input’ (we’ll see how this is implemented later) to address 0, then lets the operation proceed as normal.
So, ultimately this instruction will read one byte of input and store it into
The decoded program
I’ll save a bit of time and just write the mnemonics for all of the instructions in the program:
mov ,  mov ,  add ,  mov 5,  mov , 
Replacing special addresses with their name:
mov IO_IN, ACC mov ACC, IO_OUT add Z, PC mov 4, PC mov ACC, ACC
The full program should be a little clearer now.
- Read a byte of input, store it into
- Output the value of
- Add the value of
- Set the value of
- Move the value of
That 5th instruction may seem a little weird. It’s actually not necessary here due to how the CPU is implemented, but I decided to keep it as an example of what’s called a no-op instruction. It does nothing but waste a little bit of CPU time, and is often implemented as moving the value of a register into itself.
The 3rd instruction is a jump instruction in disguise. Jump instructions are instructions that change the flow of control, sometimes based on particular conditions (like certain flags). A jump changes the value of
PC to something specific, so that the next instruction to be executed is not simply the one in the next memory address.
Rather than having a separate instruction for jumping to a specific address if the
Z flag is flipped on, I simply add the value of
Z register to
PC using the already-available
add instruction. If the
ACC does not contain 0, then 0 is added and nothing changes (i.e. the next instruction is executed as normal). But if it’s 1, then the CPU will skip over the next instruction (since the
PC is also automatically incremented by 1 after every instruction), thereby changing the operation.
The 4th instruction is what’s usually called an unconditional jump, i.e. changing the value of
PC without checking for any conditions. Here, we store 4 into
PC. Taking into account that
PC is incremented after every instruction, this means that the value of
PC will be 5 after this instruction is executed, and so the CPU will go execute the first instruction again.
If the value of
Z was 1, the CPU will execute the 5th instruction instead of the 4th, and so the no-op will be executed. When our CPU reaches the end of memory (i.e. the value in
PC is too big), the CPU stops executing. This is why I mentioned that the no-op wasn’t necessary, since
PC would have a too-big value in it if 1 was added to it.
Implementing the CPU
That was a lot to go through! But now comes the (slightly) simpler part: implementing the CPU.
Here is the outline of what the CPU actually does:
- Read the byte stored at the address specified by
- Decode the byte into an instruction: the opcode
oand the arguments
- Handle each of the four possible instructions
- Increment the value in
Zto 1 if
ACCis 0, and 0 otherwise
The relevant code:
The first line just checks the stop condition we described earlier: does
PC (found at
data) specify an address that is too high? If it’s not too high, then we proceed.
b is the byte stored at the address specified by
y are extracted by masking and shifting bits. I won’t go into this part here, but essentially this is how you extract specific bits of a number.
Next comes a very simple
if block, with four branches – one for each possible instruction.
00 is simple: take
x and store it at address
y. We can also see special treatment for I/O: if the destination address was 1 (the
IO_OUT address), then we additionally print the value of
IO_OUT to the console.
01 is a little more complex, because we have to deal with input as well as output. If the source argument
x is the
IO_IN address (0), then we need to first store a byte of ‘input’ at
The ‘input’ is actually just implemented as a list of the ASCII values of the string “Hello World” followed by a terminating 0, declared at the top of the code:
This is also why the program checks that
ACC is 0. If
ACC is 0, that means we’ve read the whole string, and can terminate the program. As an aside, this is how strings are represented in a lot of languages, and this representation is usually called null-terminated.
After we store the first byte of the input in
data, we remove the first byte of
i by doing
i = i[1:]. This could’ve also been implemented by keeping track of the current index into the input and incrementing it.
Also, there is a little check to make sure that there is still input left to write:
This actually isn’t necessary since I never read from
IO_IN after reading all the input, but it might be a nice safety feature for a different application.
The actual storing is very straightforward:
IO_OUT address is handled the same way as for opcode
This is the
add x, [y] instruction, which is very simple to implement. As previously mentioned, it is also not actually used and so could’ve have been left unimplemented.
Finally, this is
add [x], [y]. It is also very simple, since we don’t care about I/O for
add in our CPU.
Now, we simply increment
PC and update the
Z flag by checking if
ACC is 0:
With that, we’ve successfully implemented a tiny CPU.
That’s actually it for the underlying logic, but it looked a bit too clean so I decided to make it a little harder to read.
The first step was really simple: use less variables. Rather than have separate variables for
y, I just copy-pasted their definitions anywhere they were used. I also renamed
p. The result is somewhat intimidating:
The second obfuscation is a little more involved: obfuscate the “Hello World” string.
I first performed a bitwise operation called right-rotating on each character. Right-rotating simply means that each bit gets shifted to the right (in this case by 1), but the right-most bit becomes the left-most bit. This differs from shifting, where the right-most bit would simply disappear and the left-most would just be 0. Rotating preserves all the bits, which means that I can perform the reverse operation, left-rotating, to get back the original value.
I then added
0x20 to each character (since the 0 at the end wasn’t a visually-representable ASCII code, and
0x20 is ASCII for the space character).
This (beautiful) line of code undoes those operations to get back the original character:
Finally, the memory was also obfuscated into a string. The obfuscation is very obvious here, namely adding
0x21 to each value (to make them visible ASCII characters that look a bit different to the obfuscated input).
This post ended up being a lot longer than I expected. Hopefully, it was informative.
Most of the concepts here would probably be covered in an introductory class on microprocessors or other low-level computer hardware. Real CPUs are a lot more complicated (usually) but, as shown here, it doesn’t actually take too much to design a functioning (though severely limited) one. Regardless, this is definitely a fun way to obfuscate simple programs and gain a better understanding of how a CPU might work.
As an aside, the implementation of the CPU is not quite perfect. Python doesn’t use 8-bit integers by default, so the code should really be masking the last 8 bits before storing anything to memory. You could say that the
while loop should check if
PC has a negative value as well, but with no
sub instruction and no way to represent negative numbers, it is a bit difficult to get
PC to be negative. But hey, it works well enough for “Hello World”.