The Von Neumann Mistake, and NaNoWriMo
Welcome to the October issue of Hacker Chronicles!
The nights are getting chilly in California. My wife and I have a yearly game of chicken of who’s the first to turn on the heat, so I’m writing this to you clad in three layers, wool socks, and a beanie. #winning
I’m about to start a month of intense writing. See NaNoWriMo below.
In this month’s letter, I have a nerdy monthly feature for you, going into a fundamental part of how hacking works. Enjoy!
/John
Writing Update
I recently crossed the 40k mark for the sequel to Identified. Some of the research I’ve done for this novel has been on the shelf for years and I finally used some of that material. It was for a scene in a very special environment that I had found photos of. I love collecting visuals that inspire me. Elena Chernyshova’s photos of Norilsk were a major reason for the final part of Identified taking place there.
NaNoWriMo – National Novel Writing Month
A few years ago, I joined the National Novel Writing Month, or NaNoWriMo. It’s a nonprofit which encourages and builds communities around creative writing. Last year they had over 400k writers participating. It’s not just a US thing – the German section for instance has 18k members, and it goes down to per-city communities like the one in Stockholm, Sweden with 3k members.
The annual writing month is November and the official goal is to write 50k words. As you can see above, I’m at 40k in my sequel and that has taken me several months, so 50k in thirty days is a tall order. But driving yourself to write a lot in a relatively short period of time opens new doors and makes you realize that you can write a novel. 50k words is a bit above 200 pages printed.
I’m all geared up for this. I’ve taken some time off work to get extra hours, I will get out of bed two hours earlier in the mornings to write, and I will stay away from non-work social media for the whole month. This is going to be great!
You can join too, and be my writing buddy. See my profile. You’ll see how I do day by day there.
October Feature: The Von Neumann Mistake
John von Neumann is one of the smartest humans who has worked in computer science. He collaborated with Alan Turing on pioneering computer work including artificial intelligence, founded the field of game theory, and was among the first to work on pseudorandom number generators, which we talked about in the August newsletter.
But von Neumann was also deeply involved in the Manhattan Project, which resulted in the first nuclear weapons. He later used early computers to run calculations for the hydrogen bomb.
He died of cancer in 1957, only 53 years old, and some believe it was caused by radiation exposure during his time at Los Alamos National Laboratory.
The fundamental design of modern CPUs (central processing units) and how they run software is often referred to as the Von Neumann architecture. I remember it from computer science class at university, both from an ingenious hardware design perspective and from a risky security perspective.
Let’s dig into why the Von Neumann architecture is fundamental to computer hacking!
Before the Von Neumann Architecture
Before the modern computer, there were other kinds of computers. Among the most famous is the British Colossus, developed in 1943-1945 to help break Nazi Germany’s cryptography. It’s considered the world’s first programmable, electronic, digital computer, and was typically operated by the Wrens – the Women’s Royal Naval Service. (Now you know where Colossus in D.F. Jones’s novel comes from.)
One thing that set these early computers apart from our modern ones was that you couldn’t run stored programs on them. Instead you configured them with switches so that they could perform one task. You can think of this as having to write the recipe every time you wanted to cook a certain dish. That is, no way to write the recipe just once and store it in a book or a digital note.
The Invention of the Stored Program
One of von Neumann’s seminal inventions was the stored-program computer.
Think back to high school math and functions like f(x). Old computers allowed you to input x but the function f was set through those switches. In other words, the previous generation of computers could take data as input but not programs.
Von Neumann realized that you can also take the function, or the program, as input. Voilà – the generalized computing machine was born. Today, you buy a smartphone and it can load websites and apps that it has never seen before and run them. Those sites and apps are stored programs your smartphone takes as input.
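For the code-curious, here’s a minimal Python sketch of that difference. The function names and the tiny two-operation instruction set are made up for illustration; no real machine worked exactly like this.

```python
def fixed_machine(x):
    """Old-style machine: f is wired in with switches (here: double the input)."""
    return 2 * x


def stored_program_machine(program, x):
    """Stored-program machine: the function itself arrives as input.

    `program` is plain data: a list of (operation, operand) pairs
    from a made-up instruction set with just "add" and "mul".
    """
    result = x
    for operation, operand in program:
        if operation == "add":
            result += operand
        elif operation == "mul":
            result *= operand
        else:
            raise ValueError(f"unknown operation: {operation}")
    return result


double = [("mul", 2)]
double_plus_three = [("mul", 2), ("add", 3)]

print(fixed_machine(5))                              # 10
print(stored_program_machine(double, 5))             # 10
print(stored_program_machine(double_plus_three, 5))  # 13
```

The first machine can only ever double. The second does whatever the program you hand it says, which is the whole point of a stored program.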
The Mistake
Modern computers use the same storage for both data and programs. This means computers have to know which is which. But at the most basic level, what is stored is all just ones and zeroes.
To simplify, let’s assume an 8-bit computer, like your 1980s Commodore 64 or Nintendo. For them, a piece of data or an instruction in a program has eight bits (ones and zeroes). For example:
01001100
Now the question, are the bits above an instruction or a piece of data? If it’s data interpreted as a number, it’s the number 76. If it’s data interpreted as a character, it’s the uppercase letter L. And if it’s an instruction on a Commodore 64, that instruction performs a jump to a new segment of the program (like jumping to another recipe to make a sauce).
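If you want to check those three readings yourself, here’s a small Python sketch. The one-entry opcode table is my own stand-in, not a real emulator; it only encodes the fact that 0x4C is the jump opcode on 6502-family CPUs like the one in the Commodore 64.

```python
byte = 0b01001100  # the same eight bits as above

as_number = byte          # read as a plain integer
as_character = chr(byte)  # read as an ASCII character

# Illustrative one-entry opcode table (an assumption for this sketch).
# 0x4C is the absolute JMP opcode on 6502-family CPUs.
OPCODES = {0x4C: "JMP"}
as_instruction = OPCODES.get(byte, "unknown")

print(as_number, as_character, as_instruction)  # 76 L JMP
```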
How does the computer tell it all apart if 01001100 can mean those three things, and more? It all depends on the context. The CPU knows based on context whether to interpret the next eight bits as data or an instruction.
This mix of data and instructions, and the reliance on CPUs getting the context right, has been the underpinning of computer hacking all along. It’s about confusing the computer as to whether something is data or an instruction. If the hacker can input something as data but make the computer handle it as an instruction, the hacker has found a way to effectively reprogram the computer!
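Here’s a toy Python sketch of that confusion. Everything in it is hypothetical: the three-instruction machine, the memory layout, and the careless copy routine are invented to show the pattern, not to model any real CPU or exploit.

```python
# Made-up instruction set for a toy machine.
HALT, DENY, GRANT = 0, 1, 2

BUFFER_SIZE = 4               # cells 0-3 are meant for input data
PROGRAM_START = BUFFER_SIZE   # the program sits right after the buffer


def run(user_input):
    """Copy input into shared memory, then execute the program."""
    # One memory for both data and program, as in a stored-program computer.
    memory = [0] * BUFFER_SIZE + [DENY, HALT]

    # Careless copy: it never checks that the input fits in the buffer,
    # only that it fits in memory as a whole.
    for offset, value in enumerate(user_input[:len(memory)]):
        memory[offset] = value

    log = []
    pc = PROGRAM_START  # program counter
    while pc < len(memory) and memory[pc] != HALT:
        if memory[pc] == GRANT:
            log.append("access granted")
        elif memory[pc] == DENY:
            log.append("access denied")
        else:
            log.append("unknown instruction")
        pc += 1
    return log


print(run([9, 9, 9, 9]))         # ['access denied']
print(run([9, 9, 9, 9, GRANT]))  # ['access granted']
```

The second input is one cell too long. That fifth value was entered as data, but it lands where the program lives, so the machine runs it as an instruction.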
Examples of the Mistake
The idea of mixing data and code has traveled far since the original Von Neumann architecture. Here are a few examples of where the same pattern has come into play.
- Hacking through Word documents. You may have heard that you shouldn’t open document attachments from unknown senders. It turns out Word documents aren’t just text and images. They can also contain code. They can be a way for an attacker to smuggle in an exploit.
- Cross-site scripting. Webpages are not just text, images, and styling. They can also contain code, written in JavaScript. Fooling a webpage into running input data as code results in a cross-site scripting attack.
- Hacking through images. When a computer converts e.g. a JPEG file into an image on your screen, it does complex processing to turn bits into colored pixels. Some image formats, such as Scalable Vector Graphics, can also contain code.
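The cross-site scripting case is the easiest one to sketch in a few lines of Python. The two render functions and the comment scenario are hypothetical; html.escape is the real function from Python’s standard library.

```python
import html


def render_comment_unsafe(comment):
    """Pastes visitor input straight into the page, so markup in it stays live."""
    return "<p>" + comment + "</p>"


def render_comment_safe(comment):
    """Escapes the input first, so the browser treats it as text only."""
    return "<p>" + html.escape(comment) + "</p>"


# Input that is supposed to be data but is shaped like code.
payload = "<script>alert('hi')</script>"

unsafe_page = render_comment_unsafe(payload)
safe_page = render_comment_safe(payload)

print(unsafe_page)  # still contains a real <script> tag
print(safe_page)    # angle brackets are turned into &lt; and &gt;
```

A browser loading the first page would run the script. In the second, the same characters stay data. Real websites need more than this one call, but the idea is the same: keep input on the data side of the line.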
Mixing Data and Instructions in Hacker Fiction
It takes a few paragraphs to explain the Von Neumann architecture and why it has facilitated computer hacking for decades. But once you get it, it’s kind of beautiful in its simplicity.
You see a couple of instances of this in my novel Identified.
First when West needs to solve a challenge to get into the Velvet Fridays hackathon. One of the challenges is to program a polyglot. That is a file that can be interpreted in several formats at once, and both as data and code. It’s extra cool when the file that delivers the attack also renders a beautiful image on the screen.
A second example in my novel is the US G20S format for classified data entries on the blockchain. They look like images but also contain crypto keys. Not strictly a mix of data and code, but an example of how you can smuggle things inside of data.
Although it’s opaque, I think The Matrix Revolutions has some of this mix of data and code going on. The Merovingian is a trafficker in information and part of the Matrix’s storage. The Trainman works for the Merovingian and can move programs in and out of worlds. This indicates that information and programs are indistinguishable when in transit, which is also true in a real computer.
I think this pattern of hacks is underutilized in fiction. It’s quite simple, true, and fascinating. Don’t be surprised if you read a beautiful reveal in one of my forthcoming novels where data turns into code, or vice versa. Stay tuned. :)
Currently Reading
I got fed up with George R. R. Martin’s A Dance with Dragons. Giving up on a book or pausing it indefinitely is hard for me, even though I always tell myself that reading is for me and my entertainment. I don’t know why A Dance with Dragons was so slow going for me but it just didn’t improve.
Now I’m instead reading the 1994 thriller Kolymsky Heights by Lionel Davidson. It’s great so far! And I’m still reading The Adventures of Tom Sawyer to our daughter.