Let's begin with a brief description of OTP and how it works. Let's take the following message and key:

message = "Hello World"

key = "supersecret"

If we convert both the message and key to hex strings, we get the following:

message = "48656c6c6f20576f726c64"

key = "7375706572736563726574"

If we do a simple XOR of the two hex strings we get the following cipher-text:

cipher-text = "3b101c091d53320c000910"

If we XOR the cipher-text with the key, we can recover the plain-text. That's how OTP works. Without the key, you have no way of uncovering the plain-text.

Let's consider what happens when you have two messages encrypted with the same key. Take the following two messages and key:

message1 = "Hello World"

message2 = "the program"

key = "supersecret"

If we convert each message and the key to hex strings, and then encrypt each message using a simple XOR with the key, we'll get the following cipher-texts:

cipher-text1: "3b101c091d53320c000910"

cipher-text2: "071d154502010a04000419"

Let's say that all we have is the two cipher-texts and the knowledge that they were encrypted with a supposed OTP; however, they were both encrypted with the same key. To attack this encryption and uncover the plain-text, follow the steps below.

- Guess a word that might appear in one of the messages
- Encode the word from step 1 to a hex string
- XOR the two cipher-text messages
- XOR the hex string from step 2 at each position of the XOR of the two cipher-texts (from step 3)
- When the result from step 4 is readable text, we guess the English word and expand our crib search.
- If the result is not readable text, we try an XOR of the crib word at the next position.

Step 1 seems difficult (guessing a word that might appear in one of the messages), but when you think about it, the word "the" is the most commonly used English word. So, we'll start with assuming "the" is in one of the messages. After encoding "the" as a hex string, we'll get "746865". That takes care of steps 1 and 2. If we XOR the two cipher-texts, we'll get the following result:

cipher-text1 XOR cipher-text2 = "3c0d094c1f523808000d09"

The next step is to XOR our crib word "746865" at each position of the XOR of the cipher-texts. What we'll do is slide "746865" along each position of "3c0d094c1f523808000d09" and analyze the result. After the first XOR, we get the following result:

3c0d094c1f523808000d09

XOR 746865

----------------------------------

48656c

When we convert the hex string "48656c" to ASCII, we get the following text, "Hel". This takes us to step 5 from above. Because this looks like readable text, we can assume that the word "the" is in the first position of one message. If we didn't get readable text, we would slide 48656c one position to the right and try again (and keep repeating until the end of 3c0d094c1f523808000d09).

Note that we don't know which message contains the word "the". It could be in either message1 or message2. Next, we need to guess what the word "Hel" is when fully expanded. It could be "Help", "Hello", etc. If we guess "Hello", we can convert "Hello" to a hex string, we get "". We then XOR it with the XOR of the two cipher-texts (just like we did with "the"). Here's the result:

3c0d094c1f523808000d09

XOR 48656c6c6f

----------------------------------

7468652070

"7468652070" when converted to ASCII, is "the p". We then repeat the process, guessing what "the p" might be when expanded and then XOR that result with the XOR of the cipher-texts. Granted, guessing what "the p" might expand to is not super easy, but you get the idea. If we were to guess "the program", convert it to a hex string, and XOR it with the XOR of the cipher-texts, we'll get "Hello World".

This is called crib dragging. My suggestion is to first try " the " (note the spaces before and after). Most cipher-texts that you'll try cracking will contain that word somewhere in the text. If the result of your crib drag yields gibberish, then you can be sure " the " isn't in either of the plain-text messages. So, try another commonly used English word or phrase and keep trying until the result yields something that looks like readable text. Then you can just expand your guess and keep XORing until you uncover the plain-text messages.

In a future blog post, I'll demonstrate an implementation of a crib drag in Scala.

Hi, I implemented it in scala.Take a look at my solution to traverse and match the ciphertexts. See my github account at https://github.com/vannicaruso/studying_crypto1

ReplyDeleteNice article, but how would you do the third step if you had more than two messages ?

ReplyDeleteThanks for the comment! For three messages (as an example), you'd XOR all three messages together. If you get readable text after Step 4, you know that the "guess word" appears in ONE of the three messages. You can then XOR just two of the three messages and repeat the process to deduce which encrypted message contains the guess word.

DeleteNice article, but what if the two messages have different length. I tried with "Hello Simple World" and "the program" and couldn't get "Hel" later.

ReplyDeleteIt will still work, but the key would of course need to be the same length as the longest message, since one time pad encryption assumes the key is the same length as the message. Therefore, the suffix of the key would be unused for the shorter message(s). Does that help? If not, I can put together an article to show this. Just let me know :)

DeleteNicely explained, thank you!

ReplyDeletegreat explanation thanks alot

ReplyDeleteThank you very very much for the exlpanation!

ReplyDeleteHi, this is really a good example, but I got one question, will this method going to be work when cracking 2 cipher-texts with the same hex ? e.g. Both of them have the same hex number "3c" at the first column.

ReplyDelete