This exercise introduces hash functions and demonstrates how you can
compute file hashes using GCHQ’s browser-based tool CyberChef. You can
run CyberChef online or download it to run locally. (In the latter
case, you’ll need to unzip the archive and then open the HTML file from the
CyberChef directory in your browser.)
Run CyberChef in your browser. At the bottom of the UI, make sure that the ‘Auto Bake’ checkbox is ticked.
Examine the contents of message.txt in your browser or
by downloading the file and opening it in a text editor. Copy the entire
contents of this text file and paste them into the Input panel of
CyberChef. You will see the same text appear in the Output panel,
because we are not yet doing any processing of the input.
In the Operations menu on the left of the CyberChef UI, click on the ‘Hashing’ menu item to open up a submenu listing a large number of different hash functions. Find the entry for MD5 and hover the cursor over it to read the description. Then click and drag the MD5 operation onto the Recipe panel. The contents of the Output panel will now change to show the MD5 hash of the input, as a string of 32 hexadecimal digits:
c7cd4529f9ebe5b40ff061188ec6f5c1
(If you see a different hash value, beginning with a3, you’ve probably
just included the final newline character of the file in the Input panel,
which is not a problem…)
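The same computation can be reproduced outside CyberChef. The following is a minimal sketch using Python's standard hashlib module. The input is a placeholder string, not the contents of message.txt, and the helper name md5_hex is made up for illustration. It also shows why a trailing newline changes the result: the newline is simply one more byte of input.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a string of 32 hex digits."""
    return hashlib.md5(data).hexdigest()

# Placeholder input, NOT the contents of message.txt.
text = b"hello"

without_newline = md5_hex(text)
with_newline = md5_hex(text + b"\n")

print(without_newline)  # 5d41402abc4b2a76b9719d911017c592
print(with_newline)     # a completely different 32-digit value
```

To hash the real file, you could read it with open("message.txt", "rb").read() and pass the bytes to the same function.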
CyberChef computing an MD5 hash
Alter a single character of the text in the Input panel. You should see that most of the hex digits have changed. For example, if you change the word “park” to “pork”, you should see the hash change to
e231c7c4f92f282fdebd4419c8629ba9
(Again, you’ll see a different value if you’ve included a trailing newline in the input.)
This extreme sensitivity of hash function output to changes in input is known as the avalanche effect.
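One way to quantify the avalanche effect is to count how many of the 128 output bits differ between the two hashes. The sketch below does this with Python's hashlib. The two sentences are invented stand-ins for the exercise text, and bit_difference is a hypothetical helper name. For a well-behaved hash function you would expect roughly half of the bits to flip.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length digests differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Illustrative sentences (assumed; not taken from message.txt).
original = b"We walked through the park."
altered = b"We walked through the pork."

d1 = hashlib.md5(original).digest()
d2 = hashlib.md5(altered).digest()

changed = bit_difference(d1, d2)
print(f"{changed} of {len(d1) * 8} bits differ")
```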
At 128 bits, an MD5 hash is too short for security purposes these days, so try using something much more secure, from the SHA-2 family.
Remove the MD5 operation by clicking the trashcan icon at the top of the Recipe panel, then find ‘SHA2’ in the ‘Hashing’ submenu and drag it onto the panel. Click on the Size parameter to choose different SHA-2 hash functions.
Notice how the hash size follows directly from the hash function name. Each hexadecimal digit encodes 4 bits, so, for example, SHA-256 produces 256 bits, which is an output of 64 hex digits. Notice also how all of these hashes are significantly longer than the 32 hex digits (128 bits) of MD5.
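The relationship between name, bit length and number of hex digits can be checked with a short Python sketch. It uses the algorithm names that hashlib accepts, which differ slightly from the labels in CyberChef, and the input string is an arbitrary example.

```python
import hashlib

# Map each algorithm name to (output size in bits, number of hex digits).
sizes = {}
for name in ("md5", "sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name, b"example input")
    sizes[name] = (h.digest_size * 8, len(h.hexdigest()))

for name, (bits, hex_digits) in sizes.items():
    print(f"{name}: {bits} bits = {hex_digits} hex digits")
```

Each line of output should show the bit count equal to four times the number of hex digits.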
Download psdocs.zip and unzip it. This will give you two
PostScript documents, recommend.ps and order.ps. These can be printed
on a PostScript-supporting printer or previewed with a suitable application
(e.g., evince or gv on Linux machines, or Adobe Acrobat Reader on
Windows). For convenience, we show them side-by-side in the image below.
You can see that the documents are entirely different in content.
Two different letters (recommend.ps on left, order.ps on right)
Replace the SHA2 operation with MD5 in the CyberChef Recipe panel.
Then click on the Open file as input button (the middle of the five
buttons) at the top of the Input panel and select recommend.ps. The
MD5 hash of the file will appear in the Output panel.
Now click on the Add an input tab button (the one with the ‘+’ icon)
to add a new input tab. With this new tab selected, repeat the previous
step, this time loading order.ps. If you examine the two tabs in the
Output panel closely, you will see that the two files have produced
the same hash, despite appearing to be entirely different!
This is an example of an MD5 collision. It was constructed in 2005 by Magnus Daum of Ruhr-Universität Bochum and Stefan Lucks from the University of Mannheim. The increasing feasibility of finding such collisions is the key reason why MD5 – and, subsequently, SHA-1 – are no longer considered usable for security purposes.
Finally, try using CyberChef to compute various SHA-2 hashes of the two documents. (Note: you may need to use the Bake button each time you change the hash function, to ensure that both hashes update correctly.) You’ll find that there are no collisions for any of these functions. A collision for two inputs to one particular hash function does not imply collisions for the same inputs in other hash functions.
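If you would rather script this comparison, the following sketch hashes two files with several algorithms and reports which ones produce matching digests. It assumes that recommend.ps and order.ps are in the current directory. The function names file_hash and compare_files are hypothetical, chosen only for this example.

```python
import hashlib
from pathlib import Path

def file_hash(path, algorithm: str) -> str:
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def compare_files(path_a, path_b,
                  algorithms=("md5", "sha224", "sha256", "sha384", "sha512")):
    """Return {algorithm: True if both files have the same digest}."""
    return {alg: file_hash(path_a, alg) == file_hash(path_b, alg)
            for alg in algorithms}

if __name__ == "__main__":
    # Assumed file locations; adjust to wherever psdocs.zip was unzipped.
    a, b = Path("recommend.ps"), Path("order.ps")
    if a.exists() and b.exists():
        for alg, same in compare_files(a, b).items():
            print(f"{alg}: {'same hash' if same else 'different hashes'}")
    else:
        print("recommend.ps and order.ps not found in the current directory")
```

Run against the two PostScript documents, this should report matching hashes for MD5 only, in line with what CyberChef shows.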