In English, the word 'the' is the most common and makes up about 7% of the words in a given text. Second comes 'of' at about 3.5%, and in third place comes 'and' at about 2.3%. It also follows that a large share of the distinct words occur only once in a given text.
What fun can we have with this knowledge? Actually, it is a useful tool in cryptography. If you are given a text that you don't recognize, you can start counting the words. If you find that their frequencies obey Zipf's law, the text is probably not encrypted, and you just have to find somebody who understands it.
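If you want to try the counting yourself, here is a minimal Python sketch. It assumes that "obeying Zipf's law" means the share of the word at rank r is roughly the share of the top word divided by r, which is how 7% becomes 3.5% and then about 2.3%. The function names `rank_frequency` and `zipf_prediction` are made up for this illustration, and the tokenization is deliberately crude.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, share) tuples, most frequent word first."""
    # Crude tokenization (an assumption): lowercase runs of letters,
    # optionally followed by an apostrophe part, as in "don't".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return [
        (rank, word, count / total)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def zipf_prediction(top_share, rank):
    """Share expected at a given rank if frequency falls off as 1/rank."""
    return top_share / rank


if __name__ == "__main__":
    # A toy sample; a real check needs a long text, ideally a whole book.
    sample = (
        "the cat sat on the mat and the dog sat by the door "
        "and the bird sang of the sun and of the sea"
    )
    table = rank_frequency(sample)
    top_share = table[0][2]
    print("rank  word   observed  predicted")
    for rank, word, share in table[:5]:
        predicted = zipf_prediction(top_share, rank)
        print(f"{rank:<5} {word:<6} {share:8.3f}  {predicted:9.3f}")
```

On a short sample like this the observed and predicted columns will not match well. The comparison only becomes meaningful on texts with many thousands of words.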

Now, this does not always help you. The picture above is taken from a book called the Voynich manuscript. This document has a very strange history, and so far no one has been able to understand the script. Some have wondered whether it is actually encrypted, but when you count the words you see that the text obeys Zipf's law. So is it written in a forgotten language seen nowhere else? Maybe you can solve the mystery...