Tuesday, September 12, 2006

Law abiding words

Do words follow any law? Yes, they do, and it is called Zipf's law. The law states that if you count how many times each word occurs in a natural-language text, the counts tend to relate to each other in a characteristic way. Sort the words by their number of occurrences, and you will find that the frequency of any word is roughly inversely proportional to its rank in that list.
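The counting and ranking step could be sketched like this. It is only a minimal illustration: the function name `rank_frequency` and the rule "a word is a run of letters and apostrophes, case ignored" are assumptions made for the example, not a standard definition of what counts as a word.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first.

    Hypothetical helper: a 'word' is assumed to be any run of
    letters and apostrophes, compared case-insensitively.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]

sample = "the cat and the dog and the bird"
for rank, word, n in rank_frequency(sample):
    print(rank, word, n)
# 1 the 3
# 2 and 2
# 3 cat 1
# 4 dog 1
# 5 bird 1
```

On a real book-length text you would then compare each count against the top count divided by the rank.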

For English, the word 'the' is the most common, accounting for about 7% of the words in a typical text. Second comes 'of' at about 3.5%, and third comes 'and' at about 2.3%. It also follows that a large share of the distinct words in a text occur only once.
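The percentages above are what the simplest form of the law predicts: if the top word has share p, the word at rank r should have a share of about p / r. The small sketch below just does that arithmetic, starting from the 7% quoted for 'the'; the function name `zipf_prediction` is made up for the illustration.

```python
def zipf_prediction(top_share, max_rank):
    """Predicted share of the word at each rank, assuming share(r) = share(1) / r."""
    return [top_share / r for r in range(1, max_rank + 1)]

for rank, share in enumerate(zipf_prediction(0.07, 3), start=1):
    print(f"rank {rank}: {share:.1%}")
# rank 1: 7.0%
# rank 2: 3.5%
# rank 3: 2.3%
```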

What fun can we have with this knowledge? It turns out to be a useful tool in cryptography. If you are given a text you don't recognize, you can start counting the words. If they obey Zipf's law, the text is probably not encrypted (or at least not with a cipher that scrambles word frequencies), and you may just have to find somebody who understands the language.
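One rough way to "count the words and check" is to plot log(count) against log(rank) and fit a straight line: Zipf's law predicts a slope close to -1. The sketch below does an ordinary least-squares fit by hand. The names `zipf_slope` and `looks_zipfian`, the word-splitting rule, and the tolerance of 0.3 are all assumptions chosen for illustration. A log-log line fit is a quick heuristic rather than a rigorous test (maximum-likelihood fitting is the more careful approach), and a pass only says the word statistics resemble natural language.

```python
import math
import re
from collections import Counter

def zipf_slope(text):
    """Least-squares slope of log(count) against log(rank).

    Zipf's law predicts a slope near -1. Words are assumed to be
    runs of letters and apostrophes, compared case-insensitively.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(n) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def looks_zipfian(text, tolerance=0.3):
    """Heuristic check: is the fitted slope within `tolerance` of -1?

    The default tolerance is an arbitrary assumption, not a standard value.
    """
    return abs(zipf_slope(text) + 1) <= tolerance

# Synthetic text where the word at rank r appears about 120 / r times.
zipf_text = " ".join(" ".join(["x" * r] * (120 // r)) for r in range(1, 11))
# Synthetic text where every word appears equally often.
flat_text = " ".join(["x" * r for r in range(1, 11)] * 5)
print(looks_zipfian(zipf_text), looks_zipfian(flat_text))  # True False
```

For a real manuscript you would feed in a transcription of the whole text; very short samples give noisy slopes.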

This does not always help, though. The picture above is taken from a book called the Voynich manuscript. The document has a very strange history, and so far no one has been able to read its script. Some have wondered whether it is encrypted, but when you count the words you find that the text obeys Zipf's law. So is it written in a forgotten language seen nowhere else? Maybe you can solve the mystery...
