Brawling scientists, Hindu nationalism, Markov chains, mysterious ancient symbols that one man believes he can interpret... there’s a Dan Brown novel in there, I think. There’s certainly a fascinating story, which I shall attempt to explain.
The Indus Valley scripts are associated with a people about whom little is known except that they lived in that area around 4,000 years ago. Where they came from, where they went, whether they sliced their tomatoes along the equator or through the poles... these are things which may never be known. And the language they spoke is also unknown.
A lot of people, for many different reasons, would like to think they know the answers, and to this end a lot of work has been done to analyse the symbols that have survived on a number of clay artefacts, a few thousand symbols in a few hundred inscriptions, most no more than three or four symbols long.
The script is unknown, as is the language that it encodes, and, given the scarcity of information that can be extracted from such a small amount of data, that is not going to change unless a bilingual tablet turns up identifying it as an already known language.
It is far from certain that the symbols represent language at all. That is a point which might, at some time, be determined, but so far it is still in doubt. It matters (to the people who care about these things) because writing was probably developed independently in only three or four places in history. If the Indus script encodes a language, it would be another one, and it would mean that the Indus Valley civilization was literate. This matters not only to linguisticians and historians, but also to various flavours of Indian nationalist.
Back in 2000 Michael Witzel and Steven Farmer wrote a paper demolishing the pet theory of N. S. Rajaram, who claimed to have translated these inscriptions, in the course of proving that the Harappan civilization used domesticated horses. It is generally believed that horses were introduced to that area much later. You wonder why Witzel and Farmer even bothered with the witterings of someone who clearly doesn’t have a clue what he’s talking about, but it served as a warm-up for their 2003 paper, with Richard Sproat, which attempted to show that the Indus script could not be a language, and more generally that the Harappan people could not have been literate.
Last year, Rajesh Rao and others used a technique involving Markov chains to try to detect the sort of structure they thought the inscriptions would have if they really were language. They measured the conditional entropy (a term pinched from physics but the concept is well defined in information theory and computational linguistics) of the script, in a way that they describe in the notes to the paper.
The entropy measured for the script was in the same narrow range as the (very few) real language scripts that they analysed in the same way, and far from the values of the control scripts they tested, which were artificially produced, one to have very rigid structure and the other to be almost completely random. They thus announced that this was evidence that the Indus script was a written language.
But is it? They have no way of knowing how significant it is that the entropy values fall in that narrow range. Without analysing a far larger number of natural scripts that do and do not encode language, it is not clear that any script that is used to encode information in a real situation can fall outside that range. It is easy to construct, and indeed to find, scripts that do not represent language but which fall in that same range. It may well be that any script that contains sufficient structure to contain information, whether or not it is given linguistically, and is employed by a real person, will tend to be in that range. The paper does not consider what the result actually means, or if it means anything at all.
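For anyone wondering what is actually being measured, here is a minimal sketch in Python of a first-order (one symbol back) conditional entropy. It is my own illustration and not the method from the paper: it uses plain counts of adjacent symbol pairs, with none of the refinements described in the paper's notes, and the function name `conditional_entropy` and the toy "inscriptions" are invented for the example.

```python
from collections import Counter
from math import log2


def conditional_entropy(sequences):
    """Estimate H(next symbol | previous symbol) in bits.

    Uses raw counts of adjacent symbol pairs within each sequence.
    This is an illustrative estimate only, not the paper's procedure.
    """
    pair_counts = Counter()   # how often each (previous, next) pair occurs
    prev_counts = Counter()   # how often each symbol occurs as "previous"
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1

    total = sum(pair_counts.values())
    if total == 0:
        return 0.0  # no adjacent pairs, nothing to measure

    entropy = 0.0
    for (prev, _), n in pair_counts.items():
        p_pair = n / total                     # P(previous, next)
        p_next_given_prev = n / prev_counts[prev]  # P(next | previous)
        entropy -= p_pair * log2(p_next_given_prev)
    return entropy


# Toy data (hypothetical): one rigid "script", one with no structure.
rigid = ["ABCD"] * 50
unstructured = [a + b for a in "ABCD" for b in "ABCD"]

rigid_entropy = conditional_entropy(rigid)
unstructured_entropy = conditional_entropy(unstructured)
```

In the rigid toy script every symbol is always followed by the same one, so the result is 0 bits. In the unstructured one every symbol is equally likely to follow every other, so the result is the maximum for four symbols, 2 bits. Anything with some structure but also some freedom lands in between, which is exactly why landing in between is not, on its own, very informative.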
Richard Sproat answered with a paper of his own, which was answered by Rao, and reanswered by Sproat. There was also a bit of vigorous debate hosted by Rahul Siddharthan on his (excellent) blog, and Mark Liberman at the Language Log got involved as well. Rob Lee et al have tried to apply the same analysis to the Pictish inscriptions, with similar results.
Sproat descended into anti-nationalist ranting, more, it would seem, from exasperation than from lack of arguments or from axe-grinding. Though he has made a heavy professional investment in the illiteracy of the Harappan civilization, he is clearly a serious scientific researcher (and so is Rao).
Rao et al have now expanded on their previous work, fleshing out the background and context of their results in order to give their method much greater interpretative power. The matter is far from decided, and when the scientists finally agree, one way or the other, is when the nationalists will take over the fight. It promises to be fun.