Saturday, June 4, 2011

Many People May Still Know Nothing

I have a feeling that 'the wisdom of crowds' is going to be the next big thing in popular misunderstanding of difficult scientific concepts. The idea itself is simple enough: if you ask a lot of people to guess at some value, the average guess is quite likely to be somewhere close to the true value. The founding myth of the belief (the story itself is true, much of what it is taken to mean is not) is about how Francis Galton discovered that the average of the guesses made at the weight of an ox by visitors to a country fair was extremely close to the true weight of the animal.

That such an average guess would be 'quite' close to the real value, for some value of 'quite', is perhaps no great revelation, but in some cases it can be much closer than you would expect intuitively from the amount of information available to the guessers. There are two necessary conditions for this effect to work to any detectable degree: the guessers must have some information (in Galton's case they could see the ox in question and presumably had some experience of oxen in general), and the guesses must be susceptible to linear comparison, that is, they must represent values.
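To make the first condition concrete, here is a rough sketch in Python of what happens when every guesser has some information. It rests on assumptions that are mine and not Galton's: each guess is the true value plus independent, unbiased noise, and the names and numbers (`TRUE_WEIGHT`, `CROWD_SIZE`, the spread of 150) are made up for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_WEIGHT = 1200.0  # hypothetical true value, not Galton's figure
CROWD_SIZE = 800      # hypothetical number of guessers

# Assumption: each guess is the true value plus independent, unbiased noise.
guesses = [random.gauss(TRUE_WEIGHT, 150.0) for _ in range(CROWD_SIZE)]

# How far the crowd's arithmetic mean lands from the true value.
crowd_error = abs(statistics.mean(guesses) - TRUE_WEIGHT)

# How far a typical individual guess lands from the true value.
typical_error = statistics.mean([abs(g - TRUE_WEIGHT) for g in guesses])

print(f"error of the average guess: {crowd_error:.1f}")
print(f"average error of a guess:   {typical_error:.1f}")
```

The average can never do worse than the typical individual, and under these assumptions it usually does far better. If the noise were biased, or the guessers knew nothing at all, the averaging would have nothing to work on.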

Anecdotally, a TV magician last year used the idea of the wisdom of crowds to explain how he correctly 'predicted' the lottery numbers. He asked a lot of people what they thought the numbers would be. It was a diversion to hide the real trick, of course, because it fails to satisfy either condition: there is no information that the crowd could possibly have that would produce a useful 'average', and, a point too easily missed in discussions of these things, the numbers on a lottery ball are not values, they are just symbols. The 6 ball is not lower in any sense than the 12 ball. The only thing that matters is that they are different. The balls could be marked with colours from a paint catalogue, or the faces of Tiger Woods' girlfriends (I might be a bit behind with the celebrity gossip), or the shapes of random pebbles picked up on the beach. You can't take an average of something that doesn't have a spread of values.

The same applies, mutatis mutandis, to politics, in that voters are not, in fact, collectively deciding at which point on a line they want the next PM to stand, even though that is how we tend to think of parties and their policies. With markets, the problem is that, unless you are a particular kind of socialist, there is no true value to compare with the price produced by the crowd. The market price is the right price more or less by definition.

The idea is simple enough, I said, so why is it so easy to misapply it? Because to do anything with the idea you have to understand statistics, and statistics is very tricky stuff indeed. Also, as someone on one of the comment threads linked below says, statistics is maths for boring people, so what with one thing and another we won't be seeing the press paying too much attention to the details of the concept.

Just to give an idea, here is a paper which explores some data collected from a large group of Swiss. Here is an article about it and here is a criticism of some aspects of that article. If you look at the questions the Swiss were asked, you see that there is no obvious way to estimate, even very approximately, the amount of information they could possess about the questions, so that aspect, the link between the available data and the average result, is not assessed. What we do see is that the averages are wildly out, by a factor of 3 to 10 in most cases, but that the median and the geometric mean give rather closer, but still on the whole very inaccurate, answers. What this suggests is that there are a number of very high guesses skewing the data, as well as the obvious fact that neither individually nor collectively did our Swiss guinea pigs have the faintest idea about any of the questions they were asked.

The paper is tough going for the non-mathematician, but the Lehrer article is readable and shows why I think we'll hear more of it. The criticism is also hard going but is worth a look for what he says about journalistic practice in these cases.

One of the difficulties is that 'average' to most people means arithmetic mean, which is, by the way, what Galton used. But the arithmetic mean, as we saw, is a very poor way of characterising any distribution that has a number of values much higher than the concentration of most common values. And when guessing the population density of Switzerland some people are likely to be out by an order of magnitude or more. But there are other ways of finding a characteristic value. The geometric mean reduces the influence of very high values, and the harmonic mean reduces it even more, while increasing the effect of low values and of the distance between elements. You probably didn't need to know that, but there you are. There is a beauty in the harmonic mean that the other two do not possess, but, though beauty may be truth, it's not a good reason for choosing it.
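For anyone who wants to see the three means side by side, here is a small sketch using Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). The guesses are invented for illustration and are not taken from the Swiss data: most cluster around 200, with two wild overestimates.

```python
import statistics

# Hypothetical guesses: a cluster near 200 plus two very high outliers.
guesses = [150, 180, 200, 220, 250, 2000, 5000]

arithmetic = statistics.mean(guesses)           # sum divided by count
geometric = statistics.geometric_mean(guesses)  # n-th root of the product
harmonic = statistics.harmonic_mean(guesses)    # count over sum of reciprocals

print(f"arithmetic mean: {arithmetic:.0f}")
print(f"geometric mean:  {geometric:.0f}")
print(f"harmonic mean:   {harmonic:.0f}")
```

For any set of positive numbers that are not all equal, the harmonic mean comes out below the geometric mean, which in turn comes out below the arithmetic mean, so the two outliers drag the arithmetic mean furthest from the cluster.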

There is also the median, which is the central value in a set of data points, and the mode, which is the most common value, and both have their uses in characterising data sets. You need to know which one to choose.
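The median and the mode can be tried out the same way. Again this is only a sketch on made-up guesses, using the standard `statistics` module; nothing here comes from the paper.

```python
import statistics

# Hypothetical guesses: 200 appears twice, and two guesses are wildly high.
guesses = [180, 200, 200, 220, 250, 2000, 5000]

middle = statistics.median(guesses)      # central value once sorted
most_common = statistics.mode(guesses)   # value that occurs most often
mean = statistics.mean(guesses)          # arithmetic mean, for comparison

print(f"median: {middle}")
print(f"mode:   {most_common}")
print(f"mean:   {mean:.0f}")
```

Neither the median nor the mode moves if the 5000 is replaced by 50000, which is exactly why they are useful when a few guesses are out by an order of magnitude.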

The point of all this is that by clicking a few links and making a small intellectual effort, you can know more about the wisdom of crowds than all the people who will shortly be talking about it as though it were the key to understanding the universe.

Tomorrow waterbirds again, I promise.

1 comment:

James Higham said...

Median, mode and area under the normal curve - I vaguely remember it.