Saturday, May 3, 2008

Averages and Actuals - Groups and Individuals

Periodically news comes out that [group X] scores lower than [group Y] on the [test of Z]. The groups and tests differ depending on who is doing the testing and where it is done. In a way it is like ethnic jokes. The group that scores higher is usually a dominant group at the time and place where the study is done. The group that scores lower is generally of some concern to the dominant group. Sometimes the work stems from real societal concern. For example: [women] score lower than [men] in [math tests]. If you are interested in math education or sex-linked traits, this is a legitimate concern.

I see references to these studies in the news. Usually the coverage follows a common news pattern:

• It is inflammatory. Everyone in the group that scores lower is bound to be insulted. The reporter can create an objective report filled with outrage on all sides: "You are calling all of us stupid!" and "You are misrepresenting the work!".
• The headline and perhaps the lead paragraph are terrible simplifications of the work and are easy to misinterpret.
• Further study and clarification show flaws in the original work or provide evidence that something unexpected is at work. These clarifications do not get the same attention as the initial headlines.
• The news is irrelevant to anyone but specialists in the field.

This particular news pattern is also present in most national crime stories (missing child, grisly murder...) and lawsuits (man sues for a gazillion dollars upon death of tropical fish).

The part that interests me today is the irrelevance of the information. To see this, let me invent a story and pretend it is true. "Women score significantly lower than men in spatial reasoning tests. Scores on spatial reasoning tests are among the best indicators of success in engineering graphics." Let me also posit that I am a manager in an engineering graphics firm that is looking for new employees. How does this information help me?

Almost any test taken by a large population shows a wide variance in skill. In most cases the results form what is called a normal distribution, the familiar bell-shaped curve. This expresses the fact that there is an average level of skill and that most people's scores cluster to some degree around that average. The ends of the bell curve show that there are some people who do extremely well (or extremely poorly) on the test.

When we hear that [group X] scores lower than [group Y] on the [test of Z], what that really means is that when we compare the curves for the two groups, the average score is lower in one group than the other. Even with a perfect test, depending on how many people take the test and who they are, we can expect some differences. Statisticians have been studying this for a few centuries, and they have developed measures of how likely it is that the difference between the average test scores is just an accident of who happened to take the test rather than a sign of some real underlying phenomenon. Usually news that [group X] scores lower than [group Y] on the [test of Z] involves differences larger than we would expect by chance, but sometimes not by much. In our case, let's assume that women, on average, score WAY below men and that chance is extremely unlikely to be the cause of the difference.
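For the curious, here is a rough sketch of one way to ask "could this difference be an accident of who took the test?" It uses a permutation test, which is my choice for illustration and not necessarily what any particular study used. The function name `permutation_p_value` and the scores are made up. The idea is to shuffle the group labels many times and count how often the shuffled groups differ by at least as much as the real ones.

```python
import random
from statistics import mean

def permutation_p_value(group_x, group_y, n_shuffles=2000, seed=0):
    """Estimate how often chance alone gives a gap in averages
    at least as large as the one actually observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_x) - mean(group_y))
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels are meaningless
        gap = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if gap >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_large + 1) / (n_shuffles + 1)

# Made-up scores: two groups that do not overlap at all.
scores_x = [10, 11, 12, 13, 14, 15, 16, 17]
scores_y = [20, 21, 22, 23, 24, 25, 26, 27]
p_separated = permutation_p_value(scores_x, scores_y)

# The same scores in both groups: no gap, so chance explains it trivially.
p_identical = permutation_p_value(scores_x, scores_x)

print(p_separated, p_identical)
```

A small value says the gap would rarely show up by chance. A value near 1 says shuffled labels produce gaps that large all the time.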

As the hiring manager, I do not hire an average man or an average woman. I hire a specific man or a specific woman. When I am interviewing a particular person, I may be faced with a man who scores much lower than the women's average or I may be faced with a woman who scores much higher than the men's average. Here is another way to put it. Women, on average, may score lower than men. But given any particular man, regardless of how well he scores on the test, I can almost certainly find a particular woman who scores even higher.
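To put rough numbers on this, here is a small sketch using Python's standard `statistics.NormalDist`. The averages and spreads are invented purely for illustration (men averaging 100, women averaging 90, both with a standard deviation of 15); they do not come from any real study. The helper `chance_some_woman_higher` is a name I made up for this example.

```python
from statistics import NormalDist

# Hypothetical score distributions, invented for illustration only.
men = NormalDist(mu=100, sigma=15)
women = NormalDist(mu=90, sigma=15)

# Share of women who score above the men's average.
women_above_men_avg = 1 - women.cdf(men.mean)

# Chance that a randomly chosen woman outscores a randomly chosen man.
# The difference of two independent normal scores is itself normal.
diff = NormalDist(
    mu=women.mean - men.mean,
    sigma=(women.stdev ** 2 + men.stdev ** 2) ** 0.5,
)
woman_beats_man = 1 - diff.cdf(0)

# Take a strong male candidate: two standard deviations above the men's average.
strong_man_score = men.mean + 2 * men.stdev
p_one_woman_higher = 1 - women.cdf(strong_man_score)

def chance_some_woman_higher(pool_size):
    """Chance that at least one of pool_size women outscores that man."""
    return 1 - (1 - p_one_woman_higher) ** pool_size

print(f"Women above the men's average: {women_above_men_avg:.1%}")
print(f"Random woman beats random man: {woman_beats_man:.1%}")
print(f"Some woman out of 1000 beats the strong man: {chance_some_woman_higher(1000):.1%}")
```

With these made-up numbers, about a quarter of the women score above the men's average, even though the gap between the averages is large. And once the pool of women reaches a thousand or so, it is very likely that at least one of them outscores even the strong male candidate.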

It is hard to imagine someone more concerned with the results of testing than our hiring manager, but it turns out that our study of the difference between men and women is completely irrelevant. What matters is the particular man or woman across the desk.


seppie said...

The interesting thing about CSAP is that it is used to track both individual performance and group performance on given tasks. CSAP scores follow kids through their school careers, and schools are judged by how well the kids do on the tests as a group. Much as people complain about it, it's actually a better test than the norm-referenced, fill-in-the-bubble tests that we took as schoolchildren. But then, I'm not convinced that there is such a thing as a good or useful standardized test.

Masasa said...

But what about the relevancy to those you referenced in the first part of your post, those interested in math education or sex-linked traits?

Silver Gerety said...

This is great.

I had this exact conversation about the bell curve phenomenon with a friend a couple of months ago. The problems arise when we start using general trends to justify policies that affect individuals who may or may not represent the trend. It happens all the time.

It's like that old saying... lies, damn lies, and statistics.

Masasa said...

Upon logging onto the internet a little bit ago, I saw an illustration of this type of generalized thinking.

It was a photo of a "rich looking" man with a "plain" looking woman, and a link to an article with some title like: "Clash of Class: Can couples survive the economic divide?"

Masasa said...

...didn't quite finish my thought.

By which I mean, if you were in a relationship with someone from a different socio-economic background, wouldn't the real question be "can *we* survive the economic divide?"

But then again, maybe if I was in such a relationship, I'd read it out of curiosity anyway just to see how much my experiences mesh with the average/typical experiences. That seems like a pretty natural human response, to want to put ourselves in relationship to the whole.