Understanding Statistics:

Many of these examples are modified from Huff: How to Lie with Statistics. I recommend this book highly.

I. There are three kinds of lies: lies, damned lies, and statistics - Disraeli.

II. Because statistics figure into many political battles about the environment, it is essential that citizens understand how to use statistics to inform themselves.

III. It is equally important for citizens to know when statistics are being misused or misinterpreted or manipulated. "The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify."

IV. Be careful of biased samples.

A. Sampling bias may provide the biggest source of error. Why do we sample? To save time and money. Suppose that you are an environmentalist and you want to know how many spotted owls are in a national forest that is 100 square miles. The only way that you could know exactly how many owls there are is to count every one in the forest. That would be virtually impossible. But if you carefully choose several smaller plots (say 1 mile square) in the forest, and count the number of owls in those plots, and if you assume that the number of owls per square mile is the same throughout the forest, you can estimate the total number of owls. If your sample is large enough and selected properly you can probably be close enough to the true number of owls for most purposes. If the sample is not large enough or not selected properly your answer may be far less accurate than an intelligent guess. But since you can represent your answer as having been arrived at scientifically, it will probably carry weight.
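
The owl estimate above can be sketched in a few lines of Python. This is only an illustration: the plot counts are made-up numbers, and the function name estimate_total is hypothetical. It assumes, as the text does, that owl density in the sampled plots matches the rest of the forest.

```python
def estimate_total(plot_counts, plot_area_sq_mi, forest_area_sq_mi):
    """Scale the owl density seen in the sampled plots up to the whole forest."""
    if not plot_counts:
        raise ValueError("need at least one sampled plot")
    sampled_area = len(plot_counts) * plot_area_sq_mi
    density = sum(plot_counts) / sampled_area  # owls per square mile
    return density * forest_area_sq_mi


# Hypothetical counts from five 1-square-mile plots (invented for illustration).
counts = [2, 0, 3, 1, 4]
estimate = estimate_total(counts, plot_area_sq_mi=1.0, forest_area_sq_mi=100.0)
print(estimate)  # 200.0
```

Notice how much rides on the five plots: if they happen to sit in the best owl habitat, the estimate for the whole forest inherits that bias.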

 

B. The average Rutgers graduate, class of 1980, makes $53,000 a year. Does this mean that if you send your child to Rutgers his financial future is assured? Of course not.

1. It's probably not possible to know the exact dollar earnings of any large group of people. You probably don't know how much money you made last year unless you only had one salaried job. People who make over $50,000 a year probably have a number of well-scattered investments.

2. This average of $53,000 a year is also undoubtedly calculated from the amounts that Rutgers graduates said that they earned. Some people, when asked how much they earn, exaggerate out of pride or optimism. Others try to understate what they earn, especially on tax returns. Having done this they may hesitate to contradict themselves on any other paper. Who knows what the IRS has access to, they wonder. It's possible that these two tendencies, to overstate and to understate incomes, might cancel each other out. But it is not likely.

3. The third and largest source of error in this estimation comes from sampling bias.

a. This report of average salary comes from a sample. You know that it comes from a sample because common sense suggests that no one knows the addresses of all the living members of the class of 1980. There are bound to be people for whom addresses are not known after 20 years.

b. Of those whose addresses are known, many will not reply to a survey, particularly a questionnaire that asks personal information like how much money did you make. For some types of questionnaires a response rate of 5 to 10 percent would be considered quite high. An alumni survey probably has a much higher response rate, but surely not 100%.

c. So we now know that this average income figure is based on a sample composed of all the living members of the class of 1980 whose addresses are known, and who replied to the questionnaire. Is this a representative sample? That is, can this group be assumed to be equal in income to the unrepresented group, those who cannot be reached or who do not reply?

(1) Who are the little lost sheep down in the Rutgers Alumni rolls as "address unknown"? Are they the big income earners? Are they the captains of industry? The millionaire real-estate developers? Probably not. The addresses of the rich are easy to come by. Even if they have neglected to keep in touch with the alumni office, many of the richest can probably be tracked down in volumes like Who's Who, or through other rich alumni. It's a better bet that the people who are lost twenty years after becoming Rutgers bachelors of arts or sciences have not gone on to riches. These are probably the people who have gone on to careers as the night manager at the Thunderbird Motel, or are in prison, or unemployed, etc. It might take a half dozen of these people to make $53,000 a year. These are the people you won't see at your class reunion if only because they can't afford the trip.

(2) Who are the people who did not respond to the survey, the people who pitched it into the nearest wastebasket? They are often the people who aren't making enough money to brag about. As Darrell Huff (the author of your article) suggests, they are a little like the woman who found a note clipped to her first pay check suggesting that she consider the amount of her salary confidential and not material for discussion with her co-workers. "Don't worry," she told her boss. "I'm just as ashamed of it as you are."

d. So now we know that the average of $53,000 a year represents that special group of alumni of the class of 1980 whose addresses are known and who are willing to stand up and tell how much they earn. Even this requires the assumption that people are telling the truth.
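
To see how much a reply pattern like this can inflate an average, here is a minimal simulation. Everything in it is an assumption made for illustration, not data about Rutgers alumni: incomes are drawn from a skewed (log-normal) distribution, and alumni above the median income are assumed to reply three times as often as those below it.

```python
import random
import statistics


def simulate_survey(n_alumni=10_000, seed=1980):
    """Return (true mean income, mean income among those who reply)."""
    rng = random.Random(seed)
    # Hypothetical skewed incomes with a median near $30,000.
    incomes = [rng.lognormvariate(10.3, 0.7) for _ in range(n_alumni)]
    midpoint = statistics.median(incomes)
    replies = []
    for income in incomes:
        # Assumed reply rates: 60% for the better-off half, 20% for the rest.
        p_reply = 0.6 if income > midpoint else 0.2
        if rng.random() < p_reply:
            replies.append(income)
    return statistics.mean(incomes), statistics.mean(replies)


true_mean, survey_mean = simulate_survey()
print(f"true mean income:     ${true_mean:,.0f}")
print(f"mean among responses: ${survey_mean:,.0f}")
```

Nobody in this simulation lies about their income. The survey average still comes out higher than the true average, purely because of who answers.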

4. The assumption that people are telling the truth is one that cannot readily be made. One kind of sampling study, marketing research, suggests that the assumption can hardly be made at all. A door to door survey was conducted to study magazine readership. A simple question was asked: "What magazines do people in your household read?" When the results were tabulated and analyzed it appeared that a great many people loved National Geographic, Time, and Newsweek and hardly anyone liked Playboy, People or Swimsuits Illustrated. In the end the researchers decided that if you wanted to know what certain people read, asking them was useless. You could learn a great deal more if you went to their houses and told them that you wanted to buy old magazines. Then you could count the number of Newsweeks and the number of Cosmos. Even then, of course, you don't know what people have read; you only know what they have been exposed to.

5. The bottom line is that for a report to be worth much it must be based on a representative sample, which is one from which every source of bias has been removed.

6. The basic kind of sample is called a random sample. It is selected by pure chance from the universe, from the whole. The test of the random sample is this: Does every name or thing in the whole group have an equal chance to be in the sample? The question you should ask about random samples is: where did the researchers find their sample? Every 10th name in the phone book? You only get people who own phones. Voter registrations? You only get people who have voted in the past few years. Supermarkets or malls? You only get people who shop. What time was the survey conducted at the mall? During the day you get unemployed people, retired people, or people who work close by on their lunch hour. Later in the evening you get people who don't have small children.
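
The phone book problem can be shown with a toy population. This is a sketch under invented assumptions: 1,000 hypothetical people, one in five of whom has no phone. A simple random sample gives everyone the same chance of being picked. Every 10th name in the phone book gives the people without phones no chance at all.

```python
import random

rng = random.Random(0)

# Hypothetical population of (person_id, owns_phone); 1 in 5 has no phone.
population = [(i, i % 5 != 0) for i in range(1000)]

# Simple random sample: every person has an equal chance of selection.
simple_random = rng.sample(population, 50)

# "Every 10th name in the phone book": only phone owners can ever appear.
phone_book = [person for person in population if person[1]]
every_tenth = phone_book[::10]


def share_without_phone(sample):
    """Fraction of the sample made up of people who do not own a phone."""
    return sum(1 for _, owns in sample if not owns) / len(sample)


print(share_without_phone(population))   # 0.2
print(share_without_phone(every_tenth))  # 0.0
print(share_without_phone(simple_random))
```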

7. A more economical substitute for random sampling, often used in opinion polling and market research, is called stratified random sampling. Unfortunately, stratified random samples offer another opportunity for bias. You need accurate information concerning the proportions of the different strata in the universe. You want to accurately represent what is going on in the real world but you can't afford to interview a large random sample, so you don't want to miss categories of people. So you tell your interviewers that you want them to interview a certain number of men and women, a certain number between the ages of 20 and 30, 31 and 40 and so on. A certain number of farmers, a certain number of school teachers, etc.

a. That all sounds easy enough but it can be quite difficult. On the issue of man or woman, an interviewer will have little trouble discerning the correct answer. When it comes to age, answers may become a little more biased. One solution is to pick only people that you are sure are less than 30, and those you are sure are between 30 and 40. But then you are biasing your sample against people who may be 39 but look a little older. As to occupation, what about school teachers who are part-time farmers?

b. On top of this, how do you get a random sample within the stratification?

c. Suppose you are an interviewer who must complete 20 interviews today. You see two people on a street corner. One is well-dressed, well groomed and is smiling. The other is wearing leather Harley-Davidson gear, a crossed daggers tattoo, and a sneer. Which person are you going to approach? Do you think your fellow interviewers would make a different decision?
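
The quota step of a stratified sample is simple arithmetic, sketched below. The age brackets and their population shares are hypothetical, and proportional_quotas is a name made up for this example. The code only decides how many interviews each stratum gets. It does nothing about the harder problem just described, which is choosing people at random within each stratum.

```python
def proportional_quotas(strata_shares, total_interviews):
    """Allocate interviews to strata in proportion to their population share."""
    if abs(sum(strata_shares.values()) - 1.0) > 1e-9:
        raise ValueError("strata shares must sum to 1")
    raw = {name: share * total_interviews for name, share in strata_shares.items()}
    # Round down first, then hand leftover interviews to the largest remainders
    # so that the quotas add up exactly to the requested total.
    quotas = {name: int(value) for name, value in raw.items()}
    leftover = total_interviews - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - quotas[name], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas


# Hypothetical shares of each age bracket in the population being polled.
shares = {"20-30": 0.25, "31-40": 0.30, "41-50": 0.25, "51+": 0.20}
quotas = proportional_quotas(shares, 20)
print(quotas)
```

If the assumed shares are wrong, the quotas are wrong too, which is why accurate information about the strata matters so much.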

8. As the author of your article notes, the operation of a poll comes down in the end to a running battle against sources of bias, and this battle is conducted all the time by reputable polling organizations. What the reader of reports must remember is that the battle is never won. You should never accept that "67 percent of the American people are against" something without asking the question, "67% of which American people?"

V. Beware of averages.

A. Suppose for a moment that I am an importer of rare wood from the tropical rain forest. Some environmentalists criticize me saying that I am partially responsible for the destruction of the rainforests. I tell them that if I didn't import this wood many people in South America would lose their jobs and that would lower their standard of living. To prove my point I naturally trot out some statistics. Before I started importing this wood the average income of the people who live near where the wood is being cut was $800 a year. Now that I am importing this wood the average income is $5000 a year. So I conclude that without the money cutting the wood provides, the average person would still be starving. The environmentalists claim that all I am doing is speeding up the destruction of the rainforest. The average income of the people who live near where the wood is being cut is still only about $800 - starvation wages. The statistics we both quote concerning average wages are absolutely true and each bolsters our own arguments. How can this be?

B. This is the beauty of doing your lying with statistics. Both of these figures are legitimate averages, legally arrived at. Both represent the same data, the same people, the same incomes. The trick is to use a different kind of average in the two cases. The word "average" has such a loose meaning that often it is meaningless. This trick is used often, sometimes innocently, sometimes to mislead on purpose. If someone tells you that something is an average, you still don't know very much about it unless you ask an important question: "What kind of average are you using?" That is, "How did you calculate this average?"

C. An "average" is a loose term for what statisticians refer to as a measure of central tendency. There are three measures of central tendency.

1. Mean. The figure I quoted, an average of $5000, is a mean income. That is, it is the arithmetic average of the incomes of all of the families that live near where the wood is being cut. You get the mean by adding up all of the incomes and dividing by the number that there are. The mean includes everyone, including those who have grown quite rich because they are the bosses or distributors of the wood, and those who make nothing because they had an accident when they were wood cutting and can now no longer work.

2. Median. The figure the environmentalists used, $800, is a median. It tells you that half of the people in question made more than $800 and half made less.

3. Mode. I could also have used the mode, which is the most frequently met with figure in a series. In this case, if there are more people making $1000 a year than any other amount, $1000 would be the modal average. In this community, if there are many wood cutters who now make the same wage of $1000, this might very well be the mode.

4. If you are dealing with many of the human characteristics that fall within what is called the "normal distribution" (the familiar bell shaped curve) these three measures of central tendency (averages) would each be quite similar. In many cases however, including the one we are talking about, the distribution is quite skewed. When that happens, the mean and the median can be quite far apart.
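
Here is a small made-up set of 25 household incomes, constructed so that it reproduces the three figures used above: a mean of $5000, a median of $800, and a mode of $1000. The individual incomes are invented; only the three summary figures come from the example. Python's standard statistics module computes all three.

```python
import statistics

# Hypothetical yearly incomes for 25 households near the cutting site.
incomes = (
    [0, 200, 300, 400, 400, 500, 500, 600, 600, 700, 700, 700]  # poorest twelve
    + [800]                                                      # the middle household
    + [1000] * 6                                                 # wood cutters on the same wage
    + [2000, 3000, 5600, 12000, 30000, 60000]                    # bosses and distributors
)

print(statistics.mean(incomes))    # 5000
print(statistics.median(incomes))  # 800
print(statistics.mode(incomes))    # 1000
```

Two large incomes at the top are enough to pull the mean far above what the typical household earns, while the median does not move at all.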

D. Let's have another example. Suppose you and two partners own and operate a medium sized polluting factory. You pay each of your 100 employees an average wage of $20,000 and you and your two partners each make $100,000 in salary. This means that you are paying out $2,300,000 in salaries. Thus, the average salary in your company (103 people) is roughly $22,300 a year. Say you have had a good year manufacturing plastic bowling trophies or whatever it is you make and you find that you now have $300,000 in profits to split up between you and your partners. That would put the profit made by you and your two partners at an average of $100,000 each. Now you find out that the government is about to pass some new clean air legislation. You find out that it would cost you about $100,000 to buy the scrubbers that would clean up your dirty emissions. You go to your local legislator and vehemently oppose the legislation, claiming that such an expense would wipe out all of your company's profits. How can you tell such a lie? With statistics of course. It's really quite simple. All you need to do is to raise your salary and that of your partners. Pay yourselves an extra $200,000 between you. Now you can tell the senator that you are paying out $2,500,000 in wages (think of what would happen to this community if we went out of business, senator!). That's an average wage of just under $24,300 and a profit of just $100,000. So senator, you can see that if this clean air legislation is passed, it will put us out of business.
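
The bookkeeping in this example is easy to check. The sketch below assumes three owners in total (you and two partners), 100 employees at $20,000, and $300,000 left over after the base payroll. The function name books and the $200,000 shifted from profit into owner salaries are choices made for this illustration.

```python
WORKERS, WORKER_WAGE = 100, 20_000
OWNERS, OWNER_SALARY = 3, 100_000
SURPLUS = 300_000  # money left over after the base payroll is paid


def books(extra_owner_pay_total):
    """Return (total payroll, mean salary, reported profit, owners' total take)."""
    payroll = WORKERS * WORKER_WAGE + OWNERS * OWNER_SALARY + extra_owner_pay_total
    mean_salary = payroll / (WORKERS + OWNERS)
    profit = SURPLUS - extra_owner_pay_total
    owners_take = OWNERS * OWNER_SALARY + extra_owner_pay_total + profit
    return payroll, mean_salary, profit, owners_take


honest = books(0)
shifted = books(200_000)  # assumed: $200,000 of profit relabeled as owner salary

for label, (payroll, mean_salary, profit, take) in (("before", honest), ("after", shifted)):
    print(f"{label}: payroll ${payroll:,}, mean salary ${mean_salary:,.0f}, "
          f"profit ${profit:,}, owners' take ${take:,}")
```

The owners take home the same $600,000 either way. Only the labels change: the mean salary rises, the reported profit shrinks, and not one employee gets a raise.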

E. One more thing about averages. You should be skeptical even when you are told how the average was calculated. When a magazine reveals that the median age of its readers is 34 years and that its readers' mean income is $41,000 you should have a question. Why did they use median as the average for age but mean as the average for income? Is it because they wanted to attract new advertisers and the mean income was higher than the median income?

 

F. Also be aware that an arithmetic mean tells you nothing about the range of values that may have gone into the calculation. Many manufacturers make the mistake of making furniture or other goods that are designed to fit average adults. The problem is that the average man is around 5 foot 8 inches. The average woman may be 5 foot 3 inches. The world is roughly divided between men and women which would make the average adult around 5 foot 5.5 inches. But there are certainly men and women who are much taller and much shorter than this average. The danger of designing for the average adult is that there are few such people. What you end up with are products that are designed not to fit the majority of adults.
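
A short simulation shows how few people sit near the "average adult." The mean heights (68 and 63 inches) come from the figures above. The bell-shaped distributions and the spreads of 2.8 and 2.5 inches are assumptions added for this sketch, not measured values.

```python
import random
import statistics

rng = random.Random(58)

# Assumed: heights are roughly bell shaped within each group, with standard
# deviations of 2.8 inches (men) and 2.5 inches (women).
men = [rng.gauss(68.0, 2.8) for _ in range(10_000)]
women = [rng.gauss(63.0, 2.5) for _ in range(10_000)]
adults = men + women

mean_height = statistics.mean(adults)
spread = statistics.pstdev(adults)
near_mean = sum(1 for h in adults if abs(h - mean_height) <= 1.0) / len(adults)

print(f"mean height: {mean_height:.1f} in")
print(f"standard deviation: {spread:.1f} in")
print(f"share within one inch of the mean: {near_mean:.0%}")
```

Reporting the standard deviation alongside the mean is what tells a designer how wide a range of people the product has to fit.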

G. Another example of reporting means but not ranges or standard deviations is the familiar table listing the milestones of child development. If Dr. Spock says that the average child learns to walk by 12 months, then by this reasoning any child that fails to walk by its first birthday must be considered developmentally delayed. At 12 months half of the parents of 1-year-olds will be convinced that their children are advanced because they walked before their first birthday; the other half will be wondering from which side of the family their child inherited its intelligence.

VI. Next consider what the author of your article refers to as "the little figures that are not there".

A. Here is an example. A television advertisement tells you that an independent laboratory found that users of a new all natural, environmentally safe, "cruelty free" toothpaste had 23% fewer cavities than users of the leading brand. In fact, the new toothpaste is made largely out of baking soda and doesn't have any fluoride. From experience you know that the leading brand has fluoride, a proven cavity preventative. How can the toothpaste maker tell such a lie? Use statistics.

B. What the advertisement doesn't tell you is that the sample in the independent lab test consisted of exactly 12 people. This is a statistically inadequate sample, but just right for the purposes of the advertisement. Ask any small group of people to use the leading brand for 6 months and count the number of cavities they have. Then switch to the new toothpaste and count cavities. One of three things can happen. These people can have distinctly more cavities, distinctly fewer cavities, or about the same number of cavities after using the new toothpaste. If the independent lab test shows that your small group has distinctly more cavities, or even about the same number of cavities as when they used the leading brand, the new toothpaste maker files the results away and tries again with another group. Sooner or later however, through chance, a test group is going to show a big improvement worthy of a whole advertising campaign.

C. The importance of using a small group is that when using a large group, any difference produced by chance is likely to be a small one.
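
The toothpaste trick can be simulated. In the sketch below the new toothpaste has no effect at all: cavity counts before and after the switch come from the same made-up distribution. The counts, the function name share_of_lucky_trials, and the group sizes are all assumptions chosen for illustration. The code runs many trials and asks how often chance alone produces at least 23% fewer cavities.

```python
import random

# Hypothetical number of new cavities one person might get in six months.
CAVITY_COUNTS = [0, 0, 1, 1, 2, 3]


def share_of_lucky_trials(group_size, n_trials=500, threshold=0.23, seed=12):
    """Fraction of no-effect trials that still show a big 'improvement'."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(n_trials):
        before = sum(rng.choice(CAVITY_COUNTS) for _ in range(group_size))
        after = sum(rng.choice(CAVITY_COUNTS) for _ in range(group_size))
        if before > 0 and (before - after) / before >= threshold:
            lucky += 1
    return lucky / n_trials


small = share_of_lucky_trials(12)   # a 12-person test group
large = share_of_lucky_trials(600)  # a 600-person test group
print(f"12 people:  {small:.1%} of trials show 23% fewer cavities")
print(f"600 people: {large:.1%} of trials show 23% fewer cavities")
```

With only 12 people, a useless toothpaste will look impressive in a noticeable fraction of trials, so the maker only has to keep testing until one turns up.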

D. You can prove that the results of small samples don't really mean anything by tossing a penny ten times. Everyone knows that when you toss a coin it will come up heads half the time and tails half the time. Try it. In ten tosses you will get exactly five heads only about one time in four.
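
If you don't have a penny handy, a simulated one will do. This sketch tosses a fair coin in runs of 10 and in runs of 1,000 and compares how widely the share of heads swings from run to run. The seed and the number of runs are arbitrary choices.

```python
import random


def head_shares(n_tosses, n_runs=20, seed=7):
    """Share of heads in each of n_runs runs of n_tosses fair coin tosses."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(n_tosses)) / n_tosses
        for _ in range(n_runs)
    ]


def spread(shares):
    """Gap between the luckiest and the unluckiest run."""
    return max(shares) - min(shares)


small_runs = head_shares(10)
large_runs = head_shares(1000)
print("10 tosses per run:  ", small_runs)
print("spread over runs:   ", round(spread(small_runs), 3))
print("1000 tosses per run, spread:", round(spread(large_runs), 3))
```

The short runs wander far from one half, while the long runs stay close to it. The coin is the same in both cases; only the sample size differs.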

E. What does the claim by a drug company for its headache remedy that "you can't buy a stronger pain formula without a prescription" imply? That the product is the strongest pain reliever on the market. What does it really mean? That there are other products as strong as ours. In fact most aspirin tablets come in the standard 5-grain (325 mg) dose.

F. How about: 50% of the people who used our new cold medicine had their symptoms cleared up after just 7 days. As a humorist once remarked: "Proper treatment will cure a cold in seven days, but left to itself a cold will hang on for a week."

G. How about this claim: As the result of new water projects sponsored by the communist government of Romania, clean water is now available to every Romanian citizen. The trick here is to use the word available. Does this mean that every citizen now enjoys bathing in sparkling clear water? It does not. They merely have it "available," which could mean that there is a well in the central square of the town which is 10 miles down the road.