My previous attempts to elucidate this theme in various blog-comments postings don’t seem to have had any effect, so let’s go through it all one more time. There is a fundamental misrepresentation, a basic exaggeration, being perpetrated by companies doing telephone opinion polling, which makes their claims of “accuracy” false and meaningless.
Modern opinion polling rests on several assumptions that generally have enough validity that nearly all scientific observers agree on them. However, one of the most basic of assumptions is, I hope to show, simply not capable of being realized in the field, given certain basic behaviors and attitudes in the American public as a whole.
The first basic assumption that the modern polling industry relies on is the idea of measuring a population by the technique of “random sampling.” To put it as concisely as possible, the idea is measuring a whole population by measuring a small part of the population; the scientific assumption that’s being made is that you are getting a “random sample,” a small part of the whole that can serve as a valid example of the whole.
Let’s take the example of loose socks in a big drawer, hundreds of loose socks, some of them white, some of them black, and some of them red. You want to get an idea of how many socks you have of each color, without counting all the socks. Now, if all the socks are thoroughly mixed up, you can just close your eyes, reach in and grab a big handful, and that’s a “random sample.” There was no attempt to get more of one color than another. The socks were all mixed up and not already sorted into color groups in various areas, and there was no “I’m reaching two inches more to the right and not to the left because it looks like more white ones over here.” With no bias of any kind in your selection, and with socks that really were thoroughly mixed, your sample of the entire sock drawer is indeed a truly random sample.
The second assumption is that you can then count your random sample, figure out that you have 40% white socks, 35% black socks and 25% red socks in your sample, and that this is a good representation of your total sock drawer. If you know your statistical theorems very well, there are then formulas you can apply to compare your sample size to your total “population” of socks, and say, based on these formulas (sorry, it was 35 years ago I studied this stuff in college, we both need to hang out at websites like this), “I have confidence that percentages found in my random sample will match the percentages in my total population, plus or minus 2 percent, 95% of the time.”
Now look carefully at what we’re saying here at the end of the previous paragraph. The sample population is never going to be a perfect representation of the total population. However, based on various experiments that have been done with physical objects which can be mixed up pretty randomly, our formula tells us that with a total population of X and a sample population of Y, in 19 times out of 20 that we pull a sample of that size, the percentages we find in the sample will be within Z% of the percentages in the total population. However, because our samples are truly random, 1 time out of 20 that we pull a sample of size Y from a population of size X, we will get a wacky, strange distribution, and the percentages we find in the sample will NOT be so close to the percentages in the total population.
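To make the X, Y and Z of the sock drawer concrete, here is a minimal sketch in Python. It uses the standard textbook normal-approximation formula for a simple random sample, with the finite population correction; that is my choice of formula, not necessarily what any polling company uses. The drawer contents (400 white, 350 black, 250 red) and the handful size of 100 are made-up numbers for illustration only.

```python
import math
import random

def margin_of_error(sample_size, population_size=None, share=0.5, z=1.96):
    """Approximate margin of error for a simple random sample.

    share=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    If population_size is given, the finite population correction is applied.
    """
    moe = z * math.sqrt(share * (1 - share) / sample_size)
    if population_size is not None:
        moe *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return moe

def coverage(population, color, sample_size, trials, rng):
    """Fraction of random handfuls whose share of `color` lands within the margin."""
    true_share = population.count(color) / len(population)
    moe = margin_of_error(sample_size, len(population), true_share)
    hits = 0
    for _ in range(trials):
        handful = rng.sample(population, sample_size)  # blind grab, no replacement
        sample_share = handful.count(color) / sample_size
        if abs(sample_share - true_share) <= moe:
            hits += 1
    return hits / trials

# Hypothetical drawer: 400 white, 350 black, 250 red socks.
drawer = ["white"] * 400 + ["black"] * 350 + ["red"] * 250
rng = random.Random(0)  # fixed seed so the run is repeatable

# Should print a value in the neighborhood of 0.95, i.e. about 19 times out of 20.
print(coverage(drawer, "white", 100, 2000, rng))
```

The point of the simulation is that the “19 times out of 20” promise only holds because `rng.sample` really does pick every sock with equal probability. Nothing in the formula protects you if the handful is not random.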
This is glossed over in the typical reporting on polling, where at the most they’ll say something like “847 adults were contacted by telephone for this poll, with a margin of error of +/- 5%.” When you go to the official report of the poll, they will have the more scientific language like “the margin of error for the total sample is +/- 4.5% at the 95% confidence level,” and they assume you know what that means, which is what I tried to explain in the previous paragraph. 19 times out of 20, IF THEY HAVE GOTTEN A TRULY RANDOM SAMPLE, data from a sample of this size should come within a range of 9 points of what we would find if we could truly interview everyone in America. The 20th time, the sample data could be even farther from the total population.
However, the whole problem is they just can’t get a truly random sample by telephone. The official report of a typical poll will say “based on 1007 interviews with adults August 8th to 11th 2010.” The number they need to tell us, and don’t, is how many calls they had to make to get 1007 valid interviews.
Have you ever made 100 or more “cold calls” to the general population for any reason? I’ve done it at least five times, using lists of varying quality, for different political causes since 1994. Especially in 2006 and also in 2008, I tried to be an avid caller for the phone banks organized by MoveOn-dot-org, which would typically give us lists of registered Democrats to call in states across the country.
On the West Coast, in making 100 calls, you might typically get 35 answering machines, 20 busy tones, 5 non-English speakers, 5 grumpy immediate hang-ups, 5 undoubtedly sweet old grandmas & grandpas with very little connection to the world of current politics, and 2 or 3 whackos of indeterminate age and background with absolutely no connection to the world of reality. That’s over 70% of your calls wasted, and if you got 20 or 25% of your calls to a competent adult who allowed you to give your message, that’s pretty good.
When MoveOn had us calling New York it was over 50% answering machines, and in Connecticut it was over 70% answering machines, many of which had threatening messages for non-approved callers, with fewer than 10% of calls answered by actual human beings. By contrast Texas was relatively full of talkative people, and Indiana was practically a land of “Leave It to Beaver” stereotypes minding their landlines: something like 60 to 70% of calls to Indiana Democrats found competent adults ready to talk.
Overall, in the average American phone poll, you’re going to be very lucky to get 30% of your calls picked up by a competent adult. Even at that rate you have to call something like 3300 numbers to get 1000 interviews, and it’s much more likely that to get to 1000 completed interviews, they’re having to make 4000 to 5000 calls. And it is in those 3 to 4 thousand calls that don’t result in completed interviews that the perfect randomness needed to achieve the stated margin of error is being lost.
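The arithmetic here is simple enough to sketch in a few lines of Python. The completion rates below are my rough guesses from phone-bank experience, not figures published by any polling company, and the function name is just something I made up for illustration.

```python
import math

def calls_needed(completed_interviews, completion_rate):
    """Calls required to reach a target number of completed interviews,
    assuming a fixed fraction of calls ends in a completed interview."""
    return math.ceil(completed_interviews / completion_rate)

# Hypothetical completion rates: 30% (very lucky), 25% and 20% (more likely).
for rate in (0.30, 0.25, 0.20):
    calls = calls_needed(1000, rate)
    wasted = calls - 1000  # calls that never became interviews
    print(f"{rate:.0%} completion: {calls} calls, {wasted} without an interview")
```

At a 25% completion rate, for example, 1000 interviews take 4000 calls, which leaves 3000 calls where nobody was interviewed.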
Some people are concerned that survey calls are only being made to landline phones, not cell phones, and that there would be significant differences in the two populations. Many survey companies, however, are trying to reach cell phones too (using computer-generated random phone numbers, in the hope a certain percentage will reach valid cell numbers). What I’m trying to say is that there are probably subtle yet not-insignificant differences among all the types of situations you reach in trying to use telephones as a survey tool. On a given day, the answering machines, busy tones and not-at-homes may not be evenly distributed among Republicans and Democrats, and the care that the survey company’s interviewers take to be patient with the marginally coherent seniors will have a huge effect on the results obtained from that 5 or 10% of the population. When 60 to 70% of American adults are just not available to even hear you say “would you like to participate in our survey today?” you just can’t say that you are actually getting a random sample of the public. The “tail” of people who will talk to you is wagging the “dog” of the people who can’t/won’t talk to you. There are scores of psychological and sociological self-selection factors which make it highly unlikely that the minority you can reach by telephone is representative of the majority you can’t reach by telephone.
The biggest problems come with the “refuse to take the survey” segment – which can get up to 20% or more of your total calls, including the “grumpy immediate hang-up” response. I say it is extremely problematic to assume this segment is evenly distributed among all political feelings. Depending on the season, the region, and whether people feel depressed or optimistic about their political tendency’s prospects, there can be all sorts of social and personal dynamics which cause liberals, or conservatives, or other identifiable subgroups, to disproportionately refuse to take the survey – or to disproportionately volunteer to take the survey – in other words, making the whole sample not actually random. And finally it’s almost certain that the 3-5% who manage to live in this modern world without telephones at all have very different characteristics from the majority who do have telephones; however, this may also be a group with very little voting participation.
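Here is a rough Python sketch of why uneven refusals matter. It assumes a made-up two-way race where candidate A’s true support is exactly 50%, and where A’s supporters complete the interview a bit more often than B’s supporters do. All the rates are hypothetical, chosen only to show the mechanism.

```python
def observed_share(true_share, response_rate_a, response_rate_b):
    """Share for candidate A among people who actually complete the interview,
    when A's and B's supporters respond at different rates."""
    responders_a = true_share * response_rate_a
    responders_b = (1 - true_share) * response_rate_b
    return responders_a / (responders_a + responders_b)

# Hypothetical: true support is 50/50, but A's supporters complete the
# survey 30% of the time and B's supporters only 25% of the time.
true_share = 0.50
poll_share = observed_share(true_share, 0.30, 0.25)
print(f"true support {true_share:.1%}, poll shows {poll_share:.1%}")
```

With those made-up rates the poll shows A about 4.5 points above the truth, and making more calls does not shrink that gap, because the stated margin of error only covers the random part of the error.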
This failure of telephone surveys to reach a truly random sample is easily visible, I believe, in the published results of public opinion polls in the relatively few cases where there are multiple companies reporting on the same election races (Presidential races and high-profile Senate and Governor races). If these various companies were all reaching truly random samples, the reported results should be much closer to each other’s results than the published findings we see, which is that different companies can report results that differ by 5 to 7 points or more – in other words, differences that are equal to or greater than the claimed “margin of error.” And when we track the results of different companies on the same question over time, and we get more than 20 published results where the highest and lowest results, even on similar dates, are 8 to 10 points apart, all on polls which supposedly have scientific margins of error less than that range, then we have clear, empirical evidence that these companies are not actually getting scientific results which fall within the claimed margin of error 19 out of 20 times. They aren’t getting truly random samples, and their true margin of error is much higher than claimed.
I do believe that the variations from true randomness that are experienced in telephone polling are themselves highly variable from one day to another: the busy-tones and refuse-to-answers may be tilted to conservatives one day, and toward liberals on another, and that these variations are actually helping the polling companies stay close, in their results, to the trend in the population as a whole (which, remember, is the “unknown” value that polling is attempting to measure).
Yet when we consistently see that differing companies trying to measure the same election race or issue find results that differ by 5 points or more, it should be clear that the claims of a “scientific” calculation of “margin of error at the 95% confidence level” are simply not being achieved in telephone polling, because they are not actually achieving a random sample of the population. Yes, we can regard the results of a particular poll as a “snapshot in time” of the public attitudes – but when they claim a margin of error of plus-or-minus 4 points, or 5 points, the problems of lack of randomness in their sample mean that the true margin of error is probably 2 or 3 times what they claim – when they claim a 4% margin of error, it’s probably an actual margin of error of plus-or-minus 8 to 12 percentage points.
So please, readers and writers, don’t treat poll numbers as if they are dependable, closely accurate or God-given. There is a reasonable chance that the polling company got it right – IF you mentally double the stated margin of error. Yet always remember, the claimed margins of error in American public polling are not being achieved, because no polling company can actually achieve a truly random sample of the public in the age of answering machines, and the prevailing social and personal attitudes that lead 5 to 20% of the Americans you can reach to say, “no thanks, I won’t answer your questions today.”