When you learn new things, you can develop misconceptions. Maybe it’s the result of something you didn’t understand correctly. Maybe it’s the way the instructor explains something. Or maybe, it’s something unspoken, something you assume or infer from what was said. Here are six misconceptions about statistics you might have gotten from Stats 101.
Misconception 1: “Statistics is Math”
How could you not come to believe this? Even before you took Stats 101, you learned you had to take the course to fulfill a math requirement. It was taught by the Math Department. Then when you took the course, it was all numbers. Homework and exams were almost all about calculations. Stats 101 was all math. Statistics must be all math too.
Statistics uses numbers but numbers are not the primary focus of statistics, at least to most practitioners. Applied statistics is a form of inductive reasoning that uses math as one of its tools. It also uses sorting for ranks, filtering for classification, and all kinds of graphics. The point of using statistics is to discover new knowledge and solve problems through the use of inductive reasoning involving numbers. It’s not just about doing calculations. That’s why it’s required for college majors in business, social sciences, and many other disciplines. That’s why it’s taught by professors in all those disciplines, too. Yes, it’s required for math degrees and is taught by math professors at many schools. That’s so there will be mathematical statisticians who will invent statistical tools for the applied statisticians to use. You can love statistics and be good at statistical thinking even if you think you hate math.
Misconception 2: “Statistics Requires a Lot of Data”
Stats 101 doesn’t teach you how to work with individual pieces of information, like a solitary measurement, or a picture, or eyewitness testimony. Statistics uses data, lots of data, the more data the better. The number of samples is a term in almost every equation. And anyway, that’s what the law of large numbers says, the more data the better the results.
The number of samples you really need for a statistical analysis is contingent on how much resolution you want. Think of the resolving power of a telescope or a microscope, or the number of pixels in a computer image. The greater the resolution, the more detail you’ll see. It’s the same way with statistics (https://statswithcats.wordpress.com/2010/07/17/purrfect-resolution/).
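To make the resolution idea concrete, here is a minimal Python sketch. It assumes a simple random sample, a made-up standard deviation of 10 units, and the usual normal-approximation 95% margin of error for a mean. The function name margin_of_error is only for illustration.

```python
import math


def margin_of_error(std_dev, n, z=1.96):
    """Approximate 95% margin of error for a sample mean.

    Assumes a simple random sample and a normal approximation;
    z=1.96 is the conventional multiplier for 95% confidence.
    """
    return z * std_dev / math.sqrt(n)


# Hypothetical standard deviation of 10 units, chosen only for illustration.
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}  margin of error = {margin_of_error(10.0, n):.2f}")
```

Quadrupling the number of samples only halves the margin of error, so each additional step in resolution costs more data than the last.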
What’s more important than the number of data points is the quality of the data points. In statistics, the quality of a set of data points is how well the data points represent the population from which they are drawn. But representative data can be incredibly difficult to generate. How do you decide which registered voters are actually likely to vote in the next election? How do you decide who might use a product you might want to sell?
The number of samples is easy to determine. The quality of the samples is virtually impossible to determine. Nevertheless, what you should remember is that more data may be better but better data are always best.
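A small simulation can illustrate why better data beat more data. This is only a sketch with invented numbers: it assumes 50% of all voters support some measure, while 70% of one easy-to-reach subgroup does. It then compares a big sample drawn only from that subgroup with a small sample drawn from everyone.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

TRUE_SUPPORT = 0.50      # hypothetical support among all voters
SUBGROUP_SUPPORT = 0.70  # hypothetical support in one easy-to-reach subgroup


def sample_proportion(p, n):
    """Simulate n yes/no answers with probability p of 'yes'; return the share of 'yes'."""
    return sum(rng.random() < p for _ in range(n)) / n


# Large sample drawn only from the unrepresentative subgroup.
big_biased = sample_proportion(SUBGROUP_SUPPORT, 10_000)
# Small sample drawn from the whole population.
small_representative = sample_proportion(TRUE_SUPPORT, 400)

big_biased_error = abs(big_biased - TRUE_SUPPORT)
small_representative_error = abs(small_representative - TRUE_SUPPORT)

print(f"10,000 biased samples:     estimate {big_biased:.3f}, error {big_biased_error:.3f}")
print(f"400 representative samples: estimate {small_representative:.3f}, error {small_representative_error:.3f}")
```

The large sample pins down the wrong number very precisely. Adding more samples from the same subgroup would not fix that.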
Misconception 3: “Data are Dependable”
In Stats 101, you do a lot of number crunching. You use small datasets and big datasets, real data and fake data, but never were you told to delete data. You figured that data are like facts. You don’t delete them for any reason or you will bias your results.
Data are messy. Most newly generated datasets have errors, missing observations, and unrepresentative samples. Some population properties may be under-represented or over-represented. There may be samples that should not be included in the analysis, like replicates, QA samples, and metadata. All these problems with data require a lot of processing before an analysis can begin (https://statswithcats.wordpress.com/2010/10/17/the-data-scrub-3/). In fact, data scrubbing often consumes the majority of a project budget and schedule, but you have to do it anyway.
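As a rough sketch of what that processing can look like, the Python below drops QA samples, missing observations, and replicates from a tiny invented dataset. The field names (id, kind, value) and the rules in scrub are assumptions for illustration; real scrubbing rules depend on the project.

```python
# Invented records; the field names are hypothetical.
raw = [
    {"id": "S1", "kind": "field", "value": 4.2},
    {"id": "S2", "kind": "field", "value": None},     # missing observation
    {"id": "S3", "kind": "qa_blank", "value": 0.0},   # QA sample, not part of the population
    {"id": "S1", "kind": "field", "value": 4.2},      # replicate of S1
    {"id": "S4", "kind": "field", "value": 5.1},
]


def scrub(records):
    """Keep the first occurrence of each field sample that has a value."""
    seen_ids = set()
    clean = []
    for rec in records:
        if rec["kind"] != "field":   # drop QA samples and other non-data rows
            continue
        if rec["value"] is None:     # drop missing observations
            continue
        if rec["id"] in seen_ids:    # drop replicates
            continue
        seen_ids.add(rec["id"])
        clean.append(rec)
    return clean


clean = scrub(raw)
print([rec["id"] for rec in clean])
```

Every one of those rules is a judgment call that somebody has to make and document, which is part of why scrubbing takes so long.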
Misconception 4: “Statistics Provides Unique Solutions”
In all the problems your Stats 101 instructor solved in class, and all the homework assignments you did, and all the exams you took, there was only one “right answer” to a question. So, any statistical analysis should provide the same results no matter who does it.
Even if two statisticians start with identical datasets, they may not come to identical results or, sometimes, even identical conclusions. This is because they may make different assumptions and scrub the data differently. Furthermore, there may be more than one way, even many ways, to approach a problem (https://statswithcats.wordpress.com/2010/08/22/the-five-pursuits-you-meet-in-statistics/). There may also be different statistical analysis techniques that can be used, or even different options within the same technique (https://statswithcats.wordpress.com/2010/08/27/the-right-tool-for-the-job/). It would probably be more surprising for two statisticians to calculate the same results from a dataset than for them to have some differences. Just like most problems in the real world, there may be more than one right answer from a statistical analysis.
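Here is a minimal sketch of how that can happen, using Python’s standard statistics module and seven made-up measurements. Three hypothetical analysts answer the same question (“what is a typical value?”) with three defensible choices.

```python
import statistics

# Made-up measurements; the last one may be an outlier or a real extreme value.
data = [12, 14, 15, 15, 16, 18, 95]

# Analyst A keeps everything and reports the mean.
mean_all = statistics.mean(data)

# Analyst B keeps everything but reports the median, which resists extremes.
median_all = statistics.median(data)

# Analyst C decides the 95 is an error, removes it, and reports the mean.
mean_trimmed = statistics.mean(data[:-1])

print(f"Mean of all data:     {mean_all:.1f}")
print(f"Median of all data:   {median_all}")
print(f"Mean without the 95:  {mean_trimmed}")
```

None of the three is wrong on its face. Which one is appropriate depends on what the 95 represents and what the result will be used for.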
Misconception 5: “Statistics Provides Unambiguous Results”
Results are either significant or they’re not. That’s pretty unambiguous.
Statistical results are based on data and assumptions about the data. Change the number of samples and you change the resolution of the statistical procedure. Change the data or the assumptions and you change the estimates of variability. Change the resolution or the estimates of variability and you have different results. There is indeed uncertainty in uncertainty. Sometimes uncertainty brings with it ambiguity.
Is there really a difference between p-values of 0.049 and 0.051? Many decision makers who never got past Stats 101 think so. But interpretations of these results are based on the assumptions and biases a statistician brings to the analysis. One statistician might take a firm stance and say “significant” and another might say, “maybe not.” Results have uncertainty; interpretations have ambiguity, and decisions have risks. That’s statistics.
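To see how thin the line is between borderline results like 0.049 and 0.051, here is a small sketch. It treats them as two-sided p-values from a z-test, which is an assumption made for illustration, and uses a hypothetical helper, two_sided_p, built on math.erfc from the standard library.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # conventional cutoff, assumed for illustration

for z in (1.95, 1.96, 1.97):
    p = two_sided_p(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z:.2f}  p = {p:.4f}  {verdict}")
```

A change of 0.02 in the test statistic, well within what a different scrubbing decision or a few more samples could produce, is enough to flip the label.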
Misconception 6: “It’s Easy to Lie with Statistics”
Darrell Huff wrote “How to Lie with Statistics” in 1954 (http://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728/ref=pd_sim_b_2). Michael Wheeler wrote “Lies, Damn Lies, and Statistics: The Manipulation of Public Opinion in America” in 1976 (http://www.amazon.com/Lies-Damn-Statistics-Manipulation-Opinion/dp/0393331490/ref=sr_1_17?ie=UTF8&qid=1298231730&sr=8-17). John Allen Paulos wrote “Innumeracy: Mathematical Illiteracy and Its Consequences” in 1988 (http://www.amazon.com/Innumeracy-Mathematical-Illiteracy-Its-Consequences/dp/0809058405/ref=ntt_at_ep_dpi_1). Joel Best wrote “Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists” in 2001 (http://www.amazon.com/Damned-Lies-Statistics-Untangling-Politicians/dp/0520219783/ref=sr_1_3?ie=UTF8&qid=1298231253&sr=8-3).
So it must be pretty easy to lie with statistics since everybody is doing it.
It’s hard to do statistics right, but it’s also a lot of work to do them wrong. You have to collect data, crunch the numbers, and cook up your story, or perhaps more correctly, cook up your story, make up the data, and call the press conference. But if you’re going to mislead an audience, it’s much easier to use made-up facts, phony anecdotes, and illogical conjectures. So why do so many people, particularly politicians, even bother lying with statistics? It’s because numbers provide credibility. If you have little credibility yourself, using numbers can confer the illusion of expertise. And that is why people use statistics in the first place.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.
There is more to statistics than sampling techniques, especially here in the former Soviet Union. Most Rosstat activities are, say, accounting (e.g., counting all births and all deaths) rather than calculating rates and drawing other interesting conclusions from those counts. It is closer to the original (German) statistics than to the British (modern, Galton?) tradition.
Great point. Statistics is many things to many people in many places.
I never received misconceptions 2-6 in my stats class, as the professor always let us know how this stuff functions in the real world (that is what statistics is meant to be about). I do have a major problem with your first “misconception.” What do you mean statistics isn’t like “real” maths because it is not about the calculations but about discovering new knowledge? What is your conception of real mathematics? Have you ever even taken a mathematics course outside of statistics? All of mathematics is about gaining new knowledge, about solving problems and sometimes even how those problems are solved. Frankly, I’m no fan of statistics. There are times when statistics is used in a fascinating way to solve a real-world problem (Watson likely used some fancy statistical methods, for example), but I for one am a much bigger fan of the mathematics behind statistics. And no, I’m not talking about grinding through 95% confidence intervals or running your finger across numbers in a Gaussian distribution. I am talking about the deep mathematical analyses, such as measure theory, that justify probability theory and statistics. It is a fascinating topic and you shouldn’t turn your nose up at it because it’s “mathematics,” whatever you may think that is.
The point I was trying to make is that applied statistics is more a way of thinking and less a way of calculating, so a person can love statistics even if he or she hates math. I didn’t intend to belittle math, only to try to mitigate the fear of math in some readers.
“applied statistics is more a way of thinking and less a way of calculating, so a person can love statistics even if he or she hates math”
But the statistical way of thinking is similar to the mathematical way of thinking.
The idea that math is about numbers and calculating is its own misconception #1, which seems to come from what people learn of math in high school.
If you look at the entirety of what constitutes math, you see subjects like geometry, topology, mathematical logic, category theory, and many others, most of which are about much more than numbers and calculating, and some of which don’t involve that at all and instead focus on reasoning about some subject via abstract mathematical models.
What all of these mathematical disciplines have in common is that they involve the creation, study, and application of formal systems, in which problems are analyzed by applying a set of formal rules to arrive at conclusions. Numbers and calculating are just one special case, or branch of that. Statistics is another such special case.
Of course, in applied statistics as in applied math, you necessarily go beyond the math itself. But this doesn’t disqualify statistics as a mathematical discipline – it’s just that applied statistics is not a pure mathematical discipline.
Well put and I don’t disagree. You can certainly say the same of any discipline. Geology isn’t just about rocks. Psychology isn’t just about mental illness. Art isn’t just about color. And, of course, math isn’t just about numbers. As you say, “The idea that math is about numbers and calculating is its own misconception #1, which seems to come from what people learn of math in high school.” Further, in the best-selling statistics book of all time, How to Lie with Statistics, Darrell Huff says “There is terror in numbers. … Perhaps we suffer from a trauma induced by grade-school arithmetic.” From their pre-college experiences, many students associate math with number crunching and statistics with math. My objective in saying that it is a misconception that “Statistics is Math,” even highlighting it as the number one misconception, was to dispel that illusion that so many people have about statistics being “terror in numbers.” If students can’t get by that perception, none of the other “misconceptions” mean anything.
As to whether statistics is a discipline of math, I believe it’s a matter of perspective. Certainly mathematical statistics is, but perhaps not the many areas of applied statistics. Is econometrics or biostatistics a subdiscipline of math because they employ statistics? Is math a subdiscipline of languages because it uses a language to present its results (not to mention all those Greek letters)? Does it even matter? I believe it does because students in majors other than statistics should be able to view statistics as a tool, like writing or public speaking, which they’ll use in their future work.
So my point remains that a student can love statistics even if they think they hate math because statistics uses numbers but numbers are not the primary focus of applied statistics.
Statistics has taught me some new things. I have always hated math but now I have something new to work with. I don’t always get what I’m supposed to be doing, and it gets difficult for me. Stats uses numbers, and I may assume the numbers should be handled one way when they should be handled another way. I am happy that I have taken this class with Professor Wilcox. Thank you.