Five Eras in the Evolution of Probability and Statistics

Statistics isn’t a new thing. It dates back at least fifty centuries beginning as counts in the form of tally marks for keeping track of crops, animals, people, and time. From there, it evolved with the demands of government and business, supported by academic inquiry and the growth of technology.

The Era of Prehistoric Statistics

Statistics began with the simple act of counting. It sounds unremarkable today. Humans learn to do it in childhood. Even some animals can do it. But thousands of years ago, it was momentous. There was no precedent for it. The concept was unheard-of; there was no process to follow. Then came tally marks as both a tool to count and to remember counts.

As counting became more sophisticated in the ancient world, tally marks evolved into symbols. Those symbols evolved into numbers. Numbering systems were then created to allow for bigger numbers by using place values where the position of a digit determines its value.

Sumerians and later Babylonians had numbering systems around the 30th century BC. Roman numerals appeared in the 9th century BC while Egyptian and Greek numbering systems arose in the 4th and 5th centuries BC. Hindu-Arabic numerals originated in India in the 6th or 7th century and were introduced in Europe in the early 13th century by Leonardo Pisano (aka Fibonacci), an Italian mathematician.

Ancient rulers used these numbering systems to perform censuses of their citizenry and inventories of their crops. Early scholars even used statistical principles to perform rudimentary calculations for solving real-world problems. No doubt, the governments also used numbers to administer taxes.

The Era of Emerging Statistics

Early developments of number systems laid the groundwork for the evolution of statistics and probability as distinct fields of study. Still, before the 16th century, ideas about numbers were more philosophical than mathematical. The notable exception came in the mid-1500s when Italian mathematician Gerolamo Cardano calculated the probabilities of dice throws.

Statistics became established in society by the mid-1600s when governments began tabulating their resources and economies beyond just making a census of their citizenry. John Graunt made statistical inferences from public health data. Most notably, Blaise Pascal and Pierre de Fermat established the mathematical foundation for probability. In the 1700s, Reverend Thomas Bayes introduced the conception of probability that evolved into Bayesian statistics.

The 18th and 19th centuries were when the fundamental concepts and the mathematical underpinnings of statistics were developed.

The Era of Blooming Statistics

The 1800s began with French mathematician Adrien-Marie Legendre describing the method of least squares in a book about calculations in astronomy. He used least-squares as a way to weigh errors. German mathematician Carl Friedrich Gauss claimed to have developed the method a decade earlier. He had used it in his work developing the Normal distribution, which may have been discovered sixty years earlier by French mathematician Abraham de Moivre.

English nexialist Sir Francis Galton developed foundational concepts involving correlation and regression while working in genetics. His student, Karl Pearson, further developed his ideas. The Pearson product-moment correlation coefficient was named after him for his efforts.

Sir Ronald Fisher was dubbed the father of modern statistics for his work on the analysis-of-variance (ANOVA) and the statistical design of experiments. William Sealy Gosset pioneered small-sample experimental design under the pseudonym Student including the development of the t-distribution and test of statistical significance.

Even with all the advancements, or maybe because of them, cynics railed against “lies, damn lies, and statistics.” A few decades earlier, Pearson had also pointed out that “correlation does not imply causation.” Nonetheless, advances made during the 19th century secured probability and statistics as a rigorous mathematical discipline.

By the early 1900s, statistical analyses were being integrated into a wide variety of scientific disciplines as well as industry and government.

The Era of Mature Statistics

The mature era of statistics began in the 1940s when World War II brought an explosion of technology. Statisticians played their part in war efforts, benefiting greatly from the recognition, but the biggest boost for statistics came with the introduction of programmable computers. The government led the way in analyzing data, both census and business data, with their new computer resources. Universities, pollsters, and businesses followed. Suddenly, numbers were everywhere. Every aspect of American life was enhanced by statistics. Nine out of ten doctors said so.

Then in 1954, Darrell Huff published “How to Lie with Statistics,” perhaps as an admonition against believing the results of statistical analyses without careful consideration. It is ironic that Huff, a journalist with no formal training in statistics, would later testify before Congress in the 1950s and the 1960s against the statistical relationship between cigarette smoking and lung disease.

Nevertheless, Huff’s book became a best-selling book on statistics. It sold over a million copies in English by the 2000s and has been translated into over twenty languages. It has also spawned numerous books and articles echoing the same theme of caution in consuming information involving numbers. Unfortunately, many people take these books (or just their titles) to mean that all presentations that use statistics are dishonest. This unfounded belief has returned statistics to the false notoriety it suffered from over a century before.

The Era of Pervasive Statistics

After the end of World War II, the popularity of statistics was poised to explode again. The dinosaur mainframes were replaced by hordes of small personal computers with user-friendly (by comparison) statistical software and programming languages.

The rise of business use of statistics led to the ascendency of data science, the melding of statistics with data engineering, programming, and domain expertise. It was thought by some to be a necessity in the age of computers, internet communications, big data, and high-stakes decision making with profit motives. The creation of Amazon in 1994 ultimately led to an expansive use of statistics and artificial intelligence (AI) to increase sales, a trend that has spread to many other consumer businesses.

Academia also played a part in the rebirth of statistics after the 1950s. Statistics began to be a requirement for more and more degrees outside of STEM, including some degrees in history, archaeology, geography, agriculture, journalism, graphic communications, library science, culinary science, and linguistics.

One repercussion of academia’s new focus on statistics has been that more people consider themselves to be competent in statistics after learning just a few concepts and formulas in high school. Some consider themselves to be experts after taking Stats 101. While not being expert enough to conduct an analysis of their own, they still consider themselves to be knowledgeable enough to argue using numbers on social media. That’s still better than opinionated arm-waving or hearing lies, damn lies, and statistics all the time, though. Right?

At the same time that statistics was evolving into a bigger, more complex discipline, an even bigger revolution grew in mass communications.

Technology. The internet evolved from a secure communications system for military researchers in the 1960s to a consumer necessity in the 1990s. Satellites and wireless technologies are now spreading capabilities for communication everywhere.

Telephony. Wireless phones first appeared in the 1970s and evolved into smartphones by the 1990s. Mobile phones became as popular as landlines in the 2000s, creating issues with statistical surveys because the demographics of the owners were different. By the 2010s, more people owned wireless phones than landlines. Not many people rely exclusively on landlines anymore.

Messaging. Instant messaging began in the 1970s and has evolved and expanded since. Bulletin Board Systems (BBSs) were popular in the 1980s and 1990s. Internet Relay Chat (IRC) began in the 1990s and is still being used. Audio podcasts date back to the 1980s but didn’t achieve much popularity until the 2000s. Text blogging began in the 1980s and expanded into video blogging in the 2000s. YouTube appeared in 2005.

Social Media. Social media began in the 1990s with GeoCities, Classmates, and SixDegrees. In the 2000s, Friendster and Myspace appeared only to be overtaken by Facebook in 2004 and Twitter in 2006. Two decades later, there are too many platforms to mention.

Books. Estimates differ, but between half a million and four million book titles are published every year. Ebooks represent only a fraction of that amount, but while the growth of hardcopy book publishing is static, ebook publishing is increasing exponentially. Ebook technologies have been around since the 1930s but didn’t become popular until the 2000s. By 2010, Amazon was selling more ebooks than hardcopy books. Amazon currently holds over 60 thousand titles related to statistics, including over 2,000 introductory statistics titles, over 3,000 college statistics titles, and over 500 high school statistics titles.

Business. A critical change in how we receive news began in the 1970s when the major outlets decided to require their news divisions to be profitable. ABC led the way with non-traditional news programming, like 20/20 and Nightline, and other networks followed. The aim to be profitable pressured news outlets to sacrifice quality for quicker news releases. In the 1980s, all three major networks were bought by larger corporations, which increased demands for higher profits.

Government. Two major changes in Federal regulations transformed how information was made available to the public. From 1949 until 1987, the Fairness Doctrine required licensed radio and television broadcasters to devote some airtime to discussing controversial matters of public interest and to air contrasting views on those matters. The Telecommunications Act of 1996 increased the number of television stations that a single company could own, which led to a major consolidation of media outlets. Before the Act, fifty companies controlled the media in America; by 2011, only six did. These policies had the effect of limiting which news stories were presented and how they were framed. At the same time, smaller information providers multiplied on the internet giving people greater latitude in picking their sources of information. Not all of those new information outlets provided the same level of veracity, however. Who told them?

Society. There have always been gossip, rumors, unverified reports, propaganda, and other falsehoods passed from person to person in society. But today, mass communications have magnified the availability of those falsehoods, leading to the proliferation of fake news in the media. The number of websites and podcasts spreading fake news increases every day with little regulation. There are even sites for generating fake news articles and memes, alternative facts, and spurious correlations. Sometimes it’s difficult to distinguish between what is real information, what is satire and parody, and what is fake. This makes understanding and trusting statistics, and all of science in general, all the more difficult.

All of these changes happened in about forty years. Depending on when you were born, you may not even recognize how life worked before these advancements. Certainly, they have had an enormous impact on how the science of statistics is being presented in the media. This is why it is essential to become an informed consumer of statistical information.

Read more about the fundamentals of statistics in Chapter 1 of Stats with Kittens.

The Caitlin Clark Effect

Even if you’re not a hard-core fan of the Women’s National Basketball Association, you may have heard of the Caitlin Clark Effect. The hypothesis is that Caitlin Clark, the first pick in the 2024 WNBA draft by the Indiana Fever, is the cause of the unprecedented increase in league attendance, attributable to her extraordinary talent and charisma.

This scatterplot shows the average attendance per game during the 2023 and 2024 seasons for the 12 teams in the WNBA. The plot shows that overall attendance at WNBA games increased by 48% from 2023 to 2024, with most of the teams seeing 16% to 68% more fans.

The Indiana Fever were a MAJOR outlier to this trend, however. They had a 319% increase in attendance from 2023 to 2024, which is why the team’s attendance plots in the upper left corner of the graph far away from the rest of the teams. They went from the low end of team attendance in 2023 to the highest in 2024.

The effect is even more impressive considering the entire history of league attendance at over 6,000 games from 1997 to 2025. Attendance had dropped from about 10,000 attendees per game before 2000 to about 6,000 just before Clark was drafted. That’s an average loss of about 160 attendees per game each season over 24 seasons. It doesn’t sound like much, but it adds up. It’s like a slow oil leak from your car. You keep seeing a few drops of oil on the garage floor and before long, your check-engine light comes on. After Caitlin Clark was drafted, average attendance increased by 5,000 attendees per game by 2025. That’s what’s called the Caitlin Clark Effect, but is she the cause of it?
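The arithmetic behind these figures can be sketched in a few lines of Python. This is only a rough check that uses the rounded values quoted above (about 10,000, about 6,000, 24 seasons, and a gain of 5,000), not the underlying game-by-game data, and the function names are made up for illustration.

```python
def average_change_per_season(start, end, seasons):
    """Average change in per-game attendance from one season to the next."""
    return (end - start) / seasons


def percent_change(old, new):
    """Percent change from an old value to a new value."""
    return (new - old) / old * 100.0


# Rounded figures quoted in the post (approximations, not the raw data).
before_2000 = 10_000   # attendees per game before 2000
pre_clark = 6_000      # attendees per game just before the 2024 draft
seasons = 24           # seasons over which the decline is averaged
gain_by_2025 = 5_000   # increase in attendees per game by 2025

decline = average_change_per_season(before_2000, pre_clark, seasons)
rebound = percent_change(pre_clark, pre_clark + gain_by_2025)

print(f"Average change per season: {decline:.0f} attendees per game")
print(f"Increase since the draft: {rebound:.0f}%")
```

With these rounded inputs the decline works out to roughly 167 attendees per game each season, in line with the "about 160" quoted above, which presumably comes from the unrounded data. The rebound is a little over 80% in about two seasons.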

One widely cited set of criteria used to evaluate causality was described in 1965 by Austin Bradford Hill, a British medical statistician. His nine criteria are:

  • Strength. A relationship is more likely to be causal if the correlation coefficient is large and statistically significant.
  • Specificity. A relationship is more likely to be causal if there is no other likely explanation.
  • Temporality. A relationship is more likely to be causal if the effect always occurs after the cause.
  • Gradient. A relationship is more likely to be causal if a greater exposure to the suspected cause leads to a greater effect.
  • Plausibility. A relationship is more likely to be causal if there is a plausible mechanism linking the cause and the effect.
  • Coherence. A relationship is more likely to be causal if it is compatible with related facts and theories.
  • Analogy. A relationship is more likely to be causal if there are proven relationships between similar causes and effects.
  • Consistency. A relationship is more likely to be causal if it can be replicated.
  • Experiment. A relationship is more likely to be causal if it can be verified experimentally.

Hill’s criteria were established for experimental studies. For observational studies, the Consistency and Experiment criteria do not apply.

Evidence that Caitlin Clark is the cause of the great increases in WNBA attendance includes:

  • Temporality is supported by the great increases in average game attendance that followed her arrival in the league, as shown in the graphs.
  • Gradient is supported by the continued and increased attendance even after her rookie WNBA season.
  • Coherence is supported by similar increases in merchandise sales for Clark and other players in the WNBA.
  • Analogy is supported by similar increases in attendance for other notable players in sports, like Michael Jordan, Pelé, Walter Payton, Tiger Woods, Wayne Gretzky, and many others.
  • Specificity is supported by her team, the Indiana Fever, having by far the largest increase in attendance despite the introduction of new players to other teams from the 2024 draft.
  • Plausibility is supported by the great interest Clark drew in college basketball at Iowa.

Ironically, the strength of the correlation in the data, the criterion most people associate with causality, isn’t a factor. In fact, it is the opposite. Clark’s presence is an outlier to the trend in historical attendance. So, while correlation does not always imply causation, causation does not always imply correlation.

While no individual criterion is foolproof, causality is more convincing when many of Hill’s criteria are met. Yet despite these observations, there are still people who do not believe that Caitlin Clark is the cause of the increase in WNBA attendance. What do you think?
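As a rough way to keep score, the evidence listed above can be tallied against the criteria. The sketch below is only bookkeeping in Python: the simple count is an illustrative shortcut rather than anything Hill proposed, and the names `CRITERIA`, `NOT_APPLICABLE`, `SUPPORTED`, and `summarize` are hypothetical.

```python
# Hill's nine criteria, in the order listed in this post.
CRITERIA = [
    "Strength", "Specificity", "Temporality", "Gradient", "Plausibility",
    "Coherence", "Analogy", "Consistency", "Experiment",
]

# Criteria this post sets aside for observational data.
NOT_APPLICABLE = {"Consistency", "Experiment"}

# Criteria this post lists as supported for the Caitlin Clark Effect.
SUPPORTED = {
    "Temporality", "Gradient", "Coherence",
    "Analogy", "Specificity", "Plausibility",
}


def summarize(criteria, supported, not_applicable):
    """Split the applicable criteria into those met and those not met."""
    applicable = [c for c in criteria if c not in not_applicable]
    met = [c for c in applicable if c in supported]
    unmet = [c for c in applicable if c not in supported]
    return met, unmet


met, unmet = summarize(CRITERIA, SUPPORTED, NOT_APPLICABLE)
print(f"{len(met)} of {len(met) + len(unmet)} applicable criteria met")
print("Not met:", ", ".join(unmet))
```

A count like this is not a statistical test. It only makes explicit which criteria the argument relies on and which one, Strength, it does not.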

Read more about data relationships and causation in Chapter 7 of Stats with Kittens.

Discover Stats with Kittens and Stats with Cats

Discover statistics with felines for support in my new books, Stats with Kittens and Stats with Cats.

Stats with Kittens, Growing up with Data, Charts, Surveys, Correlations, and Other Floofy Playthings is aimed at readers who want to learn the practical basics of statistics without taking a formal course. It explains why statistics is integral to modern society and why many schools require an introductory course in statistics to get a degree. Stats with Kittens covers the history of statistics, causality, critical thinking, bad science, and how to deal with statistical jargon. It provides stepwise approaches to evaluating presentations of statistical analyses that appear in the media. It’s not a textbook; it has kittens. Stats with Kittens is available in paperback and hardcover.

Stats with Cats, The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis (2nd edition) is aimed at readers who have completed a class in introductory statistics and now want to conduct their own analyses. The book is about applied statistics, from designing datasets and deciding what analyses to conduct, to writing the final report. It’s not a textbook; it has cats. Stats with Cats is available in paperback and hardcover.

The Stats with Kittens Song

[Verse 1]
I enrolled in math last spring like “la-di-da,”
Now I’m googlin’, “what are statistics, haha?”
My mind started bogglin’, thought I was screwed now
Til I opened up this book, took a look and MEOW!
Yeah on each page there is some tiny floof
Getting me through these formula proofs
Whiskers twitching at “standard deviation”
Furballs for emergency grade remediation.

[CHORUS]
Ohhh, I’m studying with kittens,
They’re my emotional support,
Every time I see percent signs
I require their report.
If the numbers start to chase me
And my brain begins to scream in pain
There’s a kitten on a football
Who wants me to up my game

[Verse 2]
“Correlation’s not causation!”
Preaches a kitten in a tie,
Meanwhile I’m just nodding like
“Okay now I understand why.”
There’s a chapter on jargon but
I still got no idea what it means,
But those kittens look so confident,
That I am trusting in their beans.

[CHORUS]
Ohhh, I’m studying with kittens,
Adorable ears and pretty paws,
Pouncing on the hardest formulas
And showin’ all the outliers their claws
When I finally pass this crazy class,
It won’t be just cuz I improved my mind—
It’ll be this team of tiny kittens
They are working overtime!

[Bridge]
Flashcards on my bedroom floor didn’t help (NO!)
Highlighters in twenty colors maybe more didn’t help (NO!)
Standard textbooks didn’t help, they put me to sleep (UH OH!)
But those hits of kitten cuteness are exactly what I need (LET’S GO!)

[Verse 3]
So when fall arrives and I take my seat,
Instead of trembling hands and cold feet,
I’ll whisper, “Kitties, don’t fail me now,”
And somehow, wow, I’ll remember how.
No longer do I find statistics so chaotic and random
Because I read the book now I’m part of the fandom
So when my classmates ask for help lookin’ stricken
Imma use what I learned to do and pull out a kitten

[CHORUS]
Ooooh yeah, I’m studying with kittens,
Now significantly less afraid,
With them frolickin’ through my field notes
I know I got it made to pass the grade.
So when there’s no margin for error
You must learn math and get it done
Buy this book of poofy kittens
And you’ll make statistics fun!

[Outro]
Yeah, buy “Stats with Kittens” by Charlie Kufs
And you’ll make statistics fun!

You can purchase the paperback version of Stats with Kittens at:
https://www.amazon.com/Stats-Kittens-Growing-correlations-playthings/dp/B0FSCFF9YD
and the hardcover version at:
https://shop.ingramspark.com/b/084?params=kqb8wXHfmW90CbCimX7kOGRTsnDodZclXljzF7XtGYi

You can purchase the paperback version of Stats with Cats at:
https://www.amazon.com/Stats-Cats-Domesticated-Statistics-Analysis/dp/B0FYRMT83S/
and the hardcover version at:
https://shop.ingramspark.com/b/084?params=ZXmrNmrJuO17K8Qpat4THNVuj8jRkNLUksuzuADSHiQ

#statswithkittens, #statswithcats, #behindthebook, #moodboost, #catsandbooks, #statisticsmadeeasy, #learningstatistics, #learnstatistics, #stats101, #statistics101, #statisticshelp, #statshelp, #highschoolstatistics, #collegestatistics, #statisticsinlife, #statisticsstudent, #statsanxiety, #statisticsanxiety, #teachingresources, #statisticsresources, #statisticsselfhelp, #statistics

Stats with Cats, 2nd Edition

The second edition of my book, Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis, is now available on Amazon, IngramSpark, and in bookstores.

Stats with Cats is aimed at readers who have taken an introductory course in statistics, call it Stats 101, and want to conduct their own analyses either at work or in their personal lives. The book is about applied statistics, from designing datasets and deciding what analyses to conduct to writing the final report, topics that are often neglected in Stats 101.

Stats with Cats consists of 31 Chapters in 7 Parts.

Part I reviews the jargon and concepts you heard in Stats 101. It explains the basic jargon of statistics, including data, samples, and variables; the variety of measurement scales that can be used to characterize phenomena; variance and why it is such a fundamental concept in statistics; what models are and how statistics uses models to create models; the fundamental assumptions inherent in statistical inference and what happens when the assumptions are violated; and the five broad goals a statistical analysis may have.

Part II is about the skills, tools, materials, plans, and resources you’ll need to conduct your own analyses. Included are descriptions of how organizations can rely on good data analysis to make decisions; how to decide if you should do an analysis yourself or get someone to do it for you; how to set up a data analysis project so that it gets done right the first time; what software and information sources you’ll need; and what problems you may encounter.

Part III will show you how to create your own datasets for analysis, including deciding: what variables to measure and how to measure them; how to select samples; types of data that may be suited for statistical analysis; how to recognize and control sources of variability; and how to put samples and variables into a format that statistical software can analyze.

Part IV is about how to preprocess data so it is ready for analysis. Included are descriptions of: kinds of errors that occur in real-world datasets and how to find and correct them; and ways you can augment your dataset to make your analysis more thorough.

Part V connects the practical framework of applied statistics to the academic content of Stats 101. It includes descriptions of procedures and hints for what to calculate, what to plot, and what to look for when you first explore your data; how to present information in simple windows; and how you can analyze text.

Part VI explores advanced data analyses that are used to build and evaluate models. Included are discussions of: the process of creating a statistical model; advanced statistical analysis techniques you didn’t hear about in Stats 101; and why even the most credible models can fail and what you might do about it.

Part VII is about aspects of practical data analysis, applied statistics, that aren’t usually mentioned in Stats 101. Examples include: how to write data analysis reports; how to comment on someone else’s statistical analysis even if you don’t know a lot about statistics; and suggestions for how you can practice the things you’ve learned.

Stats with Cats contains 56 figures, 43 tables, and over 500 images of cats in 568 pages. There’s also a 444-word Glossary at the end so that you can look up any unfamiliar jargon you might encounter. Stats with Cats has a Flesch-Kincaid Grade Level readability of about 10.2 (9.9 to 11.7), so it is suitable for both young adult and mature readers.
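For readers curious about where a grade level like that comes from, here is a minimal Python sketch of the standard Flesch-Kincaid Grade Level formula. The syllable counter is a crude vowel-group heuristic assumed only for illustration, so its results will differ from those of dedicated readability tools, and it is not the method used to score the book.

```python
import re


def fk_grade_from_counts(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def naive_syllables(word):
    """Crude estimate: count runs of vowels, with a minimum of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text):
    """Approximate grade level of a passage using the naive counter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    return fk_grade_from_counts(len(words), len(sentences), syllables)


# Example with made-up counts: 100 words, 5 sentences, 150 syllables.
print(round(fk_grade_from_counts(100, 5, 150), 2))
```

Longer sentences and longer words both push the grade level up, which is why a score can vary from chapter to chapter within the same book.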

In the second edition, I deleted esoteric cultural references that didn’t age well, internet links that went dead, and screen-captures of high-end statistical software that few people have access to. I added chapters on data-driven organizations, graphical windows for presenting data, how to analyze text, and how to write statistical reports. I’ve also added more pictures of cats.

And the cats? They provide emotional support for people who experience math anxiety. When the statistics become too intense, there’ll be a picture of a cat nearby to restore calm. They are your domesticated guides to statistics, models, graphs, and other breeds of data analysis. Come for the cats; stay for the stats.

So, whether you’re a business person or other professional who has to conduct some statistical analyses, or supervise someone else who is conducting a statistical analysis, or review a statistical analysis done by someone else, this is a book you’ll need to read. And if you just want to use some statistics to manage and explore your personal life, this is a book you’ll want to read.

Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis (2nd edition) as a paperback or hardcover. Read them to your cats. Discover more about using statistics at my Stats with Cats blog and other thought-provoking observations at my Random Terrabytes blog. Join other fans at my Instagram and LinkedIn pages.

Stats with Kittens

My new book, Stats with Kittens, Growing up with data, charts, surveys, correlations, and other floofy playthings, is now available on Amazon.

Everybody needs to understand statistics. It’s an essential part of everyday life in America, more so than any other type of higher math. Statistics isn’t just mathematics; it’s inductive reasoning with numbers. It’s a fundamental component of modern literacy.

Stats with Kittens is aimed at readers who want to understand statistics well enough to avoid misleading data analyses and the media articles about them. It describes the many ways in which they can be misleading and shows how to identify those issues so you become an informed consumer of statistics. It explains why many students have to take an introductory course in statistics to get their degree or professional certification.

Stats with Kittens has an average Flesch-Kincaid Grade Level of 11.5 (11th grade), which is considered suitable for 16-year-olds and above. Its readability ranges from 9.6 (Chapter 2 on Probability) to 12.6 (Chapter 8 on Data Models). In its 524 pages, Stats with Kittens has 75 technical figures, 41 tables, and over 500 pictures of kittens. The kittens are provided to ease math anxiety and make your reading experience tolerable if not enjoyable. The book uses metaphors and analogies, real-life examples, and step-by-step approaches to explain how to understand presentations about statistical analyses. It also has a Glossary of over a thousand definitions to facilitate comprehension.

Stats with Kittens explains how you can deal with the jargon you’ll hear in presentations involving statistics. Some of the jargon involves repurposed words, that is, English words with alternative meanings. These may be difficult to identify because you have to consider the context in which the words appear. The easiest to spot are eponyms (terms named after a person) and special words (unique words used exclusively in statistics). These are less common, at least until you get to more advanced levels of statistics.

Stats with Kittens discusses a variety of fundamental questions individuals new to statistics may have. Examples include:

  • How is statistics used to analyze data?
    There are five approaches—describing, classifying, testing, predicting, and explaining (Chapter 1).
  • Where do probabilities come from?
    Probabilities come from four sources—logic, data, models, and oracles (Chapter 2).
  • How are data measured?
    Data measurement involves three elements—benchmark, process, and judgment (Chapter 3).
  • How is data variability controlled in statistics?
    Variance control involves the three Rs—Reference, Replication, and Randomization (Chapter 3).
  • What is important to look for in statistical graphics?
    Statistical graphics are defined by the three Fs—Foundation, Framework, and Facade (Chapter 4).
  • Why do people often incorrectly criticize polls?
    Six reasons people criticize polls are too few participants, they didn’t ask me, only landline users were interviewed, they asked the wrong questions, the results were predetermined, and the results were wrong (Chapter 5).
  • Are there different kinds of data relationships?
    There are at least nine types of data relationships: direct, feedback, common-cause, mediated, stimulated, suppressed, inverse, threshold, and complex (Chapter 7).
  • What is important to look for in data relationships?
    Look for trends (temporal, spatial, categorical, hidden, and multivariate), patterns (shocks, steps, shifts, cycles, and clusters), and anomalies (censoring, unequal variances, and outliers) (Chapter 7).

It also provides steps for how to approach a variety of challenges in understanding data, including: what to look for in statistical surveys; how to decide if correlation implies causation; how to evaluate statistical presentations; and how to become a critical thinker.

Stats with Kittens consists of nine chapters that describe what you’ll need to know to be an informed consumer of statistics. Included are traditional topics like probability, descriptive statistics, graphing, hypothesis testing, and regression, as well as topics taught less often, such as the history of statistics, causality, critical thinking, evidence, fallacies, and bad science. The Chapters are:

  • Chapter 1. Introduction. Why you should learn about statistics and what you should know before starting.
  • Chapter 2. Probability. What probability and odds are, where they come from, and how they influence our lives.
  • Chapter 3. Description. How describing datasets is easier than describing people once you know what to look for.
  • Chapter 4. Graphs. What you need to know about statistical graphics to assess their validity. It’s much more than you were taught in high school.
  • Chapter 5. Surveys. How to measure intangible, changing opinions. Everybody thinks surveys are easy to conduct but they are, in fact, the most difficult type of statistical analysis to get right.
  • Chapter 6. Comparisons. How to find significant and meaningful differences between populations represented by data. Statistical testing has been used for centuries but tests are complicated and easy to get wrong.
  • Chapter 7. Relationships. How data metrics can be related and why it’s hard to tell when correlation implies causation.
  • Chapter 8. Models. How statistical models of a data relationship are created and how to spot common faults that others may overlook.
  • Chapter 9. Literacy. How to recognize possible issues in data and statistical analyses to assess the validity of information and arguments presented in technical reports and media stories.

Stats with Kittens is not a textbook, but it is still a valuable resource for students, professionals, and everyone in between. It provides help for comprehending the data-driven analytics you might encounter in sports, social media, the news, and most anything you follow in life. It is an asset for those looking to expand their knowledge of the world.

Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis (2nd edition) as a paperback or hardcover. Read them to your cats. Discover more about using statistics at my Stats with Cats blog and other thought-provoking observations at my Random Terrabytes blog. Join other fans at my Instagram and LinkedIn pages.


Feeling Significance


Science and Cookies

Creating science is like making cookies—you need a recipe, ingredients, and tools to combine the ingredients and bake the dough.

  • The recipe is the scientific method.
  • The ingredients are the knowledge of the discipline and the data from the experiments.
  • The tools are logic and philosophical principles.
  • The dough is the raw results.
  • The cookies are the interpreted results that have been peer-reviewed, reported in professional publications, and debated in the discipline community.

If you’ve ever made cookies, you know that if you use quality ingredients and follow the recipe, everything will probably turn out fine. It helps if you have some experience with the tools you’ll use and with making cookies in general. Making science is kind of like that.

The Recipe

The Internet has scores of websites that aim to explain the scientific method, often as infographics. Some are more detailed than others, some have steps that others don’t. Even so, in real life, it’s more complex than you might imagine.

The scientific method is not a rigid formula; it’s more of a guideline for what things to include in research and when to include them. It’s different from “scientists’ methods,” which are just practices individual researchers use often because they have found them to work in the past. For example, they might limit their experiments to thirty samples because that’s what they were told by their thesis advisor. These practices are like the personal stamp every experienced cookie maker puts on their results, say by decorating their products.

Although the scientific method doesn’t change, how it is implemented does. For one, how researchers design and implement an observational study is very different from how they design and implement an experimental study. Different mindsets, different populations and phenomena, and different hypotheses, but both types of study still rely on the scientific method.

Statistical studies follow the same basic steps as for the scientific method, only there is more attention paid to fundamental statistical concepts, such as populations, scales of measurement, variance control, and statistical assumptions.

Here’s what the scientific method for statistical studies looks like:

  1. Make an observation, have a thought, or get in an argument on Twitter.
  2. Do background research. Somebody may have already invented that wheel. Remember the geologist’s old adage, a month in the field will save you an hour in the library.
  3. Define the research question to be investigated. Determine if the research will be observational or experimental as this will establish what statistical designs will be applicable. Note whether the question involves data description, comparison, or relationships as this will influence what statistical techniques will be applicable.
  4. Depending on the information available on the research question, either:
     A. Collect more observations anecdotally to refine the question for a preliminary study, or
     B. Design a preliminary study to answer the question and identify needs for additional data, or
     C. Design a confirmatory study to answer the question definitively.
  5. Define the phenomenon to be investigated and the metrics that will be used to characterize the phenomenon. Identify the instruments and procedures for generating data on the metrics. Determine if the procedures and instruments will provide appropriate accuracy and precision. Identify scales of measurement for all metrics as this will influence what statistical techniques will be applicable.
  6. Define the characteristics of the population to be investigated. Decide what kinds of inferences might be made to the population. Identify an appropriate sampling scheme for obtaining a representative sample from the population. Select sample collection locations, frame, or group assignments, as appropriate. Identify appropriate variance control approaches of reference, replication, and randomization.
  7. Develop a hypothesis that can be tested. Write Null and Alternative hypotheses (see Chapter 6). Estimate the number of samples that will be needed for the analysis considering the number of grouping variables and tests to be carried out.
  8. Collect data using appropriate quality control and variance reduction procedures. This is the crux of the research. If the data collection is faulty, either because of a bad design or implementation, the research study is a failure. If the data analysis is problematical, it can be repeated so long as the data are good.
  9. Process and analyze the data. All analyses start with data scrubbing and an exploratory data analysis. Further analyses will depend on the objective of the study—classify/identify, compare, predict/explain, or explore. Look for violations of assumptions.
  10. Test the hypothesis and reevaluate as necessary. Make and test predictions based on the hypothesis. Draw conclusions and report findings.

Both the scientific method and cookie making can be viewed as either once-and-done or iterative processes depending on the scope of the goal. Deep scientific research usually involves many experiments based on evolving knowledge, but so too can the search for the very best recipe for peanut butter cookies. Some scientific research involves a single, straightforward experiment, just to find out something. Sometimes you make cookies just to try out a new recipe.

The Ingredients

The ingredients of the scientific method are domain expertise (i.e., the knowledge of the discipline) and the data from the experiments. Even before you think about collecting data from an experiment, you need to know your stuff. You can’t make cookies if you don’t know where the kitchen is.

You need domain expertise to create hypotheses and generate data, and you need data to test hypotheses and create results. Data are the main ingredient. They are the evidence that will support or refute your research hypothesis.

There are many ways that data go wrong just as there are many ways that baking ingredients can be stale or contaminated. When you’re making cookies, it’s not uncommon to substitute for an ingredient if you don’t have it or if you want to try something different. You might substitute gluten-free flour for all-purpose flour or add cinnamon just because you like the taste. With data, you might correct errors, replace outliers, or add data transformations. You have to use the best ingredients you can.

The Tools

The tools of the scientific method are the logic and philosophical principles that are used to construct the research question, hypothesis, and experimental design. Logic is more than just the fallacies; it encompasses methods of reasoning and constructing arguments. Philosophical principles are like goals or guidelines for developing a research project. Examples include:

  • Empiricism. Knowledge comes from experience and observation.
  • Rationalism. Science must be based on facts and logical reasoning rather than on opinions, emotions, and belief.
  • Inclusiveness. Incorporating all aspects of domain knowledge into a research question.
  • Universality. Being true or appropriate for all situations.
  • Parsimony. Simplicity of a research question. Also referred to as Occam’s Razor or the Law of Economy.
  • Reductionism. Simplifying a complex phenomenon into discrete, fundamental elements.
  • Refutability. The ability of a hypothesis to be disproven. In statistical testing, this is managed with effect size, confidence, power, and other test details.

These tools of the scientific method aren’t discussed much, but clearly, they are essential elements in creating science. Like tools used in making cookies, mixers and ovens, for instance, you don’t have to know a lot about how they work if you’re just licking the beaters.

From Dough to Cookies

If you’re making cookies, once you finish making the dough, you bake it to complete the process. If you’re conducting research, once you finish analyzing the data, you document your work to complete the process. Reporting research results is like baking cookie dough—it puts all the efforts into parts that can be consumed by anyone, any time, any place.

There’s no guarantee that either a research report or a cookie will be good or even “as expected.” There might have been accommodations or shortcuts taken that affected the results. The research design, the recipe, may have been inferior. There may have been steps taken to optimize research results, like searching for significance (Chapter 6). That’s adding extra sugar to a cookie recipe; it seems good but others won’t be able to use the recipe and get the same results.

How results get packaged will affect how they are perceived. Cookies can be cut into shapes and decorated, then arrayed on a platter or stored in a zipper-storage bag. Research reports can be kept private or released to the public. They can be aimed at a particular audience, from non-technical to expert. They can be placed in peer-reviewed journals or reported in the main-stream media. Each type of publication appears different to the readers. There will be different types of comments, debates, and follow-up. Some people will be satisfied and some will want more.

Expectations matter, though they shouldn’t. Reports written by experts that appear in prestigious publications are accepted without challenge just as cookies from professional bakers are expected to be good tasting. But these expectations are not always fulfilled. Sometimes the recipes aren’t followed adequately or the ingredients are substandard. Some results are bad to begin with and some go stale over time. When that happens, just make more cookies.

What is necessary with both research and cookies is to be an unbiased, informed consumer. But this is often not easy. As Carl Sagan once said, “We live in a society exquisitely dependent on science and technology, in which hardly anyone knows anything about science and technology.” In that regard, research and baking are quite different.


When Science Goes Wrong

Science is our perception of how things work. The scientific method is how we determine what is the current state of our science. Science is the product of the successful application of the scientific method. They are not the same. For one thing, while science changes, the scientific method is constant.

When people say “trust the science” what they really mean to say, or should mean to say, is “trust the scientific method.” Science is constantly in a state of flux. It is never settled because there are always new things to learn. In the 1950s, I was taught that there were electrons having a negative charge, protons having a positive charge, and neutrons having no charge. My grandparents never learned about any of these when they went to school; it was all too new and unsettled. Today, there are more subatomic particles than I can count. I don’t even know what is taught about them in high school.

There are many ways that the scientific method can be perverted, if not ignored altogether, to produce erroneous results. Most research characterized as bad science is probably the result of bias on the part of the researcher. Sometimes, it is a consequence of the topic not having a theoretical basis or being near the limits of our current understanding. And, of course, in rare cases, it is intentional.

Categories of bad science go by many names, all of which are pejorative. Category definitions vary between sources and some topics have been given as examples in more than one category. Sometimes the negative connotations are used to discredit research that challenges mainstream scientific ideas. Like an ad hominem argument, invoking terms related to bad science has been used to silence dissenters by preventing them from receiving financial support or publishing in scientific journals.

Pathological Science

Pathological science occurs when a researcher holds onto a hypothesis despite valid opposition from the scientific community. This isn’t necessarily a bad thing. Most scientific hypotheses go through periods when they are ignored in favor of the accepted hypothesis. It is only with persistence and further research that a hypothesis will be accepted. Sometimes the change is evolutionary and sometimes the change is revolutionary. The change from the Expanding-Earth hypothesis to the Continental-Drift hypothesis was revolutionary; the change from the Continental-Drift hypothesis to Plate Tectonics was evolutionary.

The pathological part of pathological science occurs when the researcher deviates from strict adherence to the scientific method in order to favor the desired hypothesis or incorporate wishful thinking into interpretation of the data. Usually, the hypothesis is experimental in nature and is developed after some research data have been generated. The effects of the results are near the limits of detectability. Sometimes, other researchers are recruited to perpetuate the delusion.

Researchers involved in pathological science tend to have the education and experience to conduct true science so their initial results may be accepted as legitimate. Eventually, though, failure to replicate the results damages their credibility.

Cold fusion is considered by some to be an example of pathological science because all or most of the research is done by a closed group of scientists who sponsor their own conferences and publish their own journals.

Pseudoscience

Pseudoscience involves hypotheses that cannot be validated by observation or experimentation, that is, are incompatible with the scientific method, but still are claimed to be scientifically legitimate. Pseudoscience often involves long-held beliefs that pre-date experiments, consequently, it is often based on faulty premises. While less likely to be popular in the scientific community, pseudoscience may find support from the general public.

Examples that have been characterized as pseudoscience include numerology, free energy, dowsing, Lysenkoism, graphology, body memory, human auras, crystal healing, grounding therapy, macrobiotics, homeopathy, and near-death experiences.

The term pseudoscience is often used as an inflammatory buzzword for dismissing opponents’ data and results.

Fringe Science

Fringe science refers to hypotheses within an established field of study that are highly speculative, often at the extreme boundaries of mainstream studies. Proponents of some fringe sciences may come from outside the mainstream of the discipline. Nevertheless, they are often important agents in bringing about changes in traditional ways of thinking about science, leading to far-reaching paradigm shifts.

Some concepts that were once rejected as fringe science have eventually been accepted as mainstream science. Examples include heliocentrism (sun-centered solar system), peptic ulcers being caused by Helicobacter pylori, and chaos theory. The term protoscience refers to topics that were at one point mainstream science but fell out of favor and were replaced by more advanced formulations of similar concepts. The original hypothesis then became a pseudoscience. Examples of protosciences are astrology evolving into the science of astronomy, alchemy evolving into the science of chemistry, and continental drift evolving into plate tectonics.

Other examples of fringe science include feng shui, ley lines, remote viewing, hypnotherapy and psychoanalysis, subliminal messaging, and the MBTI (Myers–Briggs Type Indicator). Some areas of complementary medicine, such as mind-body techniques and energy therapies, may someday become mainstream with continuing scientific attention.

The term fringe science is considered to be pejorative by some people but it is not meant to be.

Barely Science

Barely science might be perfectly acceptable science except that it is too underdeveloped to be released outside the scientific community. Barely science may be based on a single study, or pilot studies that lack the methodological rigor of formal studies, or studies that don’t have enough samples for adequate resolution, or studies that haven’t undergone formal peer review. Researchers under pressure to demonstrate results to sponsors or announce results before competitors are the sources. Consumers see barely science more than they know.

Junk Science

Junk science refers to research considered to be biased by legal, political, ideological, financial, or otherwise unscientific motives. The concept was popularized in the 1990s in relation to legal cases. Forensic methods that have been criticized as junk science include polygraphy (lie detection), bloodstain-pattern analysis, speech and text patterns analysis, microscopic hair comparisons, arson burn pattern analysis, and roadside drug tests. Creation sciences, faith healing, eugenics, and conversion therapy are considered to be junk sciences.

Sometimes, characterizing research as junk science is simply a way to discredit opposing claims. This use of the term is a common ploy for devaluing studies involving archeology, complementary medicine, public health, and the environment. Maligning analyses as junk science has been criticized for undermining public trust in real science.

Tooth-Fairy Science

Tooth-Fairy science is research that can be portrayed as legitimate because the data are reproducible and statistically significant but there is no understanding of why or how the phenomenon exists. Placebos, endometriosis, yawning, out-of-place artifacts, megalithic stonework, ball lightning, and dark matter are examples. Chiropractic, acupuncture, homeopathy, therapeutic touch, and biofield tuning may also be considered to be tooth-fairy sciences.

Cargo-Cult Science

Cargo-cult science involves using apparatus, instrumentation, procedures, experimental designs, data, or results without understanding their purpose, function, or limitations, in an effort to confirm a hypothesis. Examples of cargo-cult experimentation might involve replication studies that use lower-grade chemical reagents, instruments not designed for field conditions, or data obtained using different populations and sampling schemes. In a case of fraudulent science involving experimental research on Alzheimer’s disease, over a decade of research efforts were wasted by relying on the illegitimate results.

Coerced Science

Coerced science occurs when researchers are compelled by authorities to study sometimes-objectionable topics in ways that promote speed in reaching a desired result over scientific integrity. There are many notable examples. During World War II, virtually every major power pushed their scientists and engineers to achieve a variety of desired results. In the 1960s, JFK successfully pressured NASA to land a man on the Moon. In the 1980s, Reagan prioritized efforts on his Strategic Defense Initiative (SDI) even though the goal was considered to be unachievable by experts. Many governments restrict research on their country’s cultural artefacts to individuals who agree to severe preconditions including censorship of announcements and results.

Businesses, especially in the fields of medicine and pharmaceutics, place great pressure on research staff to achieve results. For example, Elizabeth Holmes, founder of the medical diagnostic company Theranos, was convicted of fraud and sentenced to 11¼ years in prison. Businesses are also known to conceal data that would be of great benefit to society if they were available. Examples include results of pharmaceutical studies (e.g., Tamiflu, statins) and subsurface exploration for oil and mineral resources.

Academic institutions predicate tenure appointments in part on journal publications and grant awards, both of which rely on researchers finding statistical significance in their analyses (p-hacking, see Chapter 6).
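To see why searching for significance is such a temptation, here is a minimal Python sketch of how quickly false positives pile up when many tests are run. It assumes the tests are independent, that every null hypothesis is actually true, and that each test uses a 0.05 significance level; the function name is my own, purely for illustration.

```python
def chance_of_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent tests of true
    null hypotheses comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** k

# How the risk grows as a researcher runs more and more tests.
for k in (1, 5, 10, 20):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.0%} chance of a false positive")
```

Under these assumptions, running twenty tests gives roughly a 64% chance of finding at least one "significant" result even when nothing real is going on.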

Taboo Science

Taboo science refers to areas of research that are limited or even prohibited either by governments or funding organizations. Sometimes this is reasonable and good. For example, research on humans has become more and more restrictive after the atrocities that occurred during World War II. During the Cold War, U.S. military and intelligence agencies obstructed independent research on national security topics, such as encryption.

Some taboos, however, are promoted by special-interest groups, such as political and religious organizations. Examples of topics that are difficult for researchers to obtain funding for include: effectiveness of methods to control gun violence; ancient civilizations, archeological sites, artefacts, and STEM capabilities; health benefits of cannabis and psychedelics; resurrecting extinct species; and some topics in human biology such as cloning, genetic engineering, chimeras, synthetic biology, scientific aspects of racial and gender differences, and causes and treatments for pedophilia.

Fraudulent Science

Fraudulent science consists of research, experimental or observational, in which data, results, or even whole studies are faked. Creation of false data or cases is called fabrication; misrepresentation of data or results is called falsification. Plagiarism and other forms of information theft, conflicts of interest, and ethical violations are also considered aspects of fraudulent science. The goals of fraudulent science are usually for the researcher to acquire money including funding and sponsorships, and enhance reputation and power within the profession.

Unfortunately, there are too many examples of fraudulent science. Perhaps the most notorious is the 1998 case of Andrew Wakefield, a British expert in gastroenterology, who claimed to have found a link between the MMR vaccine, autism, and inflammatory bowel disease. His paper published in The Lancet, which was retracted in 2010, is thought to have caused worldwide outbreaks of measles after a substantial decline in vaccinations. Wakefield later became a leader in the anti-vaxx movement in the U.S. Another infamous example involves faked images in a 2006 experimental study of memory deficits in mice, which subsequently led to an unproductive diversion of funding for Alzheimer’s research.

Sometimes, fraudulent actions are subtle and go unnoticed even by experts. Examples include pharmaceutical studies designed to accentuate positive effects while concealing undesirable side effects. Sometimes, well-meaning actions have unforeseen ramifications, such as when definitions of medical conditions are changed resulting in patients being treated differently. Examples include obesity, diabetes, and cardiac conditions.

From 2000 to 2020, 37,780 professional papers were retracted because of fraud (The Retraction Watch Database [Internet]. New York: The Center for Scientific Integrity. 2018. ISSN: 2692-465X. Accessed 4/13/2023. Available at: http://retractiondatabase.org/). Those retractions are considered to represent only a fraction of all fraudulent science.

It’s Not All Bad

Clearly, science and scientists are wrong on occasion even when they don’t intend to be. That is to be expected. Even if the scientific method isn’t all that difficult to understand, it is incredibly difficult to put into practice, simplified flowcharts notwithstanding. As a consequence, scientific studies are too often poorly designed, poorly executed, misleading, or misinterpreted. Most of the time, this is inadvertent though sometimes not.

While this may seem like a fairly dismal portrayal of science, bear in mind that the vast majority of today’s science is real and legitimate. The difference between bad science and true science that strictly follows the scientific method is that true science will eventually correct illegitimate results.


How to Tell if a Political Poll is Legitimate

Four things to look for.

Why People Hate Polls

It’s probably true that everybody has taken a survey at some point or other. What’s also probably true is that most people think polling is easy. And why not? Google has a site for creating polls. Social media sites and blogging sites provide capabilities for conducting polls. There are also quite a few free online survey tools. Why wouldn’t people believe that just anybody could conduct a survey?

Perhaps as a consequence of do-it-yourself polling, there is no end to truly bad, amateur polls. But, there are also well-prepared polls meant to mislead, some overtly and some under the guise of unbiased research. Some people have accordingly come to believe that information derived from all polls is biased, misleading or just plain useless. Familiarity breeds contempt.

Like any other complex practice, such as medicine, statistical polling isn’t an exact science and can unexpectedly and unintentionally fail. But for the most part, it is legitimate and reliable even if the public doesn’t understand it. However, ignorance breeds contempt too.

Ignorance leads to fear and fear leads to hate.

People are comfortable with polls that confirm their preconceived notions (confirmation bias), yet they lambaste polls that don’t confirm their beliefs because they don’t understand the science and mathematics behind statistical surveying. This is experienced equally by both sides of the political spectrum. Nonetheless, surveys are relied on extensively throughout government and business to support their work. And, of course, politicians live and die by poll results.

Poll haters usually focus on six kinds of criticisms:

  • The results were decided before the poll was conducted.
  • The poll only included 1,000 people out of 300,000,000 Americans.
  • The results should only apply to the people questioned.
  • The poll didn’t include me.
  • The poll only interviewed subjects who had landlines.
  • The poll didn’t ask fair questions.

I didn’t make these criticisms up. I compiled them from Twitter threads that involved political polls. I explain why these criticisms might be correct or not at the end of the article.

If you want to assess whether a political poll really is legitimate, there are four things you should look at. It helps if you know some key survey concepts, including population, frame, sample and sample size, interview methods, question types, scales, and demographics. If you do, skip to the last section of this article for the hints. Otherwise, read on.

Critical Elements of Surveys

The terms poll and survey are often used synonymously. Traditionally, polls were simple, one-question, interviews often conducted in person. Surveys were more elaborate, longer, data gathering efforts conducted with as much statistical rigor as possible. Political “who-do-you-plan-to-vote-for” polls have evolved into expansive instruments to explore preferences for policies and politicians. You can blame the evolution of computers, the internet, and personal communications for that.

Polling for presidential elections increased dramatically after the 1980s as the numbers of personal computers and polling organizations grew. https://statswithcats.net/2010/06/13/reality-statistics/

Polls on social media are for entertainment. Serious surveys of political preferences are quite different. There is a lot that goes into creating a scientifically valid survey. Scores of textbooks have been written on the topic. Furthermore, the state-of-the-art is constantly improving as technology advances and more research on the psychology of survey response is conducted.

Here are a few critical considerations in creating surveys.

Survey Source

As you might expect, the source of a political survey is important. Before 1988, there were on average only one or two presidential approval polls conducted per month. Within a decade, that number had increased to more than a dozen. By 2021, there were 494 pollsters who conducted 10,776 political surveys. Fivethirtyeight.com graded 93% of the pollsters with a B or better; 2% failed. Of the pollsters, two-fifths lean Republican and three-fifths lean Democratic. Notable Republican-leaning pollsters include: Rasmussen; Zogby; Mason-Dixon; and Harris. Notable Democratic-leaning pollsters include: Public Policy Polling; YouGov; University of New Hampshire; and Monmouth University.

Topics

The topics of a political survey are simply what you want to know about certain policies, events, or individuals. Good surveys define what they mean by the topics they are investigating and do not push biases and misinformation. They account for the relevance, changeability, and controversiality of the topic in the ways they organize the survey and ask the questions.

Population

The population for a survey is the group to which you want to extrapolate your findings. For political surveys in the U.S., the population of a survey is simply the population of the country, or at least the voters. The Census Bureau provides all the information on the demographics (e.g., gender, age, race/ethnicity, education, income, party identification) of the country that surveys need.

Frame

The frame is a list of subjects in the population that might be surveyed. Frames are more difficult to assemble than population characteristics because the information sources are more diverse and not centralized. Sources might include telephone directories, voters lists, tax records, membership lists of public organizations, and so on.

Sample

The survey sample is the individuals to be interviewed. More individuals are needed than the number of samples desired for the survey because some individuals will decline to participate. The sample is usually selected from the frame by some type of probability sampling. Usually, stratified-random sampling is used to ensure all the relevant population demographics are adequately represented. This establishes survey accuracy.
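For readers who like to see the mechanics, here is a minimal Python sketch of proportional stratified-random sampling. Everything in it is hypothetical: the frame, the age groups and their sizes, and the function name are made up for illustration, and real pollsters stratify on several demographics at once and usually weight the results afterward.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, n, seed=0):
    """Draw about n subjects, giving each stratum a share of the sample
    proportional to its share of the frame. Rounding can make the total
    differ slightly from n when the shares are not whole numbers."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in frame:
        strata[stratum_of(subject)].append(subject)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(frame))
        # Simple random sampling without replacement within the stratum.
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical frame of 10,000 subjects in three made-up age groups.
groups = {"18-34": 3000, "35-64": 5000, "65+": 2000}
frame = [(group, i) for group, size in groups.items() for i in range(size)]

sample = stratified_sample(frame, stratum_of=lambda s: s[0], n=1000)
counts = Counter(s[0] for s in sample)
print(counts)
```

In this made-up case the sample mirrors the frame exactly: 300, 500, and 200 subjects from the three groups. A simple random sample of 1,000 would only match those proportions approximately.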

Getting the population, frame, and sample right is the most fundamental aspect of a survey that can go wrong. Professional statisticians agonize over it. When something goes wrong, it’s the first place they look because everything else is pretty straightforward. Sometimes identifying problems in surveys is near impossible.

Sample Size

Sample size is simply the number of individuals who respond to the survey. Sample size (and a few other survey characteristics) determines the precision of the results. One of the first things critics of political polls cite is how few subjects are interviewed. A challenge in survey design is to select a sample size large enough to provide adequate precision yet not so large that it drives up costs.

Four reasonable assumptions reduce the complex equation for the margin-of-error to 1/√n.  https://statswithcats.net/2011/05/08/polls-apart/

Most political polls use 500 to 1,500 individuals to achieve margins-of-error between 4.5% and 2.6%. (If you’ve taken Stats 101, the margin-of-error is the half-width of the 95% confidence interval around an average survey response.) Using more than 1,500 individuals is expensive and doesn’t increase precision much (as shown in the chart).
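If you want to check the arithmetic yourself, here is a minimal Python sketch that puts the simplified 1/√n rule next to the fuller 95% formula for a proportion, z·√(p(1−p)/n). The function names are my own, and the sketch assumes a simple random sample, a population much larger than the sample, and a worst-case split of 50%.

```python
import math

def moe_simple(n):
    """Simplified margin of error, 1/sqrt(n), as a proportion."""
    return 1.0 / math.sqrt(n)

def moe_95(n, p=0.5, z=1.96):
    """Margin of error for a proportion p at about 95% confidence."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Compare the two versions across typical poll sizes.
for n in (500, 1000, 1500, 3000):
    print(f"n={n:>5}  simple={moe_simple(n):.1%}  full={moe_95(n):.1%}")
```

The 1/√n rule gives about 4.5% at 500 respondents and 2.6% at 1,500. Doubling the sample again to 3,000 trims the margin by less than one percentage point, which is why few pollsters pay for it.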

Interview Methods

There are many methods used to provide questions to individuals in a survey, including: in-person, telephone, recorded message, mail and email, and websites. Each has its own advantages and limitations. Some surveys use more than one method in order to test the influence of the interview.

Questions

The questions that are included in a survey are often a focus of critics. Constructing survey questions is an arduous process of eliciting information on a topic without influencing the resulting answer. It sounds simple, but to a professional survey designer, it seldom is. Questions shouldn’t be vague, leading, or compound, nor should they employ double negatives. The choice of individual words is also important: words should not introduce bias, be offensive or emotion-laden, or be misleading, unfamiliar, or ambiguous. Jargon, slang, and abbreviations/acronyms are particularly taboo. Sometimes surveys have to be presented in languages other than English, depending on the frame. Questions also have to be designed to facilitate the analysis and presentation of results.

Types of Questions

Asking a question in plain conversation doesn’t require the rigor that is needed for survey questions. In a conversation, you can rephrase and follow up when you don’t get an answer that can be used in an analysis. You don’t have that flexibility in a survey; you only get one chance. You have to construct each question so that respondents are forced to categorize their responses into patterns that can be analyzed. There are quite a few ways to do this.

Open-Ended Questions

The most flexible type of question is the open-ended question, which has no predetermined categories of responses. This type of question allows respondents to provide any information they want, even if the researcher had never considered such a response. As a consequence, open-ended questions are notably difficult to analyze. They are almost never used in legitimate political polls.

Closed-Ended Questions

Closed-ended questions all have a finite number of choices from which the respondent has to select. There are many types of closed-ended questions, including the following eight.

1. Dichotomous Questions — either/or questions, usually presented with the choices yes or no.

Dichotomous questions are easy for survey participants to understand. Responses are easy to analyze. Results are easy to present. The drawback of dichotomous questions is that they don’t provide any nuances to participant answers.

2. Single-Choice Questions — a vertical or horizontal list of unrelated responses, sometimes presented as a dropdown menu. The responses are often presented in sequences that are randomized between respondents.

Single-choice questions are easy for survey participants to understand. Responses are easy to analyze. Results are easy to present. The drawback of single-choice questions is that they can’t always provide all the choices that might be relevant. In the sample question, for example, there are a lot more issues that a participant might think are more important than the seven listed.

3. Multiple-choice Questions — like a single-choice question except that the respondent can select more than one of the responses. This presents a challenge for data presentation because percentages of responses won’t sum to 100%.

Multiple-choice questions are somewhat more difficult for survey participants to understand because participants can check more than one response box. Survey software helps to validate the responses. Those responses are more difficult to analyze because it’s almost like having a dichotomous question for each response checkbox. Results are more difficult to present clearly because percentages can be misleading. The advantage of multiple-choice questions is that they provide some comparative information about the choices in an efficient way.

4. Ranking Questions — questions in which respondents are supposed to rank-order a list of unrelated items.

Ranking questions are relatively easy for survey participants to understand but rank-ordering takes more thought than just picking a single response. Responses are much more difficult to analyze and present. The advantage of ranking questions is that they provide more comparative information about the choices than multiple-choice questions.

5. Rating Questions — questions in which respondents are supposed to assign a relative score to each of a set of unrelated items. The score is on some type of continuous scale. Responses might be written in or indicated on a slider.

Rating questions are relatively easy for survey participants to understand, although anything requiring survey participants to work with numbers presents a risk of failure. Responses are easy to analyze and results are easy to present, though. The drawback of rating questions is that they take participants longer to respond to than Likert-scale questions.

6. Likert-scale Questions — like a single-choice question in which the choices represent an ordered spectrum. An odd number of choices allows respondents to pick a middle-of-the-road position, which some survey designers avoid because it masks true preferences.

Likert-scale questions are easy for survey participants to understand. Responses are easy to analyze and present. The drawback of Likert-scale questions is that they are less precise than rating questions.

7. Semantic-differential Questions — like a Likert or rating scale question in which the choices represent a spectrum of preferences, attitudes, or other characteristics, between two extremes (e.g., agree-disagree, conservative-progressive, important-unimportant). It is thought to be easier for respondents to understand.

Semantic-differential questions are easy for survey participants to understand. Responses are easy to analyze once the responses are coded. Results are easy to present. The drawback of semantic-differential questions is that they are not supported by some survey software.

8. Matrix Questions — questions that allow two aspects of a topic to be assessed at the same time.

Matrix questions are very efficient but also difficult for some survey participants to understand. Responses are easy to analyze and present because they are like multiple Likert-scale questions.
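
To see why multiple-choice results (type 3 above) are awkward to present, here is a small Python sketch. The responses are made up for illustration and the function name is hypothetical; the only idea it demonstrates is that each choice is tallied as its own yes/no question.

```python
from collections import Counter

# Hypothetical responses: each respondent may check any number of issues.
responses = [
    {"economy", "health care"},
    {"economy"},
    {"immigration", "economy", "education"},
    {"health care", "education"},
]

def percent_selecting(responses):
    """Percent of respondents who checked each choice (one yes/no tally per choice)."""
    counts = Counter(choice for picked in responses for choice in picked)
    return {choice: 100 * count / len(responses) for choice, count in counts.items()}

percents = percent_selecting(responses)
total = sum(percents.values())
for choice, pct in sorted(percents.items()):
    print(f"{choice:<12} {pct:.0f}%")
print(f"sum of percentages: {total:.0f}%")
```

In this toy data the percentages add up to 200% because the four respondents checked eight boxes among them. Each percentage is only meaningful on its own, as the share of respondents who selected that choice.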

Issues with Questions

One common issue with questions in political surveys is constrained lists, in which only a few of many possible choices are provided. Then the results are presented as if those were the only choices respondents cared about. This happens with multiple-choice, ranking, and matrix questions. For example, a survey might ask “what’s the most important issue facing the country?” with the only choices being “abortion,” “immigration,” “marriage,” and “election fraud,” and then report that Americans believe abortion is a major national issue. Results from constrained lists are not soundly acquired, legitimate survey information.

There are many other issues that question creators have to consider.

  • It is preferable to construct questions similarly to facilitate respondent understanding.
  • The types and complexities of the questions and the number of choices will influence the type of interview and the length of the survey.
  • Long surveys suffer from participant drop-out. This may cause questions to have different precisions (because of different sample sizes) and even different demographic profiles.
  • When questions are not answered by respondents, the missing data must be considered in the analysis. Requiring answers is not a good solution because it may cause some respondents to leave the survey, worsening the drop-out rate.
  • If the order of the questions or the order of the choices for each question may be influential, they should be randomized.
  • Some questions may need an “other” option, which is difficult to analyze.
  • Demographic questions must be included in the survey so that comparison to the population is possible.
  • Interviewee anonymity must be preserved while still including demographic information.
  • Focus groups, pilot studies, and simultaneous use of alternative survey forms are sometimes used for evaluating survey effectiveness.
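
Randomizing the order of choices, as suggested in the list above, might look something like the following Python sketch. Seeding the random number generator with the respondent's ID, so that each respondent always sees the same order, is my own design assumption. The function name, the seed label, and the choices are hypothetical.

```python
import random

def randomized_choices(choices, respondent_id, seed="survey-1"):
    """Return the choices in an order that is random but repeatable per respondent."""
    rng = random.Random(f"{seed}:{respondent_id}")  # string seeds are allowed
    shuffled = list(choices)  # copy so the master list is left untouched
    rng.shuffle(shuffled)
    return shuffled

choices = ["economy", "education", "health care", "immigration"]
for respondent_id in range(3):
    print(respondent_id, randomized_choices(choices, respondent_id))
```

Because the order depends only on the seed label and the respondent ID, the order each respondent saw can be reconstructed later when checking whether position influenced the answers.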

Creating survey questions is not as simple as critics think it is.

How to Tell if a Political Poll is Valid

People criticize political polls all the time. Some criticisms are reasonable, grounded in flawed methods; others merely reflect poll results that differ from what the critic believes. Critics fall on all sides of the political spectrum.

Six Criticisms of Poll Haters

Most people probably wouldn’t criticize, or for that matter, even care about political polls if they didn’t have preconceived notions about what the results should be. If they do see a poll that doesn’t agree with their preconceived notions, they are quick to find fault. Some of their criticisms could have merit, but usually not. Here are six examples.

Too Few Participants

Critics of political polls can’t seem to understand that a sample of only a few hundred individuals can be extrapolated to the whole population of the U.S., over 300 million, if the survey frame and sample are appropriate. What the number of survey participants does influence is the survey precision. So, this criticism would be true if the sample size were small, say 100 or fewer. That would make the margin of error ±10% or more, which would be fairly large for comparing preferences for two candidates. However, most legitimate political polls include at least 500 participants, making the margin of error about ±4.5% or less. Large political polls might include 1,500 participants, resulting in a ±2.6% margin-of-error. This criticism is almost always unjustified.
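
One way to convince yourself that the size of the population hardly matters is to apply the finite population correction, a standard textbook adjustment for sampling without replacement. The Python sketch below is only an illustration: it assumes a simple random sample, a 95% confidence level, and an even 50/50 split, and the function name is hypothetical.

```python
import math

def margin_with_fpc(n, population, p=0.5, z=1.96):
    """Margin of error for a sample of n, with the finite population correction."""
    base = z * math.sqrt(p * (1 - p) / n)
    # The correction shrinks the margin when the sample is a sizable
    # fraction of the population; otherwise it is essentially 1.
    fpc = math.sqrt((population - n) / (population - 1))
    return base * fpc

# Same sample size, very different population sizes.
for population in (10_000, 1_000_000, 300_000_000):
    print(f"N={population:>11,}  margin={margin_with_fpc(500, population):.2%}")
```

For 500 respondents, the margin for a population of a million is practically the same as for a population of 300 million. The correction only makes a noticeable difference when the sample is a meaningful fraction of the population, as it starts to be for the population of 10,000.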

They Didn’t Ask Me

If the survey frame and sample are appropriate, the demographic of the critic is already represented. This criticism is always unjustified.

The first political poll dates back to the Presidential election of 1824. Probability and statistical inference for other applications is hundreds of years older than that. The science behind extrapolating from a sample representative of a population to the population itself is well established.

This criticism is about the frustration a critic has when the survey results don’t match their expectations. It is a form of confirmation bias. The results just mean that the opinion of the critic doesn’t match the population.

Only Landline Users Were Interviewed

This criticism has to do with how technology affects the selection of a frame and a sample. The issue dates back to the 1930s and 1940s when telephone numbers were used to create frames. The problem was that only wealthy households owned telephones, so the frame wasn’t representative of the population. Truman defeated Dewey regardless of what the polls predicted.

The issue repeated in the 1990s and 2000s when cell phones began replacing landlines. For that period, neither mode of telephony could be relied on to be representative of the U.S. population. By the 2010s, cell phone users were sufficiently representative of the population to be used as a frame.

Today, using telephone lists exclusively to create frames is a known issue. Most big political surveys use several different sources to create frames that are representative of the population.

They Asked the Wrong Questions

This criticism probably isn’t about gathering information about the wrong topics. More likely, critics think that the questions were biased or misleading in some way. The criticism is probably made without the critic actually reading the questions because that information is seldom available in news stories. It has to be uncovered in the original survey analysis report.

This criticism may have merit if the poll didn’t clearly define terms, or used slang or jargon. Professional statisticians usually ask simple and fair survey questions but may on occasion use vocabulary that is unfamiliar to participants.

The Results Were Predetermined

This is a bold criticism that isn’t all that difficult to invalidate. First, no professional pollster is likely to commit fraud, regardless of the reward, if only because their business and career would be in jeopardy. Look at the source. If it is any nationally known pollster who has been around for a while, the criticism is unlikely to be valid.

If the source is an unknown pollster, look at the report on the survey methods. They might suggest poor methods but that wouldn’t necessarily guarantee a particular set of results. If there was an obvious bias in the methods, like surveying attendees at a gun show, it should be apparent.

If there is no background report available on the survey methods, this criticism would merit attention. In particular, if the survey results were prepared by a non-professional for a specific political candidate or party, skepticism would be appropriate.

The Results Are Wrong

There are many things that can go wrong with a survey. Criticisms that a political poll is wrong are usually suppositions based on confirmation bias. Compare the poll to other polls researching the same topics during the same timeframe. If the results are close, within the margins-of-error, the polls are probably legitimate.
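
Comparing two polls "within the margins-of-error" can be sketched as follows. This is one reasonable reading of that check, not a definitive test: it assumes the polls are independent, uses the 1/√n shortcut for each poll's margin, and combines the two margins as the square root of the sum of their squares. The function names and poll numbers are hypothetical.

```python
import math

def quick_margin(n):
    """The 1/sqrt(n) shortcut for a poll's margin of error."""
    return 1 / math.sqrt(n)

def polls_agree(share_a, n_a, share_b, n_b):
    """True if two poll results differ by no more than their combined margin."""
    # Margins of independent polls combine as the root of the sum of squares.
    combined = math.sqrt(quick_margin(n_a) ** 2 + quick_margin(n_b) ** 2)
    return abs(share_a - share_b) <= combined

# Hypothetical polls on the same question during the same week.
print(polls_agree(0.48, 1000, 0.51, 800))  # 3-point gap
print(polls_agree(0.40, 1000, 0.51, 800))  # 11-point gap
```

In this example the combined margin is a little under 5 points, so the 3-point gap is consistent with sampling error alone and the 11-point gap is not. A gap larger than the combined margin doesn’t say which poll is off, only that something other than sampling error is probably involved.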

Criticisms based on suspect survey methods are difficult to prove. The only way to determine that a political poll was truly wrong is to wait until after the election and conduct a post-mortem.

Even when a professional pollster designs a survey, unexpected results can occur. This was the case in the 1948 Presidential election. More recently, polls conducted before the 2016 Presidential election did not correctly predict the winner. New methods were put in place but the polls conducted before the 2020 election also had discrepancies. What polling organizations haven’t considered yet is that the polls were correct but voter suppression measures affected the results. In other words, the polls correctly predicted the intent of the electorate but voters could not express their preferences on election day because of administrative barriers.

Four Hints for Assessing Polls

Don’t get fooled into believing results you agree with or disbelieving results you don’t; that’s confirmation bias. Don’t get distracted by the number of respondents. You have to dig deeper to assess the legitimacy of a poll.

You won’t be able to tell from a news story if a poll is likely to be valid. You have to find a link to the documentation of the original poll. If there is none, search the internet for the polling organization, topic, and date. If there is no link to the poll, or if the link is dead or leads to a paywall, the legitimacy of the poll is suspect.

When you find the poll documentation, look for four things:

  1. Who conducted the poll? Are they independent, unbiased, and reputable? Try searching the internet and visiting https://projects.fivethirtyeight.com/pollster-ratings/. A poll conducted for a candidate or a political party is not likely to be totally legitimate.
  2. What was the progression from population to frame to sample? This is very difficult for non-statisticians to assess; it’s even difficult for statisticians to work out. It’s not just a matter of polling whoever answers a phone or visits a website. Participants have to be weighted for population demographics and screened for potential biases. In short, if the process is complex and described in detail, it is more likely to have been valid than not.
  3. Were the questions simple and unbiased? Was the sentence structure of the questions understandable? Were any confusing or emotion-laden words used? Did the questions directly address the topics of the survey? Were the questions presented in closed-ended formats so that the results were unambiguous? You have to actually see the questions documented in the survey analysis report to tell. Also, check to see how the interviews were conducted, whether autonomously or in person. It probably won’t matter. Sophisticated surveys might use more than one interview method and compare the results.
  4. Does it explore demographics? Any legitimate political survey will explore the background of the respondents, things like sex, age, race, party, income, and education. Researchers use this information to analyze patterns in subgroups of the sample. If the poll doesn’t ask about that information, it’s probably not legitimate.
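
Weighting participants for population demographics (item 2 above) is one reason the demographic questions in item 4 matter. The Python sketch below shows the simplest version of the idea, weighting on a single characteristic so that each group counts in proportion to its share of the population. Real polls typically weight on several characteristics at once; the groups, counts, and shares here are invented, and the function names are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight for each group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

def weighted_support(support_by_group, sample_counts, weights):
    """Overall support after reweighting each group's respondents."""
    numerator = sum(
        support_by_group[g] * sample_counts[g] * weights[g] for g in sample_counts
    )
    denominator = sum(sample_counts[g] * weights[g] for g in sample_counts)
    return numerator / denominator

# Hypothetical poll that over-represents college graduates.
sample_counts = {"college": 700, "no college": 300}
population_shares = {"college": 0.4, "no college": 0.6}
support_by_group = {"college": 0.60, "no college": 0.40}

weights = poststratification_weights(sample_counts, population_shares)
unweighted = sum(support_by_group[g] * sample_counts[g] for g in sample_counts) / 1000
weighted = weighted_support(support_by_group, sample_counts, weights)
print(f"unweighted: {unweighted:.0%}  weighted: {weighted:.0%}")
```

In this made-up example the raw result is 54% support, but after reweighting to the population’s mix of groups it drops to 48%. That kind of shift is impossible to evaluate, or even compute, if the poll never asked the demographic questions.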

There will always be something that might adversely affect the validity of a poll. Even professional statisticians make mistakes or overlook minor details. But these glitches will probably be impossible for most readers to spot. If you as an average consumer see something in the population, frame, sample, or questions that is dubious, you may have cause to critique. Otherwise, don’t expose your ignorance by complaining about not having enough participants.

Learn how to think critically and make it your first reaction to any questionable poll you may encounter.
