Reality Statistics

During the 1970s, statistical analyses were done on mainframe computers that were as big as elephants. They were sequestered in their own climate-controlled quarters, waited on command and reboot by a priesthood of system operators.

Conducting a 1970s era statistical analysis was an involved process. To analyze a dataset, statisticians first had to write their own programs. Some used standalone programming languages, like FORTRAN or COBOL, or the language of one of the few commercially available statistical software packages, like SAS or SPSS. There were no GUIs (Graphical User Interfaces) or code writing applications. The statistical packages were easier to use than the programming languages but they were complicated and expensive mainframe programs. Only the government, universities, and major corporations could afford their annual licenses, the mainframes to run them, and the priesthoods to care for them.

Once you had coded the data analysis program, you had to wait in line for an available keypunch machine so you could transfer your program code and all your data onto 3¼ by 7⅜ inch computer punch cards. After that, you waited so you could feed the cards through the mechanical card reader. Finally, you waited for the mainframe to run your program and the printer to output your results. When you picked up your output from the priesthood who tended the sacred processing units, sometimes all you got was a page of error codes. You had to decide what to do next and start the process all over again. Life wasn’t slower back then; it just required more waiting.

A computer and a cat are somewhat alike - they both purr, and like to be stroked, and spend a lot of the day motionless. They also have secrets they don't necessarily share.  --John Updike

In the 1970s, personal computers, or what would eventually evolve into what we now know as PCs, were like mammals during the Jurassic period, hiding in protected niches while the mainframe dinosaurs ruled. Before 1974, most PCs were built by hobbyists from kits. The MITS Altair is generally acknowledged as the first personal computer, although there are more than a few other claimants.

As with biological species, it’s sometimes difficult to say when a new technological species originated. What differentiates a calculator from a microcomputer from a personal computer? Digital electronics were developed in the 1930s and 1940s. By the 1950s, plans and kits for microcomputers, some analog, some digital, were available from several companies. The MITS Altair was probably the first complete non-kit PC to be produced in quantity. In 1975, MITS sold about 6,000 Altairs. This model of the Altair used a new version of BASIC from a small, unknown, startup company, Micro-Soft.

By 1980, PC sales had reached almost a million per year. Then in 1981, IBM introduced their 8088 PC. Over the next two decades, the number of IBM-compatible PCs sold annually increased to almost 200 million. From the early 1990s, sales of PCs have been fueled by Pentium-speed, GUIs, the Internet, and affordable, user-friendly software, including spreadsheets with statistical functions. MITS and the Altair are long gone, now seen only in museums, but Microsoft has survived, evolved, and dominated the top of the code chain.

Statistical analysis has changed a lot in a generation. Punch cards and their supporting machinery are extinct. Mainframes are an endangered species, having been exiled to specialty niches by PCs that fit in backpacks. Inexpensive statistical packages that run on PCs, on the other hand, have multiplied like rabbits. All of these packages have GUIs. Even the venerable ancients, SAS and SPSS, have evolved point-and-click faces (although you can still write code if you want). Now you can run even the most complex statistical analysis in less time than it takes to drink a cup of coffee.

The maturation of the Internet also created many new opportunities. You no longer need to have access to a huge library of books to do a statistical analysis. There are thousands of websites with reference materials for statistics. Instead of purchasing one expensive reference, you can now consult a dozen different discussions on the same topic, free. If you find a book you want to keep as a handy reference, you can buy electronic access to it. No dead trees need clutter your office. If you can’t find a reference book with what you want, there are discussion groups where you can post your questions. Perhaps most importantly, though, data that would have been difficult or impossible to obtain a decade ago are now just a few mouse clicks away. It’s almost as if some great intelligence designed things to happen this way. Uh huh.

So, with computer sales skyrocketing and the Internet becoming as addictive as crack, it’s not surprising that the use of statistics might also be on the increase. Consider the trends shown in this figure. The red squares represent the number of computers sold from 1981 to 2005. The blue diamonds, which follow a trend similar to computer sales, represent revenues for SPSS, Inc., the makers of the software formerly known as Statistical Package for the Social Sciences. So, sales of at least one of the major pieces of statistical software have also grown substantially over the past decade. They probably all have.

Personal Computers, SPSS Revenues, and Presidential Polls.

With the availability of more computers and more statistical software, you might expect that more statistical analyses are being done. That’s a tough trend to quantify, but consider the increases in the numbers of political polls and pollsters since the 1990s.

Before 1988, there were on average only one or two presidential approval polls conducted per month. Within a decade, that number had increased to more than a dozen. In the figure, the green circles represent the number of polls conducted on presidential approval. This trend is quite similar to the trends for computer sales and SPSS revenues. Correlation doesn’t imply causation but sometimes it sure makes a lot of sense.
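Why are similar trends such weak evidence on their own? Here is a minimal Python sketch of the idea. Both series are made up for illustration (they are not the computer sales, SPSS revenue, or poll counts from the figure), and the names pearson_r and year_over_year are hypothetical helpers. Any two series that climb steadily over the same years will correlate strongly, whether or not one has anything to do with the other. Comparing year-over-year changes instead of raw levels is one common way to check whether anything beyond the shared upward drift is going on.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def year_over_year(values):
    """Change from each year to the next, which strips out a steady trend."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# ASSUMPTION: both series are invented for illustration only.
# Each is a steady upward trend plus its own unrelated wiggle.
steps = range(1, 26)  # 25 hypothetical years
series_a = [10 + 3 * t + 4 * math.sin(t) for t in steps]
series_b = [5 + 2 * t + 3 * math.cos(1.7 * t) for t in steps]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(year_over_year(series_a), year_over_year(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

With these invented numbers, the levels correlate almost perfectly while the changes barely correlate at all. That doesn't prove the trends in the figure are unrelated. It just shows why a shared upward slope shouldn't be the whole argument.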

Perhaps even more revealing is the increase in the number of pollsters. Before 1990, the Gallup Organization was pretty much the only organization conducting presidential approval polls. Now, there are several dozen. These pollsters don’t just ask about presidential approval, either. There is a plethora of polls for every issue of real importance and most of the issues of contrived importance. Many of these polls are repeated to look for changes in opinions over time, between locations, and for different demographics. And that’s just political polls. There has been an even faster increase in polling for marketing, product development, and other business applications. Even without including non-professional polls conducted on the Internet, the growth of polling has been exponential. So, there should be no doubt that there are many more statistical analyses being done today than even a decade ago.

Times change. There are no more elevator operators because untrained riders can just press a button and the doors close automatically. Gas station attendants, store cashiers, and bank tellers are being replaced by self-service mechanisms. Entertainers—actors, dancers, singers, and writers—compete for work with wannabes on reality TV shows like American Idol. Statistics too has changed. Data analysis is no longer the exclusive domain of professionals. Bosses who can’t program the clock on their microwave think nothing of expecting their subordinates to do all kinds of data analyses. So if there can be reality TV, why not reality statistics too? Are you ready for the challenge?

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.

About statswithcats

Charlie Kufs has been crunching numbers for over thirty years. He retired in 2019 and is currently working on Stats with Kittens, the prequel to Stats with Cats.

