Try This At Home

We are all awash in statistics. Every day, we see the probability of precipitation, the results of opinion polls, changes in the stock market, your grades in school, or the batting average of the baseball team you follow. It’s surprising, then, that many people believe that data analysis is something you do only in school or at work. The evolution of computer hardware and software that has fed the growth of statistics didn’t stop at the door to your office or school. Why should your use of statistics stop there?

No skill improves without practice. You can practice your data analysis skills at home without making it feel like homework. Start with something you love to do, like your favorite hobby or interest. Design a study to answer some question that is interesting to you. Collect the data and then do your analysis and see what happens. Here are eight ideas for how to do that.

Personal Behaviors

Ever wonder where the time goes? Time is a major component of many data analyses, so what better place to start than an analysis of your own time? Keep a timesheet of what you do each day for at least a month. For example, categorize how you spend your day into work/school, commuting, chores, errands, sleep, and personal time. Before you start, write down how you think you allot your time to each category. Then calculate the percentages from your data. How close are your predictions to the actual percentages? Do the percentages change much from day to day? Do they vary by day of the week?
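If you'd rather let the computer do the arithmetic, here is a minimal Python sketch of the percentage calculation. Everything in it is an assumption for illustration: the rows in `timesheet`, the category names, and the numbers in `guesses` are invented, and a real log would cover at least a month.

```python
from collections import defaultdict

# Hypothetical timesheet rows: (day, category, hours). All values are invented.
timesheet = [
    ("Mon", "work", 8.0), ("Mon", "commuting", 1.0), ("Mon", "sleep", 7.0),
    ("Mon", "chores", 2.0), ("Mon", "personal", 6.0),
    ("Sat", "sleep", 9.0), ("Sat", "chores", 3.0), ("Sat", "errands", 2.0),
    ("Sat", "personal", 10.0),
]

def percent_by_category(rows):
    """Return each category's share of all recorded hours, as a percentage."""
    totals = defaultdict(float)
    for _day, category, hours in rows:
        totals[category] += hours
    grand_total = sum(totals.values())
    return {cat: 100.0 * hours / grand_total for cat, hours in totals.items()}

# Hypothetical guesses, written down before any data were collected.
guesses = {"work": 30.0, "sleep": 30.0, "personal": 20.0}

actual = percent_by_category(timesheet)
for category, guess in guesses.items():
    print(f"{category}: guessed {guess:.1f}%, actual {actual[category]:.1f}%")
```

To look at day-of-the-week differences, you could call `percent_by_category` on just the rows for one day at a time.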

There may also be some specific activities you might want to collect data on, like how much you smoke, drink, do drugs, look at porn, gamble, curse, and watch reality TV. Keep this data in a hidden directory on your computer.

Don’t make me hungry. You wouldn’t like me when I’m hungry.

Consumption

Have you ever been on a diet and kept a food diary? You can expand this concept to create a dataset. Convert the types and amounts of foods you eat in a day into estimates of calories. Get a pedometer to estimate your exercise. Record your weight. Then see if you can find any correlations between your weight, the foods and calories you eat, your exercise, the season, and anything else you record. You might find the information quite valuable.
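One way to look for those correlations is the Pearson correlation coefficient, sketched below in plain Python. The function name `pearson_r` and the week of diary numbers are made up for illustration, and Pearson's r only picks up straight-line relationships, so treat it as a first look rather than the last word.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list never varies")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical one-week diary: calories eaten and pedometer steps per day.
daily_calories = [2100, 2500, 1900, 2300, 2700, 2000, 2400]
daily_steps = [9000, 4000, 11000, 6000, 3000, 10000, 5000]

r = pearson_r(daily_calories, daily_steps)
print(f"calories vs. steps: r = {r:.2f}")
```

A value near +1 or -1 means the two columns move together in a nearly straight line; a value near 0 means they don't, at least not linearly.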

You may already be keeping track of your car’s mileage and fuel costs. It’s a good way to see the effects of driving styles, seasons, maintenance, and other factors on your miles per gallon. If you keep your household financial data on software, like Quicken, you can do many analyses and graphs of your spending patterns. For example, do you spend more on lattes than laundry?

If you have a cell phone, put your usage records in a spreadsheet. Figure out the minimum, maximum, and average amount of time you spend on the phone in a day. Who do you talk to the most often and for the longest duration? What is your most connected time of day and day of the week? Save these records so your family can sue Nokia in thirty years after you die from brain cancer.
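If a spreadsheet isn't your thing, the same summary takes a few lines of Python. This is only a sketch: the `calls` list, the contact names, and the dates are invented, and "longest duration" is taken here to mean total minutes per contact, which is one of several reasonable readings.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical call records: (date, contact, minutes). All values are invented.
calls = [
    ("2010-11-01", "Mom", 12.0),
    ("2010-11-01", "Pat", 3.5),
    ("2010-11-02", "Pat", 20.0),
    ("2010-11-03", "Mom", 6.0),
    ("2010-11-03", "Pat", 1.5),
    ("2010-11-03", "Vet", 4.0),
]

minutes_per_day = defaultdict(float)
minutes_per_contact = defaultdict(float)
calls_per_contact = Counter()
for date, contact, minutes in calls:
    minutes_per_day[date] += minutes
    minutes_per_contact[contact] += minutes
    calls_per_contact[contact] += 1

# Days with no calls never appear in the records, so they are not counted here.
daily = list(minutes_per_day.values())
print(f"min {min(daily):.1f}, max {max(daily):.1f}, mean {mean(daily):.1f} minutes per day")

most_often = calls_per_contact.most_common(1)[0][0]
longest = max(minutes_per_contact, key=minutes_per_contact.get)
print(f"most calls: {most_often}; most total minutes: {longest}")
```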

Other good sources of data you can analyze are your utility bills. Some utility companies will report your past year of electricity, gas, oil, and water consumption, as well as some supporting information such as average temperature. In fact, they may have many years of your energy usage data that they can retrieve for you. You can use the data to test the effects of seasons, vacations, holidays, energy conservation measures, and more significant lifestyle changes, like the kids finally moving out.

Screen Time

Do you relax by watching TV, surfing the Internet, playing video games, or all three? Keep a log of how much time you spend in front of a view screen. You might record date, day of the week, the weather, hours watching TV, hours surfing the Internet, hours playing video games, hours sleeping, and so on, every day for a month. What is the average proportion of your day that you spend looking at a view screen? Does it vary by day of the week or by weather? At the end of a month, revise your data collection to look at other ways you spend your time. Does the act of collecting the data influence how you spend your free time?

Hobbies

Everybody has hobbies and interests that they enjoy, so why not use your favorite pastimes as opportunities to design statistical studies and collect data you can practice analyzing? Here are a few ideas for what you might do.

  • Hunting and Fishing – Record where you hunt or fish, what bait or other aids you use, the time, the weather, and what results you had. Likewise with treasure hunting, record where you search, what detector settings you use, the time, the weather, and what results you had. Are there any notable patterns?
  • Gardening – Keep a diary (or better, a spreadsheet) of how much time you spend in your garden, what you do, and the weather. What proportions of your time do you spend planting, weeding, maintaining, and harvesting? How do the percentages change with the date and the weather? If you plant seeds, do you get similar germination rates for the same plant from different suppliers?
  • Reading – Keep a log of what you read and when you read it. How much of your reading is for enjoyment versus work? What are your reading preferences? Format (books, ebooks, magazines)? Genre (e.g., nonfiction, science fiction, religion, mystery, romance)? Are there differences in how fast you read different genres or formats?
  • Music – Build a database of music, both music you like and music you don’t like. Include variables like genre, length, artist, year released, theme of lyrics, instruments, time, key, and so on. Set up a rating scale for each song as the dependent variable and see if you can find patterns that explain why you like or dislike the music that you do. Extend your findings to artists you haven’t listened to before. You may even find something unexpected, like Prisencolinensinainciusol.

Kids and Other Pets

If you have a youngster in the family, start early recording height (length) and weight. Don’t just make marks on a doorframe; set up a spreadsheet to organize your data. Do this daily for a few months. How much variation in height and weight occurs from day to day? Is the variation natural or attributable to how you measure the variables? Graph the data over time. Are there changes in growth rates? How do the height and weight compare to standards for the age and species? How much can you change the frequency of data collection without losing the resolution you need to see changes?

Medical Conditions

If you have any chronic medical condition, start collecting relevant data using equipment you can find at most drug stores. For example, you might record your weight, heart rate, blood glucose, blood pressure, and temperature. Be sure to note the date and time of each measurement. You can also record qualitative variables like what and when you ate, how you feel, what exercise you did, and so on. Put the data in a graph and show your doctor on your next visit. She or he may be impressed enough to prescribe you some medical marijuana.

Sports

No matter how you like sports—professional, amateur, personal, or fantasy—you’ll always be served a side dish of statistics. Relish the experience by analyzing data in ways no one else has. Google sabermetrics to see what I mean. Figure out what baseball player is paid the most per hit. What basketball player scores the most points per minute played? Is there a relationship between height and the number of catches a receiver makes? You can find data for almost every sport imaginable on the Internet, no vuvuzela needed.
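The pay-per-hit question is just a ratio, as this small Python sketch shows. The players, salaries, and hit totals below are entirely invented placeholders, not real statistics; you would swap in numbers you collect yourself.

```python
# Hypothetical players with invented salaries (dollars) and season hit totals.
players = [
    {"name": "Player A", "salary": 12_000_000, "hits": 150},
    {"name": "Player B", "salary": 4_500_000, "hits": 180},
    {"name": "Player C", "salary": 20_000_000, "hits": 160},
    {"name": "Player D", "salary": 1_000_000, "hits": 0},
]

def dollars_per_hit(player):
    """Salary divided by hits, or None when the player has no hits."""
    if player["hits"] == 0:
        return None  # the ratio is undefined, so leave this player out
    return player["salary"] / player["hits"]

rated = []
for player in players:
    cost = dollars_per_hit(player)
    if cost is not None:
        rated.append((player["name"], cost))

most_expensive = max(rated, key=lambda pair: pair[1])
print(f"{most_expensive[0]} is paid ${most_expensive[1]:,.0f} per hit")
```

Note the choice to drop players with no hits rather than call their cost infinite. That is a judgment call, and a different choice would give a different answer.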

Politics

Don’t get me going on politics. Suffice it to say that you could spend a lifetime and not analyze all the data that is currently available for free from government web sites. If you come up with anything good, write a blog about it. Most political blogs are fanatical fluff made of anti-data. Annihilate them with a real analysis.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.

Reality Statistics

During the 1970s, statistical analyses were done on mainframe computers that were as big as elephants. They were sequestered in their own climate-controlled quarters, waited on command and reboot by a priesthood of system operators.

Conducting a 1970s era statistical analysis was an involved process. To analyze a dataset, statisticians first had to write their own programs. Some used standalone programming languages, like FORTRAN or COBOL, or the language of one of the few commercially available statistical software packages, like SAS or SPSS. There were no GUIs (Graphical User Interfaces) or code writing applications. The statistical packages were easier to use than the programming languages but they were complicated and expensive mainframe programs. Only the government, universities, and major corporations could afford their annual licenses, the mainframes to run them, and the priesthoods to care for them.

Once you had coded the data analysis program, you had to wait in line for an available keypunch machine so you could transfer your program code and all your data onto 3¼ by 7⅜ inch computer punch cards. After that, you waited so you could feed the cards through the mechanical card reader. Finally, you waited for the mainframe to run your program and the printer to output your results. When you picked up your output from the priesthood who tended the sacred processing units, sometimes all you got was a page of error codes. You had to decide what to do next and start the process all over again. Life wasn’t slower back then, it just required more waiting.

A computer and a cat are somewhat alike – they both purr, and like to be stroked, and spend a lot of the day motionless. They also have secrets they don’t necessarily share. –John Updike

In the 1970s, personal computers, or what would eventually evolve into what we now know as PCs, were like mammals during the Jurassic period, hiding in protected niches while the mainframe dinosaurs ruled. Before 1974, most PCs were built by hobbyists from kits. The MITS Altair is generally acknowledged as the first personal computer, although there are more than a few other claimants.

As with biological species, it’s sometimes difficult to say when a new technological species originated. What differentiates a calculator from a microcomputer from a personal computer? Digital electronics were developed in the 1930s and 1940s. By the 1950s, plans and kits for microcomputers, some analog, some digital, were available from several companies. The MITS Altair was probably the first complete non-kit PC to be produced in quantity. In 1975, MITS sold about 6,000 Altairs. This model of the Altair used a new version of BASIC from a small, unknown, startup company, Micro-Soft.

By 1980, PC sales had reached almost a million per year. Then in 1981, IBM introduced their 8088 PC. Over the next two decades, the number of IBM-compatible PCs sold annually increased to almost 200 million. From the early 1990s, sales of PCs have been fueled by Pentium-speed, GUIs, the Internet, and affordable, user-friendly software, including spreadsheets with statistical functions. MITS and the Altair are long gone, now seen only in museums, but Microsoft has survived, evolved, and dominated the top of the code chain.

Statistical analysis has changed a lot in a generation. Punch cards and their supporting machinery are extinct. Mainframes are an endangered species, having been exiled to specialty niches by PCs that fit in backpacks. Inexpensive statistical packages that run on PCs, on the other hand, have multiplied like rabbits. All of these packages have GUIs. Even the venerable ancients, SAS and SPSS, have evolved point-and-click faces (although you can still write code if you want). Now you can run even the most complex statistical analysis in less time than it takes to drink a cup of coffee.

The maturation of the Internet also created many new opportunities. You no longer need to have access to a huge library of books to do a statistical analysis. There are thousands of websites with reference materials for statistics. Instead of purchasing one expensive reference, you can now consult a dozen different discussions on the same topic, free. If you find a book you want to keep as a handy reference, you can buy electronic access to it. No dead trees need clutter your office. If you can’t find a reference book with what you want, there are discussion groups where you can post your questions. Perhaps most importantly, though, data that would have been difficult or impossible to obtain a decade ago are now just a few mouse clicks away. It’s almost as if some great intelligence designed things to happen this way. Uh huh.

So, with computer sales skyrocketing and the Internet becoming as addictive as crack, it’s not surprising that the use of statistics might also be on the increase. Consider the trends shown in this figure. The red squares represent the number of computers sold from 1981 to 2005. The blue diamonds, which follow a trend similar to computer sales, represent revenues for SPSS, Inc., the makers of the software formerly known as Statistical Package for the Social Sciences. So, sales of at least one of the major pieces of statistical software have also grown substantially over the past decade. They probably all have.

Personal Computers, SPSS Revenues, and Presidential Polls.

With the availability of more computers and more statistical software, you might expect that more statistical analyses are being done. That’s a tough trend to quantify, but consider the increases in the numbers of political polls and pollsters since the 1990s.

Before 1988, there were on average only one or two presidential approval polls conducted per month. Within a decade, that number had increased to more than a dozen. In the figure, the green circles represent the number of polls conducted on presidential approval. This trend is quite similar to the trends for computer sales and SPSS revenues. Correlation doesn’t imply causation but sometimes it sure makes a lot of sense.

Perhaps even more revealing is the increase in the number of pollsters. Before 1990, the Gallup Organization was pretty much the only organization conducting presidential approval polls. Now, there are several dozen. These pollsters don’t just ask about Presidential approval, either. There are a plethora of polls for every issue of real importance and most of the issues of contrived importance. Many of these polls are repeated to look for changes in opinions over time, between locations, and for different demographics. And that’s just political polls. There has been an even faster increase in polling for marketing, product development, and other business applications. Even without including non-professional polls conducted on the Internet, the growth of polling has been exponential. So, there should be no doubt that there are many more statistical analyses being done today than even a decade ago.

Times change. There are no more elevator operators because untrained riders can just press a button and the doors close automatically. Gas station attendants, store cashiers, and bank tellers are being replaced by self-service mechanisms. Entertainers—actors, dancers, singers, and writers—compete for work with wannabes on reality TV shows like American Idol. Statistics too has changed. Data analysis is no longer the exclusive domain of professionals. Bosses who can’t program the clock on their microwave think nothing of expecting their subordinates to do all kinds of data analyses. So if there can be reality TV, why not reality statistics too? Are you ready for the challenge?

Why Do I Have To Take Statistics?

When you were in school, you probably asked the question, “why do I have to take statistics?” Your adviser told you: “because it’s required for the degree.” “But why,” you said, “why would I ever need to use statistics?”

Everybody who has completed high school has learned some statistics. There are good reasons for that. Your class grades were averages of scores you received for tests and other efforts. Most of your classes were graded on a curve, requiring the concepts of the Normal distribution, standard deviations, and confidence limits. Your scores on standardized tests, like the SAT, were presented in percentiles. You learned about pie and bar charts, scatter plots, and maybe other ways to display data. You might even have learned about equations for lines and some elementary curves. So by the time you got to the prom, you were exposed to at least enough statistics to read USA Today. In college, you’ll find that most majors require some statistics. Why? Consider the following.

Statistics is an integral part of everyday life in America. Without statistics, there would be no U.S. Census, IRS audits, Nielsen ratings of TV shows, political polls, and consumer preference surveys. Our society couldn’t function without being able to figure out tax brackets, insurance rates, stock prices, and online matchmaking. We couldn’t predict the outcome of elections before the polls close. There would be no standardized tests, no ACT, GRE, TOEFL, MBTI, or CATs (MCAT, LCAT, PCAT, and VCAT). Amazon.com couldn’t tell us what we want to buy. Baseball announcers would have nothing to talk about between pitches. It would be anarchy.

If you’re still not convinced that you need to learn statistics, keep reading.

The use of statistics is common to almost all fields of inquiry—social and natural sciences, sports, business, education, library and information science, and even music and art. Its popularity is attributable at least in part to its applicability to any type of data. Statistical methods can be used for analyzing data based on natural laws, theories, or nothing in particular. If you can measure it, you can analyze it with statistics. If you’re creative enough, you can even analyze things you can’t measure very well.

So why do your advisers want you to take statistics? Here are a few of the reasons.

  • Statistics provide a starting point and a course of action – If you’re in the natural sciences, you’ll probably have some basic principles, laws, or at least theories to start with in analyzing data. Even some of those were discovered or verified by statistical observation. If you’re in the social sciences, business, economics, or most other fields, though, you’ve got little to go on besides statistics. Anecdotes aren’t worth much. Statistics gives you a place to start by having you focus on the population, so you know what to sample, and the phenomenon, so you know what to measure and how to measure it. Once you have laid this groundwork, statistics has you define alternative hypotheses to weigh and provides a variety of methods to analyze the data.
  • Statistics give you more ways to analyze data – Statistics is a colossal workshop with more tools than you could ever use in a career. Statistics allows you to describe, correlate, detect differences, group, separate, reorganize, identify, predict, smooth, and model. And it’s not just the variety of tools for doing different things; there are also many tools for doing the same thing in different ways. Want to find the center of a data distribution? You can use the arithmetic mean, the geometric mean, the harmonic mean, trimmed and winsorized means, weighted means, the median, the trimean, or the mode. Each has its own special use, like the variety of types of screwdrivers used by a mechanic. With a statistician’s toolbox, you can gain far more insight from your data than you might from any other type of analysis.
  • Statistics examine both accuracy and precision – Any marksman will tell you that it’s not enough to be able to hit a target. You have to be able to hit it where you aim and do it consistently. That’s accuracy and precision. Many analytical techniques focus on accuracy and forget all about precision. But variability, uncertainty, and risk don’t go away by just ignoring them. Statistics is all about understanding variability.
  • Statistics examine both trends and anomalies – Most forms of analysis focus on finding similarities and patterns in data. Statistics, in particular, can be used to find linear and nonlinear trends, cycles, steps, shocks, clusters, and many other types of groupings. What’s more, statistics can be used to identify and explore divergent or anomalous cases, which don’t fit general patterns. Sometimes it is these outliers rather than the trends that reveal the information most crucial in an analysis.
  • Statistics tells you how much information you need – In data analysis, more is not always better. It’s not unusual to have too much data to make sense of using only graphs and tables. Statistics provides a variety of ways to help you decide about how many samples you need to achieve a certain objective. Statistics provides ways to judge the quality of the data and compensate for misleading variability. Statistics can also tell you if your data are redundant, and if so, provide ways to reassemble the data more efficiently.
  • Statistics provide standardization – You can usually convince people who are reviewing your work that your data analysis is legitimate because it uses well-known, professionally accepted, statistical procedures. Likewise, it’s easier to use statistics as the basis for any standardized procedures you specify that others use because most people know some statistics. For example, government regulations frequently require the use of statistics to report and analyze data sets, such as crime rates, pharmaceutical effectiveness, environmental impact, occupational safety, public health, and educational testing.
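To see how much the choice of screwdriver matters, here is a hedged Python sketch that computes several of the measures of center named above for one small, invented dataset with a single large outlier. The helper names `trimmed_mean` and `winsorized_mean` are written for this example (they are not standard-library functions), the trimming rule of dropping or clamping k values at each end is only one common convention, and weighted means and the trimean are left out to keep it short.

```python
from statistics import mean, geometric_mean, harmonic_mean, median, mode

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    return mean(ordered[k:len(ordered) - k])

def winsorized_mean(values, k):
    """Mean after clamping the k smallest and k largest values inward."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this many values")
    low, high = ordered[k], ordered[-k - 1]
    return mean([min(max(v, low), high) for v in ordered])

# Hypothetical data: six ordinary values and one outlier.
data = [2, 3, 3, 4, 5, 6, 40]

print("arithmetic mean:", mean(data))
print("geometric mean: ", round(geometric_mean(data), 3))
print("harmonic mean:  ", round(harmonic_mean(data), 3))
print("trimmed mean:   ", trimmed_mean(data, 1))
print("winsorized mean:", round(winsorized_mean(data, 1), 3))
print("median:         ", median(data))
print("mode:           ", mode(data))
```

The outlier drags the arithmetic mean well above the median, while the trimmed and winsorized means stay close to the bulk of the data. Which answer is "the center" depends on what you want the number to tell you.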

So you see, statistics has a lot to offer you, whether there is a strong theoretical basis to your field of practice or not. That’s why your advisers want you to learn about it.

Stats With Cats: What’s inside

Stats With Cats is a great companion to any introductory textbook in statistics. You won’t find a lot of equations or descriptions of the central limit theorem, probability, and hypothesis testing. You can find that information in traditional statistical texts. What you will find are topics like data scrubbing, minimizing variance, model building, and critiquing statistical reports. You’ll need these skills to complete your own statistical analyses. Think of Stats with Cats as a textbook for Statistics 101.5.

Following the words of Samuel Johnson, Stats With Cats tries “to make new things familiar and familiar things new” by using graphics, examples, stories, quotes, songs, obscure cultural references, way too many analogies, and a little bit of humor. Hopefully, you’ll recognize which is which.

Stats With Cats will be available mid-January, 2011. Here are some of the other things you’ll find inside.

PART I. The Lost Treasures of Statistics 101
Chapter 1. Reality Statistics
Chapter 2. Data Speak
Chapter 3. Designer Datasets
Chapter 4. Hellbent on Measurement
Chapter 5. Catch an Error by the Tail
Chapter 6. The Zen of Modeling
Chapter 7. Assuming the Worst
Chapter 8. Perspectives on Objectives

PART II. Frisky Business
Chapter 9. The Statistical Do-It-Yourselfer
Chapter 10. Manage to Get It Right
Chapter 11. Weapons of Math Production
Chapter 12. Tales of the Unprojected

PART III. Is that a Dataset in your Pocket?
Chapter 13. In Search of … Variables
Chapter 14. Not-So-Simple Samples
Chapter 15. The Heart and Soul of Variance Control
Chapter 16. Functional File Formats

PART IV. Statistical Foreplay
Chapter 17. Getting the Numbers Right
Chapter 18. Getting the Right Numbers
Chapter 19. Kicking the Data Tires
Chapter 20. Teaching Old Data New Tricks

PART V. A Model for Modeling
Chapter 21. Modelus Operandi
Chapter 22. The Land Beyond Statistics 101
Chapter 23. Models and Sausages

PART VI. Saving the World One Analysis at a Time
Chapter 24. Grasping at Flaws
Chapter 25. The TerraByte Zone
