In theory, if you have the free time, you can calculate any statistic you might need using nothing more than a pencil and paper. After all, it’s just matrix mathematics. With a lot of data or a complicated procedure, though, you might need a lot of free time. A generation ago, that’s how most statistics were calculated. Most people didn’t have computers, or calculators for that matter. Slide rules … maybe. Now, there is an abundance of hardware and software to ease the tedium. Having a statistician’s version of Norm Abram’s workshop to use actually makes analyzing data a lot of fun.
Whether you’re planning a career in statistics or just looking to analyze your current dataset, you’re going to need software to do the calculations. Yes, there are some people who still calculate descriptive statistics manually, but this practice is so prone to errors that it’s practical only for very small datasets. And yes, there are some people who develop their own statistical routines, usually with R, a free programming language for statistics distributed under the GNU General Public License, or with mathematical software like MATLAB, Maple, and Mathematica. Unless you’re a mathematical statistician developing a new statistical technique, though, you won’t need to take this approach if you don’t want to. There’s plenty of software available. All you need to know are the kinds of statistical analyses you’re likely to use and your price range.
Software for General Statistics
Almost all of the statistical software you’ll find is geared to the most common types of statistical analysis, including descriptive statistics, hypothesis testing, correlation and regression, and analysis of variance. Software used for statistical analysis can be grouped into five categories:
- Web-based Calculators—Web sites that perform simple statistical calculations can be found at statpages.org/. They are at the low end of cost, but also of usability. You usually have to enter and edit your data manually, so they’re not really suitable for production work.
- Spreadsheets—You probably already have a copy of Microsoft Excel or some other spreadsheet software on your computer. If you are a beginner at data analysis, you’ll find that you can accomplish most of what you want to do with spreadsheet software. Advanced data analysis may be more of an issue, though. Some statisticians advise against using spreadsheet software, particularly Excel, for three reasons. First, Excel doesn’t do some of the calculations and graphs that statistical packages do. Well, of course it doesn’t. It’s a spreadsheet program that sells for less than $200 (by itself, not as part of Office), compared to statistical packages that cost ten times as much. Big deal. Second, Excel’s calculated probabilities are inaccurate, reportedly in the third decimal place. OK, but if you would base a decision solely on whether a probability is 0.051 instead of 0.049, you really don’t understand the nature of statistical testing (more on this in another blog post). And third, Excel’s random number generators are not of research quality. Yup, so if you’re planning to do Monte Carlo simulations with Excel … well, don’t (not so much because your answer will be wrong as because some people will think it is).
- Basic Statistical Software—This category includes software used mainly for less sophisticated types of statistical analysis. Most cost less than about $500. Key examples include StatsDirect, InStat, Analyse-it, and Assistat.
- Intermediate Statistical Software—This category includes software that can be used for many types of statistical analysis, though not for some of the more sophisticated techniques like multivariate analysis. Most, but not all, are sold as a single module and cost less than about $1,000. Examples include NCSS, Statistix, Costat, Origin, Prostat, Soritec, MVSP, and Simstat.
- Major Statistical Packages—This category includes software that can be used for a wide variety of statistical analyses. Most have a base module and a variety of optional add-on modules. They are usually purchased through annual licenses specifying a number of users, and they cost more than about $1,000 (in some cases, way more). Some of the major packages, like SAS and SPSS, date back to the mainframe days of the 1960s and 1970s. Others, like Statistica, are products of the personal-computer boom of the 1980s. Other examples include S-Plus, Stata, Systat, Minitab, and Statgraphics.
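The spreadsheet caveats about calculated probabilities and random number generators are easy to explore in code. The sketch below uses Python, chosen here only for illustration, and its standard library. It computes an exact binomial probability (the chance of at least 60 heads in 100 fair coin flips) and then estimates the same probability by Monte Carlo simulation with Python's seeded generator, `random.Random`, which is a Mersenne Twister. The example problem, the seed value 2011, and the number of trials are arbitrary choices for illustration, not recommendations.

```python
import math
import random


def exact_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def monte_carlo_upper_tail(n: int, k: int, trials: int, seed: int) -> float:
    """Estimate P(X >= k) for X = number of heads in n fair coin flips.

    Each trial draws n random bits from a seeded generator (Python's
    random.Random, a Mersenne Twister) and counts the 1 bits as heads.
    """
    rng = random.Random(seed)  # a fixed seed makes the run reproducible
    hits = 0
    for _ in range(trials):
        heads = bin(rng.getrandbits(n)).count("1")
        if heads >= k:
            hits += 1
    return hits / trials


exact = exact_upper_tail(100, 60)
estimate = monte_carlo_upper_tail(100, 60, trials=100_000, seed=2011)
print(f"exact:       {exact:.4f}")
print(f"Monte Carlo: {estimate:.4f}")
```

Fixing the seed makes the simulation reproducible, so someone else can rerun it and check the result. The gap between the estimate and the exact value shrinks roughly in proportion to one over the square root of the number of trials.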
Data analysis programs typically have spreadsheet-style screens for data, because statistical calculations use matrices and, after all, a spreadsheet is really just a matrix. They also have utilities for both data management and graphing, which are essential for any type of data analysis. Most statistical software has a graphical user interface (GUI), and many packages also allow you to write your own code for specialized applications. Nearly all have downloadable demos, usually fully functional (at least for basic statistics) for 30 days.
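To make the point that statistical calculations are matrix mathematics concrete, here is a minimal Python sketch that treats a small, made-up dataset as a matrix (one row per observation, one column per variable) and fits a straight line by solving the least-squares normal equations, (X'X)b = X'y, using plain lists. The data and function names are hypothetical. Solving the normal equations directly is the textbook, pencil-and-paper route; production software typically uses more numerically stable methods (R's `lm()`, for example, uses a QR decomposition).

```python
# Rows are observations and columns are variables, just like a spreadsheet.
Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply matrix a (m x k) by matrix b (k x n)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def solve_2x2(a: Matrix, b: Matrix) -> Matrix:
    """Solve the 2x2 system a @ x = b, where b is a 2x1 column, by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("singular system: are all the x values identical?")
    x0 = (b[0][0] * a[1][1] - a[0][1] * b[1][0]) / det
    x1 = (a[0][0] * b[1][0] - b[0][0] * a[1][0]) / det
    return [[x0], [x1]]


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope from the normal equations (X'X)b = X'y."""
    X = [[1.0, x] for x in xs]  # design matrix: a column of 1s, then the x values
    y = [[v] for v in ys]       # response as a column vector
    Xt = transpose(X)
    b = solve_2x2(matmul(Xt, X), matmul(Xt, y))
    return b[0][0], b[1][0]


# Hypothetical data: each row is one observation (x, y).
data = [
    [1.0, 2.1],
    [2.0, 3.9],
    [3.0, 6.2],
    [4.0, 7.8],
    [5.0, 10.1],
]
intercept, slope = fit_line([row[0] for row in data], [row[1] for row in data])
print(f"y = {intercept:.3f} + {slope:.3f} x")  # prints: y = 0.050 + 1.990 x
```

Every step here (transpose, multiply, solve) is something you could do with pencil and paper; software just does it faster and without arithmetic slips.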
To conduct an analysis with statistical software, you enter or upload your data, scrub it (a whole other discussion), and then pick the graphing or analysis procedure you want to run from the program’s menus. Submenus pop up with all the specifications and options for the procedure. So it’s quite easy to run a lot of statistical analyses with just a few mouse clicks, but you really have to understand what all those specifications and options mean.
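The enter, scrub, and analyze sequence can also be written out as a short script, which makes each step, and each option, explicit. This minimal Python sketch uses only the standard library; the raw data, the `weight_kg` column, and the scrubbing rule (drop blank or non-numeric entries) are hypothetical. The last two lines show why the options matter: `statistics.stdev` computes the sample standard deviation (dividing by n - 1), while `statistics.pstdev` computes the population standard deviation (dividing by n), so the same data give two different answers depending on which you choose.

```python
import csv
import io
import statistics

# Hypothetical raw data, as it might arrive from an uploaded file.
RAW = """id,weight_kg
1,4.2
2,5.1
3,
4,3.8
5,n/a
6,4.9
"""


def load_and_scrub(text: str, column: str) -> list[float]:
    """Read CSV text and keep only the rows whose `column` value parses as a number."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            values.append(float(row[column]))
        except ValueError:
            # Blank or non-numeric entry: drop the row (one simple scrubbing rule).
            continue
    return values


weights = load_and_scrub(RAW, "weight_kg")
print("n used:", len(weights))
print("mean:", statistics.mean(weights))
# Same data, two "standard deviation" options, two different answers.
print("sample SD (divides by n - 1):", round(statistics.stdev(weights), 4))
print("population SD (divides by n):", round(statistics.pstdev(weights), 4))
```

Dropping incomplete rows is only one scrubbing rule; whether it is appropriate depends on why the values are missing.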
All of the software packages have their fans, especially the major packages. SPSS was created in the 1960s by graduates of Stanford who continued its development at the University of Chicago. It was originally called the Statistical Package for the Social Sciences, and it remains very popular in the social sciences. IBM bought SPSS in 2009. SAS, formerly called the Statistical Analysis System, was developed in the early 1970s by professors at North Carolina State University. S-Plus is a commercial product, first released in the late 1980s, built on the S programming language developed at Bell Laboratories. Minitab was created by professors at Pennsylvania State University in the 1970s from statistical spreadsheet software developed at the National Institute of Standards and Technology (NIST). The company now focuses on Six Sigma statistical procedures for quality management.
There is no single best statistical software package. They’re all pretty good, dollar for dollar. A lot of what determines a user’s preference is what software is (or was) available at their college or workplace. For example, if you go (or went) to Penn State, you probably think Minitab is the best. If you work at a pharmaceutical company, you probably use SAS, because that’s what most of the pharmaceutical industry uses. Social scientists like to use SPSS. If you like programming your own procedures, you’re probably a proponent of the R programming language for statistics.
Assuming you don’t have access to software through your school or work, you can evaluate your software needs by answering three questions:
- How sophisticated are the statistical techniques you need to use?
- How often would you likely need to use the software?
- How much can you afford to spend on the software?
If you are planning on doing only one analysis, see if you can use what you have. You may be able to do all your calculations in a spreadsheet program or with free or web-based software. If you are going to do full-time statistical consulting and you can’t afford a license for a major package, bite the bullet and learn R. Another option would be to buy a basic or an intermediate package and move up as you can afford to. If you’re only going to be an occasional user, any of the statistical packages will be better than a spreadsheet (except perhaps for scrubbing your dataset), so purchase whatever you can afford.
If you aren’t acquainted with statistical software, conduct a web search or start at en.wikipedia.org/wiki/List_of_statistical_packages. Explore the web sites you find to be sure that the software has the statistical procedures you think you will be using. Almost all of the sites have free downloads, such as brochures, white papers, and demonstration software. Don’t download the demo software until you’re ready to make a decision. Most demos work for only 30 days, after which the software won’t run even if you download a new copy.
Software for Specialized Applications
There are a few kinds of analysis you might run into that require specialized software. For example, have you ever seen an icon plot using sparklines or Chernoff faces? How about a ternary diagram or a Piper plot? Someday you may have to produce one of these specialized graphics. Software you could look into includes SigmaPlot, Origin, AquaChem, GraphPad, EasyPlot, DeltaGraph, and Grapher.
If you ever have to do time-series analysis, you could start with some of the high-end statistical packages. Or you could look into specialized software, including Autobox, EViews, ForecastX, and RATS. If you have to produce maps, find a GIS expert to help you. If you’re committed to doing it yourself, try Surfer. If you’re not into meteorology or geology, you probably don’t run into orientation data very often, but if you ever do, get Oriana. For critical-path scheduling, try Microsoft Project or P5, an update to Primavera Project Planner, now a product of Oracle. There’s also software for resampling statistics, control charts, ANOVA, neural networks, nonparametric statistics, power analysis, Bayesian statistics, data mining, and many other specialties.
The software market changes rapidly. The big packages keep getting bigger, spawning optional modules from procedures that used to be part of the base package. At the same time, new statistical software keeps appearing, usually for specialized applications. Spreadsheet software is also becoming more sophisticated. Many introductory statistics classes are now taught with spreadsheet software; even calculators are becoming a thing of the past. So do some research and get the software that’s best for your situation.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.
I’m glad you mentioned R. If you’re even remotely interested in doing statistical work in academia, R is definitely the way to go. It’s free, not difficult to learn, and has a huge community supporting it. Rarely will you have to write your own functions, as there are thousands of packages available for any sort of statistics you’ll need to do.
My two cents.
BTW, since you didn’t supply any links, here are a few (for R, that is):
R project home page:
Rseek (a good search engine for R material):
R help mailing list (where to find answers):
R journal (the official publication of R related articles):
R bloggers (an unofficial publication of R articles, from over 75 bloggers):