Some people are content with visualizing their data sets using pie charts and bar charts. If you really want to analyze data, though, you need to know how to pick the best graph for the job. Here’s the first step in doing just that.
Creating a professional quality graph requires thinking about three elements:
- Foundation — The fundamental type of graph selected on the basis of the characteristics of the variables to be plotted.
- Framework — The specifications of the coordinate system, the axes, the points being plotted, and the aim of the graph.
- Appearance — The labeling and other details that make the chart easier to understand.
This blog concerns the first of the three elements — the foundation of a graph.
Variables and Samples
To understand graphs, you have to understand two terms — variables and samples. Variables contain the pieces of information, the data, you collect from or about each of your samples. Variables are also called measures, metrics, scores, and attributes. Variables are represented by the axes of your graph.
A sample in statistics is usually thought of as a portion of a population. In graphing, samples are the individual pieces of a statistical sample-of-a-population. These individual pieces might be referred to as observations, subjects, patients, students, objects, items, measurements, entities, records, cases, or individuals. There are many terms for samples because there are many different types of populations. Medical studies have samples who are patients but may also collect blood (or other types of) samples. Education studies have samples that are students, classes, schools, and so on. Environmental studies may collect samples of air, soil or rock, plants or animals, or water. Samples in sports are usually players, athletes, or teams. Industrial samples are usually items or products. Get the idea? Samples are the points that will appear on your graph.
So in a sentence, variables determine where on a graph the samples are to be placed.
The first thing you’ll need to do to create a professional graph is to select the foundation — the underlying type of graph. This foundation is based on the scales on which the variables are measured.
Scales describe the relationship between successive levels of a measurement of a variable. Just as there are many different types of musical scales based on the intervals between steps of pitch, there are different measurement scales based on the intervals between scale values. For graphing, there are three types of scales to consider:
- Nominal scales describe categories with no mathematical relationship with each other.
- Ordinal scales have ordered categories, like counts, but the intervals between the steps may or may not be equal.
- Continuous scales have no breaks between steps and the intervals between steps are equal.
Think of nominal scales as stepping stones; they can be arranged in any manner. Ordinal scales are like stairs; steps occur in an order with gaps between them. Continuous scales are like ramps, with a smooth and continuous transition from low to high.
Take Olympic boxing as an example. Gold medalist Katie Taylor of Ireland is a 132-pound (continuous scale) lightweight (125-132 pound interval of an unequal-interval ordinal scale) woman (nominal scale). Another example is the length of an American football game. A game is divided into four quarters (equal-interval ordinal scale), each fifteen minutes long (continuous scale, though the fractions of minutes are converted to seconds), and in real time a game typically lasts about four hours (continuous scale).
Time scales are especially important in graphing because they are the basis for so many statistical relationships. Time is always measured on at least an ordinal scale, but the intervals between time periods are not always equal. Geologic time, for example, has more to do with rock layers and fossils than with age even though the unit of the time scale is years. Real time can be measured in years, months, weeks, days, hours, minutes, and seconds. As long as you don’t mix the units, the intervals will be equal.
Plotting ordinal scales can be problematic when the intervals the scales represent are not equal. Worse, there are cases in which the sizes of the intervals are unknown. Consider these examples:
|Equal intervals:||Counts, annual measurements|
|Unknown intervals:||Likert scales, many biomedical and psychometric scales|
|Unequal intervals:||Geologic time, mineral hardness, weight classes in sports, civil service pay grades|
Ordinal scales are often used to describe subjective concepts like satisfied-unsatisfied and good-bad, which appear often on opinion surveys. No one can say whether the differences between the scale levels are equal. Still, the levels can look equally spaced when graphed, which can lead to misleading interpretations.
Changing the scale of a variable to be graphed is feasible in most cases and useful in some, but not always a good idea. That’s because by changing scales you are changing the information content of the variable and, hence, its meaning.
Converting a continuous scale to an ordinal scale discards information. You reduce precision and, sometimes, accuracy. Still, there are reasons for doing so, most typically to reduce extraneous measurement variability.
Katie Taylor’s lightweight weight class is a good example. If Katie (or anyone else for that matter) were to weigh themselves repeatedly over the course of a day, they would find that their weight would fluctuate by a few pounds just due to normal bodily processes, like sweating. An intense workout can reduce weight by several pounds while rehydrating by drinking fluids will add weight. Consequently, using an ordinal scale weight class can reduce the apparent variability. Katie’s weight may change but not her weight class. Using an ordinal scale can also facilitate comparisons. It is much easier to find Katie a lightweight opponent than it would be to find her a 132-pound opponent.
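The weight-class idea can be sketched in a few lines of Python. This is only an illustration: the 125-132 pound lightweight interval comes from the example above, while the other limits, the class labels, and the weigh-in values are hypothetical placeholders.

```python
from bisect import bisect_left

# Upper limits (pounds) of each class. Only the 125-132 lb lightweight
# interval comes from the article; the other limits and all labels
# except "lightweight" are hypothetical.
UPPER_LIMITS = [112, 125, 132, 152]
CLASS_NAMES = ["class A", "class B", "lightweight", "class D", "class E"]

def weight_class(pounds: float) -> str:
    """Map a continuous weight onto an ordinal weight class."""
    # bisect_left finds the first limit >= pounds, so a weight exactly
    # at a limit stays in the lower class.
    return CLASS_NAMES[bisect_left(UPPER_LIMITS, pounds)]

# Made-up weigh-ins over one day: the weight fluctuates, the class does not.
weigh_ins = [129.5, 131.0, 132.0, 130.2]
classes = [weight_class(w) for w in weigh_ins]
print(classes)
```

The continuous values differ by a couple of pounds, but every one of them maps to the same ordinal level, which is the reduction in apparent variability described above.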
Histograms are another example of how converting from a continuous scale to an ordinal scale can facilitate comparisons, in this case, between a distribution of data measurements and a theoretical mathematical model. However, the interval you choose for the ordinal scale can greatly affect the appearance of the graph. There are better ways to compare sample distributions to theoretical models but they are more complicated than histograms.
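To see how much the interval choice matters, here is a minimal sketch that bins the same made-up measurements with two different widths. The data and the helper name `bin_counts` are assumptions for illustration only.

```python
from collections import Counter

def bin_counts(values, width):
    """Count values per bin; each bin is labeled by its lower edge.

    Empty bins are simply absent from the result.
    """
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Made-up measurements, for illustration only.
data = [1, 2, 2, 3, 7, 8, 8, 9, 12, 13]

narrow = bin_counts(data, 2)  # bins of width 2
wide = bin_counts(data, 5)    # bins of width 5
print(narrow)
print(wide)
```

With the narrow bins the data show two separate clusters and gaps between them; with the wide bins the same data look like a fairly flat distribution that tails off. Neither picture is wrong, but they invite different interpretations.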
Converting an ordinal scale to a continuous scale requires the addition of information. This can be a difficult but useful process because graphing is simpler if your variables are measured on continuous scales. Three commonly used conversion processes are:
- Standardization to percentages by dividing each measurement by the total of the measurements. The converted data will fall into the range 0% to 100%.
- Normalization to z-scores by subtracting the mean of a variable from each measurement and then dividing by the variable’s standard deviation. The converted data will center on 0, and if the data are roughly normally distributed, about 95% of the values will fall into the range -2 to 2.
- Indexing to a more representative value by dividing each measurement by values that facilitate comparisons. A good example is the Consumer Price Index.
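The three conversions above can be sketched as follows. The function names and the sample counts are hypothetical, the z-score version assumes the sample standard deviation (the text does not say which one to use), and the indexing version is a simplified "base value = 100" index rather than how any official price index is actually computed.

```python
from statistics import mean, stdev

def to_percentages(values):
    """Express each value as a percentage of the total."""
    total = sum(values)
    return [100 * v / total for v in values]

def to_z_scores(values):
    """Subtract the mean, then divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def to_index(values, base):
    """Express each value relative to a chosen base value set to 100."""
    return [100 * v / base for v in values]

# Made-up counts, for illustration only.
counts = [2, 4, 4, 10]

percentages = to_percentages(counts)
z_scores = to_z_scores(counts)
indexed = to_index(counts, base=counts[0])
print(percentages, z_scores, indexed)
```

Each function returns a list on a continuous scale, but whether the result means anything still depends on whether the conversion makes theoretical sense for the variable.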
There are many other ways to convert ordinal scales to continuous scales, for instance, by multiplying or dividing two ordinal-scale variables. But only make the conversion if it makes theoretical sense for your analysis.
Basic Types of Graphs
There are more kinds of graphs than most people, even data analysts, would ever need to know about, but they are really all just variations of a few basic graphs.
|Scatter plots:||Plots of two continuous-scale variables. Usually only points are shown, without connecting lines.|
|Line plots:||Plots with an ordinal-scale variable on the horizontal axis and an ordinal or continuous-scale variable on the vertical axis. Points usually are connected by lines.|
|Bar charts:||Plots with an ordinal or continuous-scale variable on the vertical axis and a nominal or ordinal-scale variable on the horizontal axis. Sometimes the axes are reversed. Data are usually represented by bars instead of points.|
Start with these basic types. As you consider the framework of the plots (i.e., axes, focus, objective, data dimensions, and priority) you’ll see how the basic graph types can easily take on many different appearances.
Scales and Types of Graphs
In graphing, selecting an appropriate type of graph depends on whether the variables you want to plot are measured on ordinal or continuous scales or both. Nominal scales don’t enter into selecting a type of graph even though they are used often in graphing, usually to compare groups of data.
Here is an overview of the relationships between variable scales and the types of graphs they can be used with.
|First Variable Scale|Second Variable Scale|Type of Graph|
|Continuous||Ordinal: equal intervals||Line, scatter if the ordinal variable has many levels|
|Continuous||Ordinal: unequal intervals||Bars|
|Ordinal: equal intervals||Ordinal: equal intervals||Line, scatter if the ordinal variables have many levels|
|Ordinal: equal intervals||Ordinal: unequal intervals||Bars|
|Ordinal: unequal intervals||Ordinal: unequal intervals||Bars|
|HINT: Start simply, usually with no more than two variables. Experiment with the framework and appearance of the graph to make it tell the story.|
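The table can also be written as a small lookup. This is a sketch under assumptions: the scale labels and the function name `suggest_graph` are hypothetical, only the rows shown in the table are encoded, and treating the order of the two variables as irrelevant is a choice made here, not something the table states.

```python
# Rows of the table above, keyed by (first scale, second scale).
# The scale labels are hypothetical shorthand for the table's wording.
LINE_OR_SCATTER = "line (scatter if the ordinal variable has many levels)"
GRAPH_FOR_SCALES = {
    ("continuous", "ordinal-equal"): LINE_OR_SCATTER,
    ("continuous", "ordinal-unequal"): "bar",
    ("ordinal-equal", "ordinal-equal"): LINE_OR_SCATTER,
    ("ordinal-equal", "ordinal-unequal"): "bar",
    ("ordinal-unequal", "ordinal-unequal"): "bar",
}

def suggest_graph(first, second):
    """Return the table's suggestion, or None if the pair is not listed."""
    # Assumption: the order of the two variables does not matter.
    if (first, second) in GRAPH_FOR_SCALES:
        return GRAPH_FOR_SCALES[(first, second)]
    return GRAPH_FOR_SCALES.get((second, first))

print(suggest_graph("continuous", "ordinal-unequal"))
print(suggest_graph("ordinal-equal", "continuous"))
```

A pair that is not in the table returns `None` rather than a guess, so the lookup never suggests more than the table does.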
Once you understand the scales of your variables, you can choose the basic kind of graph that will work best with your data. But remember, there are other types of graphs, subspecies of the basic graphs, variations and extensions of these graphs, combinations of graphs, and graphs that go by a variety of different names. For now, focus on the basic types of graphs. Make sure you start with a good foundation.
Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.
Thanks, great article and a reference that I will use again. One thing I was wondering is: when is it appropriate to transform the scale using the log function?
Take a look at “Fifty Ways to Fix Your Data”
Not everyone would agree, but I think if the log-transformed scale works better than the original scale (statistical artifacts aside), it should be used. After all, no one knows what scale nature (or God or whatever) used to create the phenomenon. A good example is pH.
Thanks. I just read your “Fifty Ways to Fix Your Data” article and it was also excellent. I’m going to keep the table at the end for reference when I do analysis in the future, as it is a great summary of the possible data transformation options.
That is a lot of cats, if I have to say so.