The Zen of Modeling

What’s the first thing you think of when you hear the word model? The plastic model airplanes you used to build? A fashion model? The model of the car you drive? The person who is your role model? But what do any of those things have to do with data analysis? Read on; you’re about to find that statistical analyses begin and end with models.

By Any Other Name

What do a Ford Focus, a plastic airplane, and Tyra Banks have in common? They are all called models. They are all representations of something, usually an ideal or a standard.

This is supposed to be a model of ME!

Models can be true representations, approximate (or at least as good as practicable), or simplified, even cartoonish compared to what they represent. They can be about the same size, bigger, or most typically, smaller, whatever makes them easiest to handle. They usually represent physical objects but can also represent a variety of phenomena, including conditions such as weather patterns, behaviors such as customer satisfaction, and processes such as widget manufacturing. The models themselves do not have to be physical objects either. They can be written, drawn, or consist of mathematical equations or computer programming. In fact, using equations and computer code can be much more flexible and less expensive than building a physical model.

Customarily, models are used to:

Display what they represent (e.g., model airplanes) or are associated with (e.g., fashions).
Substitute for incomplete real-world data (e.g., using the Normal distribution as a surrogate for a sample distribution).
Learn more about the things they represent by manipulating their components (e.g., scientific models for planetary motion).

Whether you know it or not, you deal with models every day. Your weather forecast comes from a meteorological model, maybe several. Mannequins are used to display how fashions may look on you. Blueprints are drawn models of objects or structures to be built. Examples are plentiful.

Examples of Physical Models

Humans, in particular, are modeled all the time because of our complexity. Children play with dolls as models of playmates. Mannequins are simplified models of fashion models, who, in turn, are models of people who might wear a fashion designer’s wares. Posing models provide reference points for artists. Crash test dummies reveal how the human body might react in an automobile accident. Medical researchers use laboratory animals in place of humans for basic research. Medical schools use donated cadavers as models, very good ones as it turns out, of the human anatomy. So, there should be nothing unfamiliar or intimidating about models.

Whether it is a physical scale-model of a hydroelectric dam or a mathematical model of weather patterns, a model is nothing more than a tool used to stimulate the imagination by simulating an object or phenomenon. The model airplane takes its young pilot looping through the blue skies of a summer day. Globes teach geography and orreries teach planetary motion. The mannequin shows the bride-to-be how beautiful she’ll look in the gown at her wedding. The concept car unveiled today gives consumers an idea of what they may be driving in a few years. The National Hurricane Center uses over a dozen mathematical models to forecast the intensities and paths of tropical storms and to help understand the complex dynamics of hurricanes.

It should come as no surprise, then, that scientists, engineers, and mathematicians use models, especially virtual models, all the time. It may be surprising, though, that virtual models are also used extensively in business, economics, politics, and many other fields. Nevertheless, there is a mystique associated with modeling, especially the mathematical variety. Some believe that models are infallible and unchanging. Some believe that models are impossibly complex and necessarily unfathomable. Some believe that models are sophisticated delusions for obfuscating real data. In reality, none of these opinions is correct, at least not entirely.

A Medley of Numbers

Mathematical models can be either theoretical (i.e., derived mathematically from scientific principles) or empirical (i.e., based on experimental observations). For example, celestial movements and radioactive decay are phenomena that can be evaluated using theory-based models. To calibrate a theoretical model, the form of the model (i.e., the equation) is fixed and the inputs are adjusted so that the calculated results adequately represent actual observations.
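To make calibration a little more concrete, here is a minimal Python sketch using radioactive decay. The form of the model, N(t) = N0 * exp(-lambda * t), stays fixed; only the decay constant is adjusted until the calculated values match the observations. The observations, the starting amount, and the function names are all made up for illustration. They are not real measurements, and this is just one simple way to do the fitting.

```python
import math

def decay_model(n0, decay_constant, t):
    """Theoretical model with a fixed form: N(t) = N0 * exp(-lambda * t)."""
    return n0 * math.exp(-decay_constant * t)

def calibrate_decay_constant(n0, times, observed):
    """Least-squares fit of lambda on the log scale: ln(N/N0) = -lambda * t."""
    numerator = sum(t * math.log(n / n0) for t, n in zip(times, observed))
    denominator = sum(t * t for t in times)
    return -numerator / denominator

# Hypothetical observations: generated from lambda = 0.1 and nudged by a
# few percent to mimic measurement error. These are not real data.
N0 = 1000.0
times = [1, 2, 3, 4, 5]
nudges = [0.01, -0.02, 0.015, -0.01, 0.02]
observed = [decay_model(N0, 0.1, t) * (1 + e) for t, e in zip(times, nudges)]

fitted_lambda = calibrate_decay_constant(N0, times, observed)
print(f"calibrated decay constant: {fitted_lambda:.4f}")
print(f"implied half-life: {math.log(2) / fitted_lambda:.2f} time units")
```

Notice that the equation itself never changes during calibration. Only the input (the decay constant) moves.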

Empirical models differ from theoretical models in that the model is not necessarily fixed for all instances of its use. Rather, empirical models are developed for specific situations from measured data. Model formulation and calibration are simultaneous. However, the selection of the form of the equation and the inputs used in an empirical model are usually based on related theories. Models developed using statistical techniques are examples of empirical models.

Empirical models can also be deterministic, stochastic, or sometimes a hybrid of the two. Deterministic empirical models presume that a specific mathematical relationship exists between two or more measurable phenomena (as do theoretical models) that will allow the phenomena to be modeled without uncertainty under a given set of conditions (i.e., the model’s inputs and assumptions). Biological growth models are examples of deterministic empirical models.

Both theoretical models and deterministic empirical models provide solutions that presume that there is no uncertainty. These solutions are termed “exact” (which does not necessarily imply “correct”). Conversely, stochastic empirical models presume that changes in a phenomenon have a random component. The random component allows stochastic empirical models to provide solutions that incorporate uncertainty into the analysis.
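The difference is easier to see side by side. The sketch below uses a logistic growth curve as the deterministic model and then adds a random component to turn it into a stochastic one. The parameter values (capacity, starting size, growth rate, noise level) and the function names are assumptions chosen purely for illustration.

```python
import math
import random

def deterministic_growth(t, capacity=1000.0, start=10.0, rate=0.5):
    """Logistic growth curve: the same inputs always give the same answer."""
    ratio = (capacity - start) / start
    return capacity / (1.0 + ratio * math.exp(-rate * t))

def stochastic_growth(t, rng, noise_sd=25.0):
    """Same curve plus a random component, so repeated runs differ."""
    return deterministic_growth(t) + rng.gauss(0.0, noise_sd)

rng = random.Random(42)  # seeded so the sketch is repeatable
exact = deterministic_growth(10)
simulated = [stochastic_growth(10, rng) for _ in range(1000)]

print(f"deterministic answer: {exact:.1f}")
print(f"stochastic answers range from {min(simulated):.1f} to {max(simulated):.1f}")
```

The deterministic model returns one "exact" number for time 10. The stochastic version returns a spread of numbers around it, which is how uncertainty gets into the analysis.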

Statistical models are examples of stochastic empirical models in which the model equation is generated by quantifying and minimizing errors (i.e., uncertainty). Statistical models place great emphasis on examining and quantifying uncertainty, whereas theoretical models generally do not.
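As a rough illustration of "quantifying and minimizing errors," here is a small Python sketch that fits a straight line by ordinary least squares, which picks the coefficients that make the sum of squared errors as small as possible. Least squares is only one of many ways to build a statistical model, and the data and names below are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical measurements, invented for this sketch.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
# Residual standard error: the typical size of the error term
# (n - 2 because two coefficients were estimated).
residual_se = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5

print(f"y = {intercept:.2f} + {slope:.2f} * x, give or take about {residual_se:.2f}")
```

The "give or take" part is the point. The fitted equation comes with a measure of how far off it tends to be.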

OK, that’s way more than you need to know. Let me simplify. Mathematical models are based on theories or observations or both. They can produce a single (exact) answer for a set of inputs by assuming there is no variability, or a range of (inexact) answers by incorporating the variability into the model.

For example, distribution models are equations that produce an exact curve. The model describes what the frequencies of your data values would look like if your sampling were a perfect representation of the population. So if your data follow a particular distribution model, you can use the model instead of your data to estimate the probability of a data value occurring. This is the basis of parametric statistics; you evaluate your data as if they came from a population described by the model. (In contrast, nonparametric statistics use your data instead of an exact model to estimate the probability of a data value occurring.) It’s like building a sand castle. A distribution model is like a bucket you can fill with sand (data) to create the castle (the result) with great efficiency. Without the model serving as a substitute, it takes more effort (data) to completely shape the castle.
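Here is a small Python sketch of the two approaches, assuming a made-up sample of ten values. The parametric route fits a Normal distribution to the sample and asks the model for the probability of exceeding a threshold. The other route just counts how many data values exceed it, which is a very simple stand-in for the nonparametric idea rather than a full nonparametric method. The data and the threshold are hypothetical.

```python
from statistics import NormalDist

# Hypothetical sample, invented for this sketch.
data = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 6.8, 7.0, 7.4]
threshold = 7.0

# Parametric: fit the Normal model to the sample, then ask the model.
model = NormalDist.from_samples(data)
p_parametric = 1.0 - model.cdf(threshold)

# Nonparametric: skip the model and count the data directly.
p_empirical = sum(1 for x in data if x > threshold) / len(data)

print(f"Normal model:  P(value > {threshold}) = {p_parametric:.3f}")
print(f"counting data: P(value > {threshold}) = {p_empirical:.3f}")
```

With only ten values, counting can only give answers in steps of one-tenth, while the model gives a smooth estimate. That is the bucket at work, provided the data really do follow the model.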

Statistical analyses involving descriptive statistics and testing rely on exact mathematical models like the Normal distribution to represent data frequencies and error rates. Just as importantly, though, statistical techniques are used to build models from data. Such statistical models include an error term to incorporate the effects of variation and thus are inexact, because they produce a solution that is a range of possible values. Statistical analyses aimed at detecting differences, prediction, or exploration use statistics to estimate the mathematical coefficients (the parameters) of a model.

So, models and statistics are closely intertwined. Statistical analyses begin and end with models. Models serve as both inputs and outputs of statistical analyses. You can’t do without them, so you might as well understand what they are.
