The Measure of a Measure

If you can measure a phenomenon, you can analyze the phenomenon. But if you don’t measure the phenomenon accurately and precisely, you won’t be able to analyze the phenomenon accurately and precisely. So in planning a statistical analysis, once you have specific concepts you want to explore, you’ll need to identify ways the concepts could be measured.

All feline phenomena should be measured with appropriate scales.

Start with conventional measures, the ones everyone would recognize and would understand how you determined. Then, consider whether there are any other ways to measure the concept directly. From there, establish whether there are any indirect measures or surrogates that could be used in lieu of a direct measurement. Finally, if there are no other options, explore whether it would be feasible to develop a new measure based on theory. Keep in mind that developing a new measure or a new scale of measurement is more difficult for the experimenter and less understandable for reviewers than using an established measure.

Say, for example, that you wanted to assess the taste of various sources of drinking water. You might use standard laboratory analysis procedures to test water samples for specific ions known to affect taste, like iron and sulfate. These would be direct measures of water quality. An example of an indirect measure would be total dissolved solids, a general measure of water quality that responds to many dissolved ions besides iron and sulfate. An example of a surrogate measure would be the water’s electrical conductivity, which is positively correlated with the quantity of dissolved ions in the water. Electrical conductivity is easier and less expensive to measure than dissolved solids, which are easier and less expensive to measure than specific analytes like iron and sulfate.

Developing a new measure based on theory might also be useful. Sometimes it’s beneficial to think outside the box. That’s how sabermetrics got started. So for example, you might use professional taste testers to judge the tastes of the waters. Or, more simply, you might conduct comparison surveys of untrained individuals. Clearly, what you measure and how you measure it will have a great influence on your findings.
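Before leaning on a surrogate like electrical conductivity, it can help to check how closely it tracks the measure it stands in for. The sketch below computes a Pearson correlation between paired readings. The function name and the numbers are made up for illustration; they are not real water data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired readings from the same five water samples.
conductivity_us_cm = [150.0, 320.0, 410.0, 560.0, 780.0]     # surrogate measure
dissolved_solids_mg_l = [95.0, 210.0, 260.0, 370.0, 500.0]   # indirect measure

r = pearson_r(conductivity_us_cm, dissolved_solids_mg_l)
print(f"correlation between surrogate and indirect measure: r = {r:.3f}")
```

A high correlation on your own paired samples suggests the cheaper measure may be an acceptable stand-in. A low one is a warning that the surrogate is responding to something else.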

For the possible measures you identify, select scales of measurement and consider how difficult it would be to generate accurate and precise data. Measurement bias and variability are introduced into a data value by the very process of generating the data value. It’s like tuning an analog radio. Turn the tuning dial a bit off the station and you hear more static. That’s more variance in the station’s signal. Every measurement can be thought of as consisting of three elements:

  • Benchmark – The accepted standard against which a data value is compared. Scientific instruments, meters, rulers, scales, comparison charts, and survey question response options are all examples of measurement benchmarks.
  • Processes – Repetitive activities that are conducted as part of generating a data value. Equipment calibration, measurement procedures, and survey interview scripts are all examples of measurement processes.
  • Judgments – Decisions made by the individual to create the data value. Examples of measurement judgments include reading instrument scales, making comparisons to visual scales, and recording survey responses.
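One way to see bias and variability in a measurement system is to measure a known reference several times. The sketch below assumes a hypothetical 100.0 g reference mass and five made-up readings. The average offset from the reference estimates bias (accuracy), and the standard deviation of the readings estimates variability (precision).

```python
from statistics import mean, stdev


def bias_and_spread(readings, reference_value):
    """Return (bias, spread) for repeated readings of a known reference.

    bias   = mean reading minus the reference value (accuracy)
    spread = sample standard deviation of the readings (precision)
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    bias = mean(readings) - reference_value
    spread = stdev(readings)
    return bias, spread


# Hypothetical readings of a 100.0 g reference mass on one scale.
readings = [100.4, 100.6, 100.5, 100.3, 100.7]
bias, spread = bias_and_spread(readings, reference_value=100.0)
print(f"bias = {bias:+.3f} g, spread = {spread:.3f} g")
```

A scale can be precise but biased (tight readings, all too high) or unbiased but imprecise (scattered readings that average out). The two numbers are worth looking at separately.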

Consider the examples of data types shown in the following table. For any particular data type, all three of these elements change over time. Benchmarks change when new measurement technologies are developed or existing meters, gauges and other devices become more accurate and precise. Standardized tests, like the SAT, change to safeguard the secrecy of questions. Likewise, processes change over time to improve consistency and to accommodate new benchmarks. Judgments improve when data collectors are trained and gain work experience. Such changes can create problems when historical and current data are combined because variance differences attributable to evolving measurement systems can produce misleading statistics.
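Before combining historical and current data, a rough screen is to compare their variances. The sketch below uses made-up readings and a hypothetical helper name. It is only a quick ratio, not a formal test, but a large ratio hints that the measurement system may have changed between the two periods.

```python
from statistics import variance


def variance_ratio(older, newer):
    """Ratio of the larger sample variance to the smaller one (always >= 1)."""
    v_old = variance(older)
    v_new = variance(newer)
    if v_old == 0 or v_new == 0:
        raise ValueError("ratio is undefined when a sample has zero variance")
    return max(v_old, v_new) / min(v_old, v_new)


# Hypothetical readings of the same quantity from two eras of equipment.
historical = [10.0, 14.0, 8.0, 12.0, 6.0]   # older, noisier instrument
current = [10.0, 11.0, 9.0, 10.5, 9.5]      # newer, tighter instrument

ratio = variance_ratio(historical, current)
print(f"variance ratio (larger / smaller) = {ratio:.1f}")
```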

Understanding these three facets of measurements is important because it will help you select good measures and measurement scales for a phenomenon, as well as decide how to control extraneous variability in data collection. For example:

  • Qualities are usually more difficult to measure accurately and consistently than quantities because there are more complex judgments involved.
  • Counts are straightforward when they involve simple judgments as to what to count. Some judgments, such as species counts, though, can be relatively complex. Counts have no decimals and no negative numbers.
  • Amounts are usually more difficult to measure than counts because the judgment process is more complex. Amounts have decimals but no negative numbers unless losses are admissible.
  • Ratio measures, such as concentrations, rates, and percentages, are usually more difficult to measure than amounts because they involve two or more amounts. Ratio measures have both decimals and negative numbers.
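The rules of thumb about decimals and negative numbers can double as a simple data-screening check. The sketch below is one possible way to encode them. The function name, the type labels, and the losses_allowed flag are all assumptions made for illustration.

```python
def check_value(kind, value, losses_allowed=False):
    """Return True if a value fits the rule of thumb for its measure type."""
    if kind == "count":
        # Counts: whole numbers, never negative.
        return float(value).is_integer() and value >= 0
    if kind == "amount":
        # Amounts: decimals are fine; negatives only if losses are admissible.
        return losses_allowed or value >= 0
    if kind == "ratio":
        # Ratio measures: decimals and negative values can both occur.
        return True
    raise ValueError(f"unknown measure type: {kind}")


# Hypothetical values to screen.
print(check_value("count", 3))                           # whole, non-negative
print(check_value("count", 2.5))                         # a count with a decimal
print(check_value("amount", -4.2))                       # negative amount
print(check_value("amount", -4.2, losses_allowed=True))  # negative, losses allowed
```

A value that fails the check isn’t necessarily wrong, but it is worth a second look before it goes into an analysis.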

There’s a special type of analysis aimed at evaluating measurement variance called Gage R&R. The R&R part refers to:

  • Repeatability — the ability of the measurement system to produce consistent results. The focus of repeatability is on the benchmark and process portions of the measurement system. Testing for repeatability involves using the same subject or sample, the same characteristic or other variable, the same measurement device or instrument, the same environmental setting or conditions, and the same researcher to make the measurements.
  • Reproducibility — the ability of the measurement system and the people making the measurements to produce consistent results. The focus of reproducibility is on the entire measurement system. By comparing reproducibility to repeatability, the effects of the judgments made by the people making the measurements can be assessed. Testing for reproducibility involves using the same sample, characteristic, measurement instrument, and environmental conditions, but using different researchers to make the measurements.
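To make the comparison of repeatability and reproducibility concrete, here is a deliberately simplified sketch. It assumes a single sample measured the same number of times by each of several people, and it splits the variance with a basic method-of-moments calculation. A formal Gage R&R study is usually more elaborate than this, and the operator names and readings below are invented.

```python
from statistics import mean, variance


def gage_rr_single_sample(trials_by_operator):
    """Estimate (repeatability, reproducibility) variances for one sample.

    trials_by_operator maps an operator name to that operator's repeated
    readings of the same sample. Every operator must have the same number
    of readings (a balanced design).
    """
    groups = list(trials_by_operator.values())
    if len(groups) < 2:
        raise ValueError("need at least two operators")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("each operator needs the same number (>= 2) of readings")

    # Repeatability: average variance within each operator's own readings.
    repeatability = mean([variance(g) for g in groups])

    # Reproducibility: variance between operator averages, less the part
    # that repeatability alone would produce; floored at zero.
    between = variance([mean(g) for g in groups])
    reproducibility = max(0.0, between - repeatability / n)
    return repeatability, reproducibility


# Hypothetical readings of one sample by three people, three trials each.
trials = {
    "operator_a": [10.1, 10.3, 10.2],
    "operator_b": [10.6, 10.4, 10.5],
    "operator_c": [10.2, 10.4, 10.3],
}
repeatability, reproducibility = gage_rr_single_sample(trials)
print(f"repeatability variance   = {repeatability:.4f}")
print(f"reproducibility variance = {reproducibility:.4f}")
```

If the reproducibility variance is large relative to the repeatability variance, the people making the measurements are contributing more variation than the instrument and procedure, which points toward training or clearer judgment criteria.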

Gage R&R is a fundamental type of analysis in industrial statistics, where meeting product specifications requires consistent measurements, but it can be used for any measurement system from medical testing to opinion surveys.

Consider building some redundancy into your variables if there is more than one way to measure a concept. Sometimes one variable will display a higher correlation with your model’s dependent variable or help explain analogous measurements in a related measure. For example, redundant measures are often included in opinion surveys by using differently worded questions to solicit the same information. One question might ask “Did you like [something]?” and a later question might ask “Would you recommend [something] to your friends?” or “Would you use [something] again in the future?” to assess consistency in a respondent’s opinion about a product.
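A quick way to use redundant survey questions is to see how often the paired answers agree. The sketch below assumes yes/no answers coded as True/False. The function name and the responses are hypothetical.

```python
def agreement_rate(answers_a, answers_b):
    """Fraction of respondents who gave the same answer to both questions."""
    if len(answers_a) != len(answers_b) or not answers_a:
        raise ValueError("need two non-empty answer lists of equal length")
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)


# Hypothetical responses from five people, in the same respondent order.
liked_it = [True, True, False, True, False]
would_recommend = [True, False, False, True, False]

rate = agreement_rate(liked_it, would_recommend)
print(f"agreement between the two questions: {rate:.0%}")
```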

Finally, take into account your objective and the ultimate use of your statistical models. For example, if you want to predict some dependent variable, quantitative independent variables would usually be preferable to qualitative variables because they would provide more scale resolution. Furthermore, you could dumb down a quantitative variable to a less finely divided scale or even a qualitative scale but you usually can’t go in the other direction. If you want your prediction model to be simple and inexpensive to use, don’t select predictors that are expensive and time-consuming to measure.
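The one-way street from a quantitative scale to a coarser one is easy to see in code. The sketch below bins a numeric value into ordered categories. The cut points and labels are arbitrary choices for illustration, not recommended thresholds.

```python
from bisect import bisect_right


def to_category(value, cut_points, labels):
    """Map a numeric value onto an ordered category using sorted cut points.

    A value equal to a cut point falls into the higher category.
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    if list(cut_points) != sorted(cut_points):
        raise ValueError("cut points must be in ascending order")
    return labels[bisect_right(cut_points, value)]


# Hypothetical cut points for some quantitative predictor.
cut_points = [300, 600]
labels = ["low", "medium", "high"]

for reading in (120, 250, 450, 700):
    print(reading, "->", to_category(reading, cut_points, labels))
```

Here 120 and 250 both land in the same category. Once only the label is recorded, there is no way to recover which number it came from, which is why it usually pays to collect the finer scale and coarsen later if needed.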

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at Wheatmark, amazon.com, barnesandnoble.com, or other online booksellers.

About statswithcats

Charlie Kufs has been crunching numbers for over thirty years. He retired in 2019 and is currently working on Stats with Kittens, the prequel to Stats with Cats.