Looking for Insight through a Window

At a press briefing on February 12, 2002, then-Secretary of Defense Donald Rumsfeld addressed the absence of evidence linking the government of Iraq with weapons of mass destruction:

There are known knowns. There are things we know that we know. There are known unknowns. That is to say, there are things that we now know we don’t know. But there are also unknown unknowns. There are things we do not know we don’t know.

Now, whether the statement was a transparently irresponsible attempt to cover up a monumental failure in the collection and analysis of information or just a REALLY BIG LIE, it actually makes some sense. Similar words have been attributed to Confucius and others. But whether he realized it or not, Mr. Rumsfeld was describing a type of data analysis window.

Analytical windows are a type of matrix plot. Matrix plots are just grids for organizing information. The cells of a matrix plot can contain data, tables, graphs, or text. Windows consist of two criteria, or dimensions, defined by rows and columns. Each dimension usually has two categories, or levels, resulting in four cells, or panes. Rumsfeld’s window would look like this:

	Things that We
	Know	Don’t Know
We Know	Things that we know we know.	Things that we don’t know we know.
We Don’t Know	Things that we know we don’t know.	Things that we don’t know we don’t know.

So, for example, a Rumsfeld Window could be used for planning a statistical study.

Things that we know we know would be things like background information on the study environment, the underlying theory on the phenomenon being explored, and the statistical characteristics of the population.
Things that we don’t know we know would be things like the statistical assumptions we make to perform the analysis — independence of observations, normality and homoscedasticity of errors.
Things that we know we don’t know would be things like the results of the research questions and test hypotheses we plan to focus on.
Things that we don’t know we don’t know would be things like the causes of outliers and other data and analysis anomalies.
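As a rough sketch of the idea, a window can be held in code as a small lookup table with one pane for each row and column pair. The function name `make_window` and the shortened pane entries are illustrative assumptions, not part of any standard tool.

```python
from itertools import product

def make_window(row_levels, col_levels):
    """Return an empty window: one list (pane) for each row/column pair."""
    return {(row, col): [] for row, col in product(row_levels, col_levels)}

# Rows and columns follow the Rumsfeld Window above.
rumsfeld = make_window(["We Know", "We Don't Know"], ["Know", "Don't Know"])

# Pane contents are shortened versions of the study-planning examples in the text.
rumsfeld[("We Know", "Know")].append("background information on the study environment")
rumsfeld[("We Know", "Don't Know")].append("statistical assumptions behind the analysis")
rumsfeld[("We Don't Know", "Know")].append("answers to the research questions")
rumsfeld[("We Don't Know", "Don't Know")].append("causes of outliers and anomalies")

for (row, col), items in rumsfeld.items():
    print(f"{row} / {col}: {items}")
```

Because the panes are just lists, the same structure works for any two-by-two window, and for larger grids if a dimension has more than two levels.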

The beauty of a window is the way it can organize sometimes complex information into simple binary categories. As a consequence, windows are used in many ways to analyze data.


Johari Windows

A Johari Window is a tool used by psychologists to help individuals and groups evaluate interpersonal communications. Its name comes from the first names of Joseph Luft and Harry Ingham, who created it in 1955. To use the window, subjects are told to pick five or six adjectives they feel describe their own personality from a standard list of 56 adjectives. Peers of the subject are then given the same standard list of 56 adjectives, and each picks five or six adjectives that describe the subject. These adjectives are then placed in the appropriate pane of the Johari Window.

	Known to Self	Not Known to Self
Known to Others	Open	Blind
Not Known to Others	Hidden	Unknown
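Sorting the adjectives into panes comes down to a few set operations. Here is a minimal sketch, assuming an adjective counts as "known to others" if at least one peer picks it. The function name `johari_window` and the six-word list are made up for illustration; the real exercise uses the standard list of 56.

```python
def johari_window(self_picks, peer_picks, adjective_list):
    """Sort adjectives into the four Johari panes.

    self_picks: adjectives the subject chose for themselves
    peer_picks: one collection of adjectives per peer
    adjective_list: the full list everyone chose from
    """
    known_to_self = set(self_picks)
    # Assumption: one peer picking an adjective is enough to count it.
    known_to_others = set().union(*peer_picks)
    return {
        "Open": known_to_self & known_to_others,
        "Blind": known_to_others - known_to_self,
        "Hidden": known_to_self - known_to_others,
        "Unknown": set(adjective_list) - known_to_self - known_to_others,
    }

# Hypothetical, shortened adjective list and picks.
adjectives = ["bold", "calm", "clever", "kind", "quiet", "witty"]
subject = ["bold", "calm", "kind"]
peers = [["calm", "clever"], ["kind", "clever"]]

window = johari_window(subject, peers, adjectives)
for pane, words in window.items():
    print(pane, sorted(words))
```

A stricter rule, such as requiring a majority of peers to agree, would only change how `known_to_others` is built.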

Johari windows were featured on a 2010 episode of the television series Fringe, which was seen by six million viewers, most of whom probably had no idea what Johari windows are.

Variance Windows

Windows can also be applied to planning how to control extraneous variance in the process of collecting data. If you plan to conduct a statistical analysis, you’ll need to understand the three fundamental Rs of variance control — Reference, Replication, and Randomization. Every measurement of a phenomenon includes characteristics of the population and natural variability as well as unwanted sampling variability, measurement variability, and environmental variability. You can’t understand your data unless you control extraneous variance attributable to the way you select samples, the way you measure variable values, and any influences of the environment in which you are working. Using the concepts of reference, replication and randomization, you can control, minimize, or at least be able to assess the effects of extraneous variability using: procedural controls; quality samples and measurements; sampling controls; experimental controls; and statistical controls.

	Sources of Variance that we
	Understand	Don’t Understand
Control	Sampling and measurement variance	Sampling and measurement variance, environmental variance
Don’t Control	Natural variance	Sampling and measurement variance, environmental variance

To use a window to plan a variance control program, fill the panes of the window with all the sources of variability you can think of, categorized by how well you understand the source and think you can control it. Then identify a control measure for each source of variation.

Pick Charts

A Pick Chart is a Lean Six Sigma tool for comparing difficulty of implementation (in terms of costs, effort, complexity, or time) to possible results (paybacks, returns, impacts, or improvements) for actions being considered. These two concepts serve as the axes of a data analysis window having four quadrants:

Possible – “ideas that are considered ‘low hanging fruit’. The effort to implement is low, but the impact is also low. These should only be implemented after everything in the ‘Implement’ quadrant.”
Implement – “ideas that should be implemented as they will have a high impact and require low effort.”
Challenge – “ideas that should be considered for implementation after everything in the ‘Implement’ column. The impact is high, but the effort is also high.”
Kill – “ideas that should be ‘killed’ or not implemented. The effort to do so is high and the impact is low.”
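The four quadrants amount to two yes-or-no questions: is the effort high, and is the impact high? A small sketch of that logic follows. The function name `pick_quadrant`, the numeric scales, and the cutoff values are assumptions for illustration; a real Pick Chart is often filled in by judgment rather than by numbers.

```python
def pick_quadrant(effort, impact, effort_cutoff, impact_cutoff):
    """Return the Pick Chart quadrant for one idea.

    Values above a cutoff count as "high"; values at or below it count as "low".
    """
    high_effort = effort > effort_cutoff
    high_impact = impact > impact_cutoff
    if high_impact:
        return "Challenge" if high_effort else "Implement"
    return "Kill" if high_effort else "Possible"

# Hypothetical ideas scored on 1-10 scales, with cutoffs at the midpoint.
ideas = {
    "idea 1": (2, 3),   # (effort, impact)
    "idea 2": (2, 8),
    "idea 3": (9, 8),
    "idea 4": (9, 3),
}
for name, (effort, impact) in ideas.items():
    print(name, pick_quadrant(effort, impact, effort_cutoff=5, impact_cutoff=5))
```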

Here’s an example involving the federal Employee Viewpoint Survey (EVS). In this pick chart, eighteen EVS question areas are compared according to:

Payoff from the actions being considered to improve EVS scores.
Difficulty anticipated in successfully undertaking the actions.

Payoff was calculated (after scale adjustments) as the product of the score for a question and the decline in the scores from 2012 to 2014. Difficulty was based on: (1) who would have to be involved in implementing the change (i.e., many or few staff; in the main office or satellite offices; at staff, supervisor, or senior leader levels); (2) if existing programs or policies would be used or if they would have to be created; and (3) the funding required to implement the change. Payoff is based on actual EVS data so there is not much uncertainty. Difficulty is based on judgments concerning what generic actions might be taken to improve job satisfaction, so there is considerable uncertainty. Thus, the positions of the icons representing the EVS question areas are likely to shift horizontally, depending on the nature of specific projects being considered, but not vertically.

Performance Windows

A performance window is a way to convey the results of a statistical test or classification. It is a table with two rows and two columns that summarizes the number of correct classifications (true positives and true negatives) and the number of misclassifications (false positives and false negatives). This type of window is also called a confusion matrix, an error matrix, or a matching matrix.

Here are performance windows for classifications and statistical tests.

		Predicted Classification
		A	B
Actual Classification	A	Correct Classification	Misclassification
Actual Classification	B	Misclassification	Correct Classification

		Statistical Test
		Null hypothesis is not rejected	Null hypothesis is rejected
Actual Condition	Null hypothesis is true	Correct Inference	False Positive – Type I Error
Actual Condition	Null hypothesis is false	False Negative – Type II Error	Correct Inference
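Filling in a performance window for a classification is just a matter of counting pairs of actual and predicted labels. Here is a minimal sketch using only the standard library. The function name `performance_window` and the six sample labels are hypothetical.

```python
from collections import Counter

def performance_window(actual, predicted, labels=("A", "B")):
    """Return a table of counts: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical labels for six cases.
actual = ["A", "A", "B", "B", "A", "B"]
predicted = ["A", "B", "B", "A", "A", "B"]

table = performance_window(actual, predicted)
for label, row in zip(("A", "B"), table):
    print(f"Actual {label}: {row}")

# Correct classifications sit on the diagonal of the window.
correct = table[0][0] + table[1][1]
print("Correct:", correct, "of", len(actual))
```

Passing a longer `labels` tuple gives a larger table with the same layout.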

A contingency table is a type of matrix plot that summarizes how often each combination of categories occurs in the data, frequently with more than two levels on the dimensions or even more than two dimensions. Contingency tables are also called cross-tabulation tables.

Windows on Scatter Plots

The concept of dividing areas of information into more understandable parts can be extended to scatter plots. Plots can be divided into quadrants, for example, using the means (or medians) of the data points for each axis. In essence, the window is overlain on the scatter plot. The window can be subdivided further by standard deviations (or quartiles).

[Figure: scatter plot of English grades and math grades]

The performance window for this scatter plot would be:

		Math Grade
		Below Average	Above Average	Total
English Grade	Above Average	9	18	27
English Grade	Below Average	16	8	24
	Total	25	26	51
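Counting the points in each quadrant can be sketched in a few lines. This version splits each axis at its mean and treats a point exactly at the mean as "below", which is an arbitrary choice. The function name `quadrant_counts` and the five pairs of grades are made up for illustration; they are not the data behind the table above.

```python
from statistics import mean

def quadrant_counts(english, math):
    """Count points in each quadrant, keyed by (English level, math level)."""
    english_cut, math_cut = mean(english), mean(math)
    counts = {(e, m): 0 for e in ("above", "below") for m in ("below", "above")}
    for e_grade, m_grade in zip(english, math):
        # Points exactly at the mean are counted as "below" (an assumption).
        e_level = "above" if e_grade > english_cut else "below"
        m_level = "above" if m_grade > math_cut else "below"
        counts[(e_level, m_level)] += 1
    return counts

# Hypothetical grades for five students.
math_grades = [55, 60, 70, 85, 90]
english_grades = [80, 65, 60, 90, 75]

counts = quadrant_counts(english_grades, math_grades)
for (e_level, m_level), n in counts.items():
    print(f"English {e_level} average, math {m_level} average: {n}")
```

Swapping `mean` for `median` from the same module gives the median-based split mentioned earlier.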

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data Analysis at amazon.com, barnesandnoble.com, or other online booksellers.