DARE TO COMPARE – PART 4

Posted on January 13, 2019 by statswithcats

Part 3 of Dare to Compare shows how one-population statistical tests are conducted. Part 4 extends these concepts to two-population tests.

To review, this flowchart summarizes the process of statistical testing.

First, you PLAN the comparison by understanding the populations, taking a representative sample of individuals from each, and measuring the phenomenon on those individuals. Then you assess the frequency distributions of the measurements to see if they approximate a Normal distribution.

Second, you TEST the measurements by considering the test parameters, the type of test, the hypotheses, the test dimensionality, degrees of freedom, and violations of assumptions.

Third, you review the RESULTS by setting the confidence, determining the effect size and power of the test, and assessing the significance and meaningfulness of the test.

Now imagine this.

You’re a sophomore statistics major at Faber College and you need to sign up for the dreaded STATS 102 class. The class is taught in the Fall and the Spring by two different instructors (Dr. Statisticus and Prof. Modearity) as either three one-hour sessions on Mondays, Wednesdays, and Fridays, or two hour-and-a-half sessions on Tuesdays and Thursdays. You wonder if it makes a difference which class you take. Having completed STATS 101, you know everything there is to know about statistics, so you get the grades from the classes that were taught last year. Here are the data.

[Table 4-1: grades from last year’s STATS 102 classes]

What class should you take to get the highest grade? Dr. Statisticus gave out the highest grades in the Fall; Prof. Modearity gave out a higher grade in the Spring. On the other side of the coin, only one person flunked (grade below 75) Dr. Statisticus’ classes but six people flunked Prof. Modearity’s classes. Three students flunked in the Fall while four students flunked in the Spring. Two people flunked TuTh classes and five people flunked MWF classes. This is complicated.

Looking at the averages, you think that taking Dr. Statisticus’ Tuesday-Thursday class in the fall would be your best bet. However, is a two or three point difference worth the class conflicts and scheduling hassles you might have? Does it really matter?

[Table 4-2: average grades by semester, instructor, and class days]

Maybe it’s time for some statistical testing? But these would be two-population tests because you have to compare two semesters, two instructors, and two class lengths.

Two Population t-Tests

In a two-population test, you compare the average of the measurements in the first population to the average of the second population, using the formula:

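t = (X_1 – X_2) / √( [((N_1 – 1) S²_1 + (N_2 – 1) S²_2) / (N_1 + N_2 – 2)] × (1/N_1 + 1/N_2) )

where X is the average, S² is the variance, and N is the number of measurements in populations 1 and 2.

This is a bit more complicated than the formula for a one-population test because you can have different standard deviations and different numbers of measurements in the two populations.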

Here’s what’s happening. The numerator (top part of the formula) is the same in both t-test formulas. The leftmost term in the denominator calculates a weighted average of the variances, called a pooled variance.

S²_pooled = ((N_1 – 1) S²_1 + (N_2 – 1) S²_2) / (N_1 + N_2 – 2)

If the number of measurements taken of the two populations is the same, the test design is said to be balanced. If the variances of the measurements in the two populations are the same, the leftmost term in the denominator reduces to S². So, the formula for a balanced two-population t-test with equal variances is:

t = (X_1 – X_2) / (S × √(2/N))

Much simpler but not as useful as the more complicated formula. You might be able to control the number of samples from the populations but you can’t control the variances.
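The pooled-variance calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the summary statistics in the demonstration are made up, not taken from the class data. It shows that when the design is balanced and the variances are equal, the general formula and the simplified formula give the same t value.

```python
from math import sqrt


def pooled_variance(var1, n1, var2, n2):
    """Weighted average of two sample variances (weights are n - 1)."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def two_population_t(mean1, var1, n1, mean2, var2, n2):
    """t statistic for two populations using the pooled variance."""
    sp2 = pooled_variance(var1, n1, var2, n2)
    return (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))


def balanced_equal_variance_t(mean1, mean2, var, n):
    """Simplified formula: same n and same variance in both populations."""
    return (mean1 - mean2) / sqrt(var * 2 / n)


# Hypothetical summary statistics, made up for illustration only.
t_general = two_population_t(80.0, 36.0, 25, 77.0, 36.0, 25)
t_simple = balanced_equal_variance_t(80.0, 77.0, 36.0, 25)
print(f"general formula: {t_general:.4f}, simplified formula: {t_simple:.4f}")
```

With unequal sample sizes or unequal variances, only the general formula applies.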

Once you calculate a t value, the rest of the test is similar to a one-population test. You compare the calculated t to a t-value from a table or other reference for the appropriate number of tails, the confidence (1 – α), and the degrees of freedom (the total number of measurements in the two samples minus 2 when the variances are pooled, and somewhat fewer when the variances are treated as unequal).

If the calculated t value is larger than the table t value, the test is SIGNIFICANT, meaning that the means are statistically different. If the table t value is larger than the calculated t value, the test is NOT SIGNIFICANT, meaning that the means are statistically the same.

[Figure: two-population nondirectional test, not significant]

[Figure: two-population nondirectional test, significant]

Example

Back to the example. You want to compare the differences between semesters, instructors, and class days. You have no expectations for what the best semester, instructor, or class day would be. To be conservative, you’ll accept a false positive rate (i.e., 1-confidence, α) of 0.05. Your null hypotheses are:

μ_{Fall Semester} = μ_{Spring Semester}
μ_{Dr. Statisticus} = μ_{Prof. Modearity}
μ_MWF = μ_TuTh

Now for some calculations, first the semesters.

X_{Fall Semester}     = 84.0
X_{Spring Semester}  = 83.5
N_{Fall Semester}     = 33
N_{Spring Semester}  = 35
S²_{Fall Semester}   = 49.7 (S = 7.05)
S²_{Spring Semester} = 41.7 (S = 6.46)

[Equation 4-5: calculated t value for the semesters]

And the tabled value is:

t_{(2-tailed, α = 0.05, 65 degrees of freedom)} = 1.997

You can do these calculations in Excel with the formula:

=T.TEST(array1,array2,tails,type)

Where type=3 is a t-test for two samples with unequal variances. There are also a few online sites for the calculations, such as https://www.evanmiller.org/ab-testing/t-test.html, from which this graphic was produced.

[Figure: evanmiller.org t-test output for the semesters]

So there is no statistically significant difference between the Fall semester classes and the Spring semester classes.
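For readers without Excel, here is a minimal Python sketch of the unequal-variances (Welch) version of the test, which is the kind of test type=3 selects. Note that Excel's T.TEST returns a probability rather than a t value, so this sketch is not a drop-in replacement; the function name is made up, and the inputs are the rounded semester summary statistics listed above, so the result may differ slightly from a calculation on the raw grades.

```python
from math import sqrt


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df) for a two-sample t-test with unequal variances."""
    a = var1 / n1  # squared standard error contributed by sample 1
    b = var2 / n2  # squared standard error contributed by sample 2
    t = (mean1 - mean2) / sqrt(a + b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Fall vs. Spring summary statistics quoted in the text (rounded values).
t_sem, df_sem = welch_t(84.0, 49.7, 33, 83.5, 41.7, 35)

TABLE_T = 1.997  # tabled value quoted in the text
significant = abs(t_sem) > TABLE_T
print(f"t = {t_sem:.2f}, df = {df_sem:.1f}, significant: {significant}")
```

The approximate degrees of freedom come out close to the 65 used for the tabled value.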

Now for the instructors:

X_{Dr. Statisticus}    = 85.4
X_{Prof. Modearity}   = 82.0
N_{Dr. Statisticus}    = 35
N_{Prof. Modearity}   = 33
S²_{Dr. Statisticus}  = 37.5 (S = 6.12)
S²_{Prof. Modearity} = 48.5 (S = 6.96)

[Equation 4-6: calculated t value for the instructors]

And the tabled value is:

t_{(2-tailed, α = 0.05, 66 degrees of freedom)} = 1.996

So there is a statistically significant difference between instructors. Dr. Statisticus gives higher grades than Prof. Modearity.

From www.evanmiller.org/:

[Figure: evanmiller.org t-test output for the instructors]

Now for the days of the week:

X_MWF            = 82.4
X_TuTh                = 85.2
N_MWF            = 36
N_TuTh            = 32
S²_MWF               = 47.8 (S = 6.91)
S²_TuTh            = 39.4 (S = 6.28)

[Equation 4-7: calculated t value for the class days]

So there is no statistically significant difference between the one-hour classes on Mondays, Wednesdays, and Fridays and the hour-and-a-half classes on Tuesdays and Thursdays.

From www.evanmiller.org/:

[Figure: evanmiller.org t-test output for the class days]

Here is a summary of the three tests.

[Table 4-5: summary of the three t-tests]

So take Dr. Statisticus’ class whenever it fits in your schedule.
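The three comparisons can be rechecked from the summary statistics alone. The sketch below uses the pooled-variance formula and the tabled value for 66 degrees of freedom quoted earlier. The dictionary layout and names are made up for illustration, and because the inputs are rounded, the t values may differ a little from those calculated on the raw grades.

```python
from math import sqrt


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-population t statistic using the pooled variance."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(sp2 * (1 / n1 + 1 / n2))


# (mean, variance, n) for each pair of populations, as quoted in the text.
comparisons = {
    "Fall vs. Spring": (84.0, 49.7, 33, 83.5, 41.7, 35),
    "Statisticus vs. Modearity": (85.4, 37.5, 35, 82.0, 48.5, 33),
    "MWF vs. TuTh": (82.4, 47.8, 36, 85.2, 39.4, 32),
}

TABLE_T = 1.996  # two-tailed, alpha = 0.05, 66 degrees of freedom (from the text)

results = {}
for name, stats in comparisons.items():
    t = pooled_t(*stats)
    results[name] = (t, abs(t) > TABLE_T)
    verdict = "SIGNIFICANT" if results[name][1] else "not significant"
    print(f"{name}: t = {t:.2f} ({verdict})")
```

Only the instructor comparison exceeds the tabled value, which agrees with the conclusions above.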

ANOVAs

So what do you do if you have more than two populations or more than one phenomenon or some other weird combinations of data? You use an Analysis of Variance (ANOVA).

ANOVA includes a variety of statistical designs used to analyze differences in group means. It is a generalization of the t-test of a factor (called main effects or treatments in ANOVA) to more than two groups (called levels in ANOVA). In an ANOVA, the variances in the levels of factors being compared are partitioned between variation associated with the factors in the design (called model variation) and random variation (called error variation). ANOVA is conceptually similar to multiple two-population t-tests, but produces fewer type I (false positive) errors. While t-tests use t-values from the t-distribution, ANOVAs use F-tests from the F-distribution. An F-test is the ratio of the model variation to the error variation. When there are only two means to compare, the t-test and the ANOVA F-test are equivalent according to the relationship F = t².
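The F = t² relationship is easy to check numerically. The sketch below computes a one-way ANOVA F value (model mean square divided by error mean square) and a pooled-variance t value for the same two groups. The grades are hypothetical, made up only for this check, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean


def one_way_f(groups):
    """F = model (between-group) mean square / error (within-group) mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_model / (k - 1)) / (ss_error / (n - k))


def pooled_t(a, b):
    """Two-population t statistic with a pooled variance, from raw data."""
    ss = sum((x - mean(a)) ** 2 for x in a) + sum((x - mean(b)) ** 2 for x in b)
    sp2 = ss / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / len(a) + 1 / len(b)))


# Hypothetical grades for two groups, made up for illustration.
group_a = [88, 92, 79, 85, 90]
group_b = [81, 77, 84, 80, 75, 83]

f_value = one_way_f([group_a, group_b])
t_value = pooled_t(group_a, group_b)
print(f"F = {f_value:.4f}, t squared = {t_value ** 2:.4f}")
```

The same one_way_f function accepts three or more groups, which is where ANOVA goes beyond the t-test.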

Types of ANOVA

There are many types of ANOVA designs. One-way and multi-way ANOVAs are the most common.

One-Way ANOVAs

One-way ANOVA is used to test for differences among three or more independent levels of one effect. In the example t-test, a one-way ANOVA might involve more than two levels of one of the three factors. For example, a one-way ANOVA would allow testing more than two instructors or more than two semesters.

Multi-Way ANOVAs

Multi-way ANOVAs (sometimes called factorial ANOVAs) are used to test for differences between two or more effects. A two-way ANOVA tests two effects, a three-way ANOVA tests three effects, and so on. Multi-way ANOVAs have the advantage of being able to test the significance of interaction effects. Interaction effects occur when two or more effects combine to affect measurements of the phenomenon. In the example t-test, a three-way ANOVA would allow simultaneous analysis of the semesters, instructors, and days, as well as interactions between them.

Other Types of ANOVA

There are numerous other types of ANOVA designs, some of which are too complex to explain in a sentence or two. Here are a few of the more commonly used designs.

Repeated Measures ANOVAs (also called within-subjects ANOVAs) are used when the same subjects are used for each treatment effect, as in a longitudinal study. In the example, if the scores for the students were recorded every month of the semester, they could be analyzed with a Repeated Measures ANOVA.

Some ANOVAs use design elements to control extraneous variance. The significance of the design elements is not important to the dependent variable so long as it controls variability in the main effects. If the design element is a nominal-scale variable, it is called a blocking effect. If the design element is a continuous-scale variable, it is called a covariate and the model is called an Analysis of Covariance (ANCOVA). In the example, if students’ year in college (freshman, sophomore, junior, or senior, an ordinal scale measure) were added as an effect to control variance, it would be a blocking factor. If students’ GPA (grade point average, a continuous scale measure) were added as a covariate, it would be an ANCOVA design.

Random Effects ANOVAs assume that the levels of a main effect are sampled from a population of possible levels so that the results can be extended to other possible levels. The Instructors main effect in the example could be a random effect if other instructors were considered part of the population that included Dr. Statisticus and Prof. Modearity. If only Dr. Statisticus and Prof. Modearity were levels of the effect, it would be called a fixed effect. If a design includes both fixed and random effects, it is called a mixed effects design.

Multivariate analysis of variance (MANOVA) is used when there is more than one set of measurements (also called dependent variables or response variables) of the phenomenon.

Now What?

Dare to Compare is a fairly comprehensive summary of statistical comparisons. You may not hear about all of these concepts in Stats 101 and that’s fine. Learn what you need to pass the course. Some topics are taught differently, especially hypothesis development and the normal curve. Follow what your instructor teaches. He or she will assign your grade.

Believe it or not, there’s quite a bit more to learn about all of the topics if you go further in statistics. There are special t-tests for proportions, regression coefficients, and samples that are not independent (called paired sample t-tests). There are tests based on other distributions besides the Normal and t-distributions, such as the binomial and chi² distributions. There are also quite a few nonparametric tests, based on ranks. And, of course, there are many topics on the mathematics end and on more metaphysical concepts like meaningfulness.

Statistical testing is more complicated than portrayed by some people but it’s still not as formidable as, say, driving a car. You might learn to drive as a teenager but not discover statistics and statistical testing until college. Both statistical testing and driving are full of intricacies that you have to keep in mind. In testing you consider an issue once, while in driving you must do it continually. When you make a mistake in testing, you can go back and correct it. If you make a mistake in driving, you might get a ticket or cause an accident. After you learn to drive a car, you can go on to learn to drive motorcycles, trucks, buses, and racing vehicles. After you learn simple hypothesis testing, you can go on to learn ANOVA, regression, and many more advanced techniques. So if you think you can learn to drive a car, you can also learn to conduct a statistical test.

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data analysis at amazon.com, barnesandnoble.com, or other online booksellers.


DARE TO COMPARE – PART 3

Posted on December 1, 2018 by statswithcats

Parts 1 and 2 of Dare to Compare summarized fundamental topics about simple statistical comparisons. Part 3 shows how those concepts play a role in conducting statistical tests. The importance of these concepts is highlighted in the following table.

Test Specification	Why it is Important
*Population*	Groups of individuals or items having some fundamental commonalities relative to the phenomenon being tested. Populations must be definable and readily reproducible so that results can be applied to other situations.
*Number of populations being compared*	The number of populations determines whether a comparison can be a relatively simple 1- or 2-population test or a complex ANOVA test.
*Phenomena*	The characteristic of the population being tested. It is usually measured as a continuous-scale attribute of a representative sample of the population.
*Number of phenomena*	The number of phenomena determines whether a comparison will be a relatively simple univariate test or a complex multivariate test.
*Representative sample*	A relatively small portion of all the possible measurements of the phenomenon on the population selected in such a way as to be a true depiction of the phenomenon.
*Sample size*	The number of observations of the phenomenon used to characterize the population. The sample size contributes to the determinations of the type of test to be used, the size of the difference that can be detected, the power of the test, and the meaningfulness of the results.
*Hypotheses*	You start statistical comparisons with a research hypothesis of what you expect to find about the phenomenon in the population. The research hypothesis is about the differences between the categories of the variable representing the population. You then create a null hypothesis that translates the research hypothesis into a mathematical statement that is the opposite of the research hypothesis, usually written in terms of no change or no difference. This is the subject of the test. If you reject the null hypothesis, you adopt the alternative hypothesis.
*Distribution*	Statistical tests examine chance occurrences of measurements on a phenomenon. These extreme measurements occur in the tails of the frequency distribution. Parametric statistical tests assume that the measurements are Normally distributed. If the tails of the distribution differ from the tails of a Normal distribution, the results of the test may be in error.
*Directionality*	Null hypotheses can be nondirectional or two-sided (i.e., μ=0), in which both tails of the distribution are assessed. They can also be directional or one-sided (i.e., μ<0 or μ>0), in which only one tail of the distribution is assessed.
*Assumptions*	Statistical tests assume that the measurements of the phenomenon are independent (not correlated) and are representative of the population. They also assume that errors are normally distributed and the variances of populations are equal.
*Type of test*	Statistical tests can be based on a theoretical frequency distribution (parametric) or based on some imposed ordering (nonparametric). Parametric tests tend to be more powerful.
*Test Parameters*	Test parameters are the statistics used in the test. For t-tests using the Normal distribution, this involves the mean and the standard deviation. For F-tests in ANOVA, this involves the variance. For nonparametric tests, this usually involves the median and range.
*Confidence*	Confidence is 1 minus the false-positive error rate. The confidence is set by the person doing the test before testing as the maximum false-positive error rate they will accept. Usually, an error rate of 0.05 (5%) is selected but sometimes 0.1 (10%) or 0.01 (1%) are used, corresponding to confidences of 95%, 90%, and 99%.
*Power*	Power is the ability of a test to avoid false-negative errors (1-β). Power is based on sample size, confidence, and population variance and is NOT set by the person doing the test, but instead, calculated after a nonsignificant test result.
*Degrees of Freedom*	The number of values in the final calculation of a statistic that are free to vary. For a one-population t-test, the degrees of freedom is equal to the number of samples minus 1.
*Effect Size*	The smallest difference the test could have detected. Effect size is influenced by the variance, the sample size, and the confidence. Effect size can be too small, leading to false negatives, or too large, leading to false positives.
*Significance*	Significance refers to the result of a statistical test in which the null hypothesis is rejected. Significance is expressed as a p-value.
*Meaningfulness*	Meaningfulness is assessed by comparing the difference detected by the test to the magnitude of difference that would be important in reality.

Normal Distributions

After defining the population, the phenomena, and the test hypotheses, you measure the phenomenon on an appropriate number of individuals in the population. These measurements need to be independent of each other and representative of the population. Then, you need to assess whether it’s safe to assume that the frequency distribution of the measurements is similar to a Normal distribution. If it is, a z-test or a t-test would be in order.

f(x) = (1 / (σ√(2π))) × e^(–(x – μ)² / (2σ²))

Yes, this is scary looking. It’s the equation for the Normal distribution, where μ is the mean and σ is the standard deviation. Relax, you will probably never have to use it.

This figure represents a Normal distribution. The area under the curve represents the total probability of measured values occurring, which is equal to 1.0. Values near the center of the distribution, near the mean, have a large probability of occurring while values near the tails (the extremes) of the distribution have a small probability of occurring.

In statistical testing, the Normal distribution is used to estimate the probability that the measurements of the phenomenon will fall within a particular range of values. To estimate the probability that a measurement will occur, you could use the values of the mean and the standard deviation in the formula for the Normal distribution. Actually though, you never have to do that because there are tables for the Normal distribution and the t-distribution. Even easier, the functions are available in many spreadsheet applications, like Microsoft Excel.
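Python's standard library offers the same convenience as a spreadsheet function. The sketch below uses statistics.NormalDist to find the probability that a measurement falls in a range, and checks that the Normal formula written out by hand agrees with the library. The mean of 69 and standard deviation of 5.3 are assumed values chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """The Normal density written out from its formula."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


# Assumed population for illustration: mean 69, standard deviation 5.3.
heights = NormalDist(mu=69, sigma=5.3)

# The hand-written formula and the library give the same curve height.
by_hand = normal_pdf(66, 69, 5.3)
from_library = heights.pdf(66)

# Probability that a measurement falls between 66 and 72.
p_range = heights.cdf(72) - heights.cdf(66)

# Area in both tails beyond 1.96 standard deviations of a standard Normal.
z = NormalDist()
two_tail_area = 2 * (1 - z.cdf(1.96))

print(f"P(66 < x < 72) = {p_range:.3f}")
print(f"two-tail area beyond 1.96 = {two_tail_area:.3f}")
```

The two-tail area comes out at about 0.05, which is why 1.96 is the familiar cutoff for 95% confidence with large samples.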

Statistical tests focus on the tails of the distribution where the probabilities are the smallest. It doesn’t matter much if the measurements of the phenomenon follow a Normal distribution near the mean so long as they do in the tails. The z-distribution can be used if the sample size is large; some say as few as 30 measurements and others recommend more, perhaps 100 measurements. The t-distribution compensates for small sample sizes by having more area in the tails. It can be used instead of the z-distribution with any number of samples.

The concept behind statistical testing is to determine how likely it is that a difference in two population parameters like the means (or a population parameter and a constant) could have occurred by chance. If the difference is large enough to fall in the tails of the distribution, there is only a small probability that the difference could have occurred by chance. Differences having a probability of occurrence less than a pre-specified value (α) are said to be significant differences. The pre-specified value, which is the acceptable false positive error rate, α, may be any small percentage but is usually taken as five-in-a-hundred (0.05), one-in-a-hundred (0.01), or ten-in-a-hundred (0.10).

Here are a few examples of what the process of statistical testing looks like for comparing a population mean to a constant.


One Population z-Test or t-Test

All z-tests and t-tests involve either one or two populations and only one phenomenon. The population is represented by the nominal-scale, independent variable. The measurement of the phenomenon is the dependent variable, which can be measured using a nominal, ordinal, interval, or ratio scale.

For a one-population test, you would be comparing the average (or other parameter) of the measurements in the population to a constant. You do this using the formula for a one-population t-test value (or a z-test value) to calculate the t value for the test.

t-value = (sample mean – constant) / (standard deviation / (√number of samples))

The Normal distribution and the t-distribution are symmetrical so it doesn’t matter if the numerator of the equation is positive or negative.

Then compare that value to a table of values for the t-distribution for the appropriate number of tails, the confidence (1 – α), and the degrees of freedom (the number of samples of the population minus 1). If the calculated t value is larger than the table t value, the test is SIGNIFICANT, meaning that the mean and the constant are statistically different. If the table t value is larger than the calculated t value, the test is NOT SIGNIFICANT, meaning that the mean and the constant are statistically the same.

Example

Imagine you are comparing the average height of male high school freshmen in Minneapolis school district #1. You want to know how their average height compares to the height of 9th to 11th century Vikings (their mascot), for the school newspaper. Turn-of-the-century Vikings were typically about 5’9” or 69 inches (175 cm) tall.

This comparison doesn’t need to be too rigorous. The only possible negative consequence to the test is it being reported by Fox News as a liberal conspiracy, and they do that to everything anyway. You’ll accept a false positive rate (i.e., 1-confidence, α) of 0.10.

Nondirectional Tests

Say you don’t know many freshmen boys but you don’t think they are as tall as Vikings. You certainly don’t think of them as rampaging Vikings. They’re younger so maybe they’re shorter. Then again, they’ve grown up having better diets and medical care so maybe they’re taller. Therefore, your research hypothesis is that Freshmen are not likely to be the same height as Vikings. The null hypothesis you want to test is:

Height of Freshmen = Height of Vikings

which is a nondirectional test. If you reject the null hypothesis, the alternative hypothesis:

Height of Freshmen ≠ Height of Vikings

is probably true of the Freshmen. Say you then measure the heights of 10 freshmen and you get:

63.2, 63.8, 72.8, 56.9, 75.2, 70.8, 68.0, 64.0, 61.4, 65.2

The measurements average 66 inches with a standard deviation of 5.3 inches. The t-value would be equal to:

(Freshmen height – Viking height) / ((standard deviation / (√number of samples)))

t-value = (66 inches – 69 inches) / (5.3 inches / (√10 samples))

t-value = -1.790

Ignore the negative sign; it won’t matter.

In this comparison, the calculated t-value (1.79) is less than the table t-value (t_{(2-tailed, 90% confidence, 9 degrees of freedom)} = 1.833) so the comparison is not significant. The comparison might look something like this:

[Figure: one-population nondirectional test, not significant]

There is no statistical difference in the average heights of Freshmen and Vikings. Both are around 5’6” to 5’9” tall. That isn’t to say that there weren’t 6’0” Vikings, or Freshmen, but as a group, the Freshmen are about the same height as a band of berserkers. I’m sure that there are high school principals who will agree with this.
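The calculation above can be reproduced in Python. This sketch computes the t-value twice: once from the rounded summary values quoted in the text (66 inches and 5.3 inches), and once from the ten raw measurements using the sample (n – 1) standard deviation. The second approach is an assumption about how one might do it in practice; it gives a somewhat smaller t-value because the text rounds the mean and standard deviation, but the conclusion is the same.

```python
from math import sqrt
from statistics import mean, stdev


def one_population_t(sample_mean, constant, sd, n):
    """t = (sample mean - constant) / (standard deviation / sqrt(n))."""
    return (sample_mean - constant) / (sd / sqrt(n))


VIKING_HEIGHT = 69.0  # inches, the constant being compared against
TABLE_T = 1.833       # two-tailed, 90% confidence, 9 degrees of freedom (from the text)

# Using the rounded summary values quoted in the text.
t_rounded = one_population_t(66, VIKING_HEIGHT, 5.3, 10)

# Using the raw measurements and the sample (n - 1) standard deviation.
heights = [63.2, 63.8, 72.8, 56.9, 75.2, 70.8, 68.0, 64.0, 61.4, 65.2]
t_raw = one_population_t(mean(heights), VIKING_HEIGHT, stdev(heights), len(heights))

for label, t in (("rounded summary", t_rounded), ("raw data", t_raw)):
    print(f"{label}: t = {t:.3f}, significant: {abs(t) > TABLE_T}")
```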

When you get a nonsignificant test, it’s a good practice to conduct a power analysis to determine what protection you had against false negatives. For a t-test, this involves rearranging the t-test formula to solve for t_beta:

t_beta = (sqrt(n)/sd) * difference – t_alpha

The t_alpha is for the confidence you selected, in this case 90%. Then you look up the t-value you calculated to find the probability for beta. It’s a cumbersome but not difficult procedure. In this example, the calculated t_beta would have been 1.24 so the power would have been 88%. That’s not bad. Anything over 80% is usually considered acceptable.

Most statistical software will do this calculation for you. You can increase power by increasing the sample size or the acceptable Type 1 error rate (decrease the confidence) before conducting the test.

So if everything were the same (i.e., mean of students = 66 inches, standard deviation = 5.3 inches) except that you had collected 30 samples instead of 10 samples:

t-value = (69 inches – 66 inches) / (5.3 inches / (√30 samples))

t-value = 3.10

t_{(2-tailed, 90% confidence, 29 degrees of freedom)} = 1.699

If you had collected 100 samples:

t-value = (69 inches – 66 inches) / (5.3 inches / (√100 samples))

t-value = 5.66

t_{(2-tailed, 90% confidence, 99 degrees of freedom)} = 1.660

These comparisons are both significant, and might look something like this:

[Figure: one-population nondirectional test, significant]

More samples give you better resolution.
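The effect of sample size can be seen by holding the difference (3 inches) and the standard deviation (5.3 inches) fixed and changing only the number of samples. The sketch below repeats the three calculations from the text; the table values are the ones quoted above, and the names are illustrative.

```python
from math import sqrt


def one_population_t(difference, sd, n):
    """t-value for a fixed difference and standard deviation with n samples."""
    return difference / (sd / sqrt(n))


# Tabled two-tailed values at 90% confidence, as quoted in the text.
TABLE_T = {10: 1.833, 30: 1.699, 100: 1.660}

t_by_n = {n: one_population_t(3.0, 5.3, n) for n in TABLE_T}

for n, t in t_by_n.items():
    verdict = "significant" if t > TABLE_T[n] else "not significant"
    print(f"n = {n:3d}: t = {t:.2f}, table t = {TABLE_T[n]:.3f} ({verdict})")
```

The same 3-inch difference goes from not significant with 10 samples to clearly significant with 30 or 100.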


Directional Tests

Now say, in a different reality, you know that many of those freshmen boys grew up on farms and they’re pretty buff. You even think that they might just be taller than the Vikings of a millennium ago. Therefore, your research hypothesis is that Freshmen are likely to be taller than the warfaring Vikings. The null hypothesis you want to test is:

Height of Freshmen ≤ Height of Vikings

which is a directional test. If you reject the null hypothesis, the alternative hypothesis:

Height of Freshmen > Height of Vikings

is probably true of the Freshmen. Then you measure the heights of 10 freshmen and get:

72.4, 71.1, 75.4, 69.0, 75.7, 73.3, 76.0, 58.8, 70.4, 78.6

The measurements average about 72 inches with a standard deviation of 5.3 inches. The t-value would be equal to:

(Freshmen height – Viking height) / (standard deviation / (√number of samples))

t-value = (72 inches – 69 inches) / (5.3 inches / (√10 samples))

t-value = 1.790

In this comparison, the table t-value you would use is for a one-tailed (directional) test at 90% confidence for 10 samples, t_{(1-tailed, α = 0.1, 9 degrees of freedom)} = 1.383. For comparison, the value of t_{(2-tailed, 0.9 confidence, 9 degrees of freedom)}, which was used in the first example, is equal to 1.833, as is t_{(1-tailed, 0.95 confidence, 9 degrees of freedom)}. The reason is that you only have to look in half of the t-distribution area in a one-tailed test compared to a two-tailed test. That means that if you use a directional test you can have a smaller false positive rate.

The table t value you would use, t_{(1-tailed, α = 0.1, 9 degrees of freedom)}, is equal to 1.383, which is smaller than the calculated t-value, 1.790, so the comparison is significant. The comparison might look something like this:

[Figure: one-population directional test, significant]

In this comparison, the Freshmen are on average at least 3 inches taller than their frenzied Viking ancestors. Genetics, better diet, and healthy living win out.

But what if the farm boys averaged only 71 inches:

(Freshmen height – Viking height) / (standard deviation / (√number of samples))

t-value = (71 inches – 69 inches) / (5.3 inches / (√10 samples))

t-value = 1.193

The table t value you would use, t_{(1-tailed, α = 0.1, 9 degrees of freedom)}, is equal to 1.383, which is larger than the calculated t-value, 1.193, so the comparison is not significant. The comparison might look something like this:

[Figure: one-population directional test, not significant]
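The two directional comparisons can be sketched in Python as follows. The function names are made up for illustration, and the table values are the ones quoted in the text. The last line shows why the choice of tails matters: the same calculated t-value of about 1.79 passes the one-tailed cutoff but would not pass the two-tailed cutoff used in the nondirectional example.

```python
from math import sqrt


def one_population_t(sample_mean, constant, sd, n):
    """t = (sample mean - constant) / (standard deviation / sqrt(n))."""
    return (sample_mean - constant) / (sd / sqrt(n))


def upper_tail_significant(t_calculated, t_table):
    """Directional (greater-than) test: significant only if t exceeds the table value."""
    return t_calculated > t_table


ONE_TAILED_T = 1.383  # 1-tailed, alpha = 0.1, 9 degrees of freedom (from the text)
TWO_TAILED_T = 1.833  # 2-tailed, 90% confidence, 9 degrees of freedom (from the text)

t_72 = one_population_t(72, 69, 5.3, 10)  # farm boys averaging 72 inches
t_71 = one_population_t(71, 69, 5.3, 10)  # farm boys averaging 71 inches

print(f"72 inches: t = {t_72:.3f}, significant: {upper_tail_significant(t_72, ONE_TAILED_T)}")
print(f"71 inches: t = {t_71:.3f}, significant: {upper_tail_significant(t_71, ONE_TAILED_T)}")
print(f"72 inches, two-tailed cutoff: significant: {abs(t_72) > TWO_TAILED_T}")
```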

And that’s what one-population t-tests look like. Now for some two-population tests in Dare to Compare – Part 4.


You Need Statistics to Make Wine

Posted on October 5, 2018 by statswithcats

The American Statistical Association has identified 146 college majors that require statistics to complete a degree.

You probably wouldn’t be surprised that statistics is required for degrees in mathematics, engineering, physics, astronomy, chemistry, meteorology, and even biology and geology. Most business-related degrees also require statistics. Agronomy degrees require statistics as do degrees in dairy science, aquatic sciences, and veterinary sciences. Degrees for medical professions such as nursing, nutrition, physical therapy, occupational health, pharmacy, and speech-language-hearing all require statistics. And, many social science degrees require statistics, including economics, psychology, sociology, anthropology, political science, education, and criminology. What may be surprising though is that statistics is required for some degrees in history, archaeology, geography, culinary science, viticulture (grape horticulture), journalism, graphic communications, library science, and linguistics. Pretty much everybody needs to know statistics.


Dare to Compare – Part 2

Posted on September 22, 2018 by statswithcats

Part 1 of Dare to Compare summarized several fundamental topics about statistical comparisons.

Statistical comparisons, or statistical tests as they are usually called, involve populations, groups of individuals or items having some fundamental commonalities. The members of a population also have one or more characteristics, called phenomena, which are what is compared in the populations. You don’t have to measure the phenomena in every member of the population. You can take a representative sample. Statistical tests can involve one population (comparing a population phenomenon to a constant), two populations (comparing a population phenomenon to the phenomenon in another population), or three or more populations. You can also compare just one phenomenon (called univariate tests) or two or more phenomena (called multivariate tests).

Parametric statistical tests compare frequency distributions, the number of times each value of the measured phenomena appears in the population. Most tests involve the Normal distribution in which the center of the distribution of values is estimated by the average, also called the mean. The variability of the distribution is estimated by the variance or the standard deviation, the square root of the variance. The mean and standard deviation are called parameters of the Normal distribution because they are in the mathematical formula that defines the form of the distribution. Formulas for statistical tests usually involve some measure of accuracy (involving the mean) divided by some measure of precision (involving the variance). Most statistical tests focus on the extreme ends of the Normal distribution, called the tails. Tests of whether population means are equal are called non-directional, two-sided, or two-tailed tests because differences in both tails of the Normal distribution are considered. Tests of whether population means are less then or greater then are called directional, one-sided, or one-tailed tests because the difference in only one tail of the Normal distribution is considered.

Statistical tests that don’t rely on the distributions of the phenomenon in the populations are called nonparametric tests. Nonparametric tests often involve converting the data to ranks and analyzing the ranks using the median and the range.

The nice thing about statistical comparisons is that you don’t have to measure the phenomenon in the entire population at the same place or the same time, and you can then make inferences about groups (populations) instead of just individuals or items. What may even be better is that if you follow statistical testing procedures, most people will agree with your findings.

Now for even more …

Process

There are just a few more things you need to know before conducting statistical comparisons.

You start with a research hypothesis, a statement of what you expect to find about the phenomenon in the population. From there, you create a null hypothesis that translates the research hypothesis into a mathematical statement about the opposite of the research hypothesis. Statistical comparisons are sometimes called hypothesis tests. The null hypothesis is usually also written in terms of no change or no difference. For example, if you expect that the average heights of students in two school districts will be different because of some demographic factors (your research hypothesis), then your null hypothesis would be that the means of the two populations are equal.

When you conduct a statistical test, the result does not mean you prove your hypothesis. Rather, you can only reject or fail to reject the null hypothesis. If you reject the null hypothesis, you adopt the alternative hypothesis. This would mean that it is more likely that the null hypothesis is not true in the populations. If you fail to reject the null hypothesis, it is more likely that the null hypothesis is true in the populations.

The results of statistical tests are sometimes in error, but fortunately, you have some control over the rates at which errors occur. There are four possibilities for the results of a statistical test.

True Negative – The statistical test fails to reject a null hypothesis that is true in the population.
True Positive – The statistical test rejects a null hypothesis that is false in the population.
False Positive – The statistical test rejects a null hypothesis that is true in the population. This is called a Type I error and its rate is represented by α. The Type I error rate you will accept for a test is called the significance level; its complement, 1 − α, is the Confidence. Typically the significance level is set at 0.05, a 5% Type I error rate (95% confidence), although sometimes 0.10 (more acceptable error) or 0.001 (less acceptable error) are used.
False Negative – The statistical test fails to reject a null hypothesis that is false in the population. This is called a Type II error and its rate is represented by β. The ability of a particular comparison to avoid a Type II error is represented by 1-β and is called the Power of the test. Typically, power should be at least 0.8, which corresponds to a Type II error rate of no more than 20%.

When you design a statistical test, you specify the hypotheses including the number of populations and directionality, the type of test, the confidence, and the number of observations in your representative sample of the population. From the sample, you calculate the mean and standard deviation. You calculate the test statistic and compare it to standard values in a table based on the distribution. If the test statistic is greater than the standard value, you reject the null hypothesis. When you reject the null hypothesis the comparison is said to be significant. If the test statistic is less than the standard value, you fail to reject the null hypothesis and the comparison is said to be nonsignificant. Most statistical software now provides exact probabilities, called p-values, so no tables are necessary. A p-value is the probability of getting a test statistic at least as extreme as the one you calculated if the null hypothesis were true. It is not the probability that the null hypothesis is false.
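To make the procedure concrete, here is a minimal Python sketch of a two-sided, one-population z-test using only the standard library. The sample heights, the hypothesized mean of 170, and the "known" population standard deviation of 6 are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, with sigma assumed known."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))      # accuracy divided by precision
    critical = NormalDist().inv_cdf(1 - alpha / 2)    # the "standard value" from the table
    # Probability of a statistic at least this extreme if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical

# Hypothetical heights (cm) from a representative sample
heights = [172, 168, 175, 171, 169, 174, 173, 170, 176, 172]
z, critical, p_value, reject = one_sample_z_test(heights, mu0=170, sigma=6)
print(f"z = {z:.2f}, critical = {critical:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```

Comparing the absolute value of z to the critical value and comparing the p-value to α always lead to the same decision; the p-value just saves you the table lookup.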

After you conduct the test, there are two pieces of information you need to determine – the sensitivity of the test to detect differences, called the effect size, and the power of the test. The power of the test will depend on the sample size, the confidence, and the effect size. The effect size also provides insight into whether the test results are meaningful. Meaningfulness is important because a test may be able to detect a difference far smaller than what might be of interest, such as a difference in mean student heights of less than a millimeter. Perhaps surprisingly, the most common reason for being able to detect differences that are too small to be meaningful is having too large a sample size. More samples are not always better.
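The link between sample size, effect size, and power can be sketched in a few lines of Python. This assumes a two-sided, one-population z-test and expresses the effect size as the true difference divided by the standard deviation; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided, one-population z-test.

    effect_size is the true difference divided by the standard deviation.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Chance the test statistic lands in either rejection tail
    return std_normal.cdf(shift - critical) + std_normal.cdf(-shift - critical)

# A moderate effect (0.5) versus a tiny one (0.01) at increasing sample sizes
for n in (10, 100, 1000, 100000):
    print(n, round(z_test_power(0.5, n), 3), round(z_test_power(0.01, n), 3))
```

With 100,000 observations, even a difference of one hundredth of a standard deviation is detected more than 80% of the time, which is the "too large a sample size" problem in action.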

Tests

It seems like there are hundreds of kinds of statistical tests, and in a way there are, but most are just variations on the idea of expressing accuracy relative to precision. In most tests, you calculate a test statistic and compare it to a standard. If the test statistic is greater than the standard, the difference is larger than might have been expected by chance, and is said to be statistically significant. For the most part, statistical software now reports exact probabilities for statistical tests instead of relying on manual comparisons.

Don’t worry too much about remembering formulas for the statistical tests (unless a teacher tells you to). Most testing is done using software with the test formulas already programmed. If you need a test formula, you can always search the Internet.

Tests depend on the scales of the data to be used in the statistical comparison. Usually, the dependent variable (the measurements of the phenomenon) is continuous and the independent variable (the divisions of the populations being tested) is categorical for parametric tests. Sometimes there are also grouping variables used as independent variables, called effects. In advanced designs, continuous-scale variables used as independent variables are called covariates. Some other scales of measurement for the dependent variable, like binary scales and restricted-range scales, require special tests or test modifications.

Here are a few of the most common parametric statistical tests.

Table of tests dare to compare 2

z-Tests and t-Tests

The z-test and the t-test have similar forms relating the difference between a population mean and a constant (one-population test) or two population means (two-population test) to some measure of the uncertainty in the population(s). The difference in the tests is that a z-test is for Normally distributed populations where the variance is known and t-tests are for populations where the variance is unknown and must be estimated from the sample. t-Tests depend on the number of observations made on the sample of the population. The greater the sample size, the closer the t-test is to the z-test. Adjustments of two-population t-tests are made when the sample sizes or variances are different in the two populations. These tests can also be used to compare paired (e.g., before vs after) data.
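As a rough sketch of the two-population case, the Python below computes the t statistic and degrees of freedom two ways: the classic pooled version, which assumes equal variances, and Welch's adjustment for unequal variances. The grades are hypothetical, and the p-value lookup is left to statistical software because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Two-population t statistic assuming equal variances; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

def welch_t(a, b):
    """Welch's adjustment for unequal variances; returns (t, approximate df)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical grades from two class sections
section_1 = [88, 92, 79, 85, 90, 84]
section_2 = [80, 75, 83, 78, 86, 77, 81]
print("pooled:", pooled_t(section_1, section_2))
print("Welch: ", welch_t(section_1, section_2))
```

Both versions divide the same difference in means by a measure of uncertainty; they differ only in how that uncertainty and the degrees of freedom are estimated.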

ANOVA F-Tests

Unlike t-tests that are calculated from means and standard deviations, F-tests are calculated from variances. The formula for the one-way ANOVA F-test is:

F = explained variance / unexplained variance, or
F = between-group variability / within-group variability, or
F = Mean square for treatments / Mean square for error

These are all equivalent. Also, as it turns out, when only two groups are compared, F = t², where t comes from the equal-variance two-population t-test.
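The F ratio, and its relationship to t when there are only two groups, can be checked with a short Python sketch. The data are hypothetical and the t statistic is the pooled (equal-variance) version.

```python
from math import sqrt
from statistics import mean, variance

def one_way_anova_f(*groups):
    """F = mean square between groups / mean square within groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_between = ss_between / (k - 1)   # explained variance
    ms_within = ss_within / (n - k)     # unexplained variance
    return ms_between / ms_within

def pooled_t(a, b):
    """Equal-variance two-population t statistic."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# Hypothetical grades for two groups
group_a = [88, 92, 79, 85, 90, 84]
group_b = [80, 75, 83, 78, 86, 77, 81]
f_two = one_way_anova_f(group_a, group_b)
t_two = pooled_t(group_a, group_b)
print(f"F = {f_two:.4f}, t squared = {t_two ** 2:.4f}")

# The same function handles three or more groups
f_three = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
print(f"F for three groups = {f_three:.2f}")
```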

χ² Tests

The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in mutually exclusive categories of a contingency table. The test statistic is the sum, over all cells of the table, of the squared difference between the observed and expected frequencies divided by the expected frequency.
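Here is a minimal Python sketch of that calculation for a contingency table, summing (observed − expected)² / expected over every cell. The counts are made up, expected frequencies are computed from the row and column totals (the usual test of independence), and no continuity correction is applied.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are pass / fail
table = [[40, 10], [30, 20]]
statistic, df = chi_squared(table)
print(f"chi-squared = {statistic:.3f} with {df} degree(s) of freedom")
```

For one degree of freedom and a 0.05 significance level, the tabled critical value is 3.841, so a statistic larger than that would be called significant.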

Nonparametric Tests

Nonparametric tests are also called distribution-free tests because they don’t rely on any assumptions concerning the frequency distribution of the test measurements. Instead, the tests use ranks or other imposed orderings of the data to identify differences. Here are a few of the most common nonparametric statistical tests.

Table of tests second dare to compare 2

Assumptions

You make a few assumptions in conducting statistical tests. First you assume your population is real (i.e., not a phantom population) and that your samples of the population are representative of all the possible measurements. Then, if you plan to do a parametric test, you assume (and hope) that the measurements of the phenomenon are Normally distributed and that the variances are the same in all the populations being compared. The more closely these assumptions are met, the more valid the comparisons are. The reason for this is that you are using Normal distributions, defined by means and variances, to represent the phenomenon in the populations. If the true distributions of the phenomenon in the populations do not exactly follow the Normal distribution, the comparison will be somewhat in error. Of course, the Normal distribution is a theoretical mathematical distribution so there is always going to be some deviation between it and real-world data. The same is true of equal variances in multi-population comparisons. Thus, the question is always how much deviation from the assumptions is tolerable before the test becomes misleading.

Data that do not satisfy the assumptions can often be transformed to satisfy the assumptions. Adding a constant to data or multiplying data by a constant does not affect statistical tests, so transformations have to be more involved, like roots, powers, reciprocals, and logs. Box-Cox transformations are especially useful but are laborious to calculate without supporting software. Ultimately, ranks and nonparametric tests can be used in which there is no assumption about the Normal distribution.
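A small Python sketch illustrates why adding or multiplying by a constant doesn't help while a log transformation can. It uses skewness as a rough, assumed stand-in for "departure from the Normal distribution", and the right-skewed data are invented for the example.

```python
from math import log
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score: near 0 for symmetric data, positive for a long right tail."""
    m, s = mean(values), pstdev(values)
    return mean(((x - m) / s) ** 3 for x in values)

# Hypothetical right-skewed measurements
data = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

rescaled = [2.54 * x + 10 for x in data]   # multiply by and add a constant
logged = [log(x) for x in data]            # a more involved transformation

print(f"original: {skewness(data):.2f}")
print(f"rescaled: {skewness(rescaled):.2f}")
print(f"logged:   {skewness(logged):.2f}")
```

The rescaled data have exactly the same shape as the original, whereas the logs pull in the long right tail.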

Next, we’ll see how it all comes together …

Read more about using statistics at the Stats with Cats blog. Join other fans at the Stats with Cats Facebook group and the Stats with Cats Facebook page. Order Stats with Cats: The Domesticated Guide to Statistics, Models, Graphs, and Other Breeds of Data analysis at amazon.com, barnesandnoble.com, or other online booksellers.

Posted in Uncategorized | Tagged confidence, null hypothesis, parametric, sample size, significance, statistical comparisons, statistical power, statistical tests, stats with cats | 1 Comment

Dare to Compare – Part 1

Posted on September 4, 2018 by statswithcats

In school, you probably had to line up by height now and then. That wasn’t too difficult. There weren’t too many individuals being lined up and they were all in the same place at the same time. An individual’s place in line was decided by comparing his or her height to the heights of other individuals. The comparisons were visual; no measurements were made. Everyone made the same decisions about the height comparisons. You didn’t need statistics to solve the problem. So why might you ever need statistics to compare heights?

Populations

Statistics is primarily concerned with groups of individuals or items, especially those having some fundamental commonalities. These groups are called populations. Populations are more difficult to compare than pairs of individuals because you have to define the population and measure the characteristics of the phenomenon that you want to compare. Statistical comparisons, or statistical tests as they are usually called, can involve one population (comparing a population phenomenon to a constant), two populations (comparing a population phenomenon to the phenomenon in another population), or three or more populations. You can also compare just one phenomenon (called univariate tests) or two or more phenomena (called multivariate tests). You can test if the phenomena are equal (called a nondirectional or two-sided test) or less than/greater than (called a directional or one-sided test).

For example, you might want to compare the heights of male high school freshmen in two different school districts. This would be a two-population test – male high school freshmen in school district 1 and male high school freshmen in school district 2. The phenomenon you want to compare is the height of the individuals in the two populations. But, it’s not as easy as just visually comparing the heights of pairs of individuals because they are not located in the same place. You have to measure at least some of the heights of the individuals in the two populations.

Samples

Fortunately, you don’t have to measure every individual in the population so long as you measure a representative sample of the individuals in the populations. You can improve your chances of getting a representative sample by using the three Rs of variance control — Reference, Replication, and Randomization.

How many samples should you have? No, the answer isn’t as many as possible. Some people think the answer is 30 samples, but that’s a myth based on a misunderstood tradition. Like potato chips and middle managers, too many can be as bad as not enough. It’s a matter of resolution.

Distributions

If you were comparing two individuals, you would only be concerned with whether one height is greater than, equal to, or less than the other height. When you’re comparing populations, there’s not just one height but many, and you only know what some of the heights are (hopefully a representative sample of them). That’s where distributions come in.

In statistical testing, a frequency distribution refers to the number of times each value of the measured phenomenon appears in the population. A bar chart of these values, with the values on the horizontal axis and their frequencies on the vertical axis, is called a histogram. Histograms often look like a bell, which is why they are called bell curves.

Measured phenomena that have a histogram that looks like a bell curve have many values located at the middle of the distribution and fewer values farther away from the center, called the tails. The center of the distribution of values is estimated by the average. The variability of the values, how far they stretch along the horizontal axis, is estimated by the variance or the standard deviation, the square root of the variance.
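To see these pieces together, here is a short Python sketch that tallies a frequency distribution, draws a crude text histogram, and estimates the center and variability. The heights are invented for the example.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical heights (inches) from a representative sample
heights = [64, 66, 66, 67, 67, 67, 68, 68, 68, 68, 69, 69, 69, 70, 70, 72]

# Frequency distribution: how many times each value appears
frequencies = Counter(heights)
for value in sorted(frequencies):
    print(f"{value:3d} | {'*' * frequencies[value]}")

center = mean(heights)          # estimate of the center of the distribution
spread_var = variance(heights)  # sample variance
spread_sd = stdev(heights)      # standard deviation, the square root of the variance
print(f"mean = {center}, variance = {spread_var:.2f}, standard deviation = {spread_sd:.2f}")
```

Most of the asterisks pile up near the mean and thin out toward the tails, which is the bell shape described above.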

A bell curve is usually assumed to represent a Normal distribution. The average and the variance of the values are called parameters of the distribution because they are in the mathematical formula that defines the form of the distribution.

Having a mathematical equation that you can use as a model of the frequency of phenomenon values in the population is advantageous because you can use the distribution model to represent the characteristics of the population.

Statistical Comparisons

Once you have data on the phenomenon from the representative sample of the population, you calculate descriptive statistics for the population. Statistical comparisons consider both the accuracy (i.e., the difference between the measured heights and the true heights in the population of individuals) and the precision (i.e., how consistent or variable are the heights) of the measurements of the population. Formulas for statistical tests usually involve some measure of accuracy divided by some measure of precision.

Statistical tests that compare the distributions of population characteristics are called parametric tests. They are usually based on the Normal distribution and involve using averages as measures of the center of the population distribution and standard deviations as measures of the variability of the distribution. (This is not always the case but is true most of the time.) The average and standard deviation are called test parameters. You can still test whether population means are equal (called non-directional or two-sided tests because differences in both tails of the Normal distribution are considered) or less than/greater than (called directional or one-sided tests because the difference in only one tail of the Normal distribution is considered).

Statistical tests that don’t rely on the distributions of the phenomenon in the populations are called nonparametric tests. Nonparametric tests usually involve converting the data to ranks and analyzing the ranks using the median and the range.
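Converting data to ranks is simple enough to sketch in Python. The example below assigns tied values the average of their ranks and then computes a rank-sum statistic (the Mann-Whitney U, chosen here only as one well-known illustration of a rank-based test). The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_u(a, b):
    """Mann-Whitney U for sample a, computed from ranks of the pooled data."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

print(average_ranks([10, 20, 20, 30]))
print(rank_sum_u([1, 2, 3], [4, 5, 6]), rank_sum_u([4, 5, 6], [1, 2, 3]))
```

Because only the ordering matters, an extreme value such as 3,000 in place of 30 would get the same rank and have no extra influence on the result.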

And, there is still a lot more to know about statistical comparisons … more to come

Posted in Uncategorized | Tagged blogs, cats, Normal distribution, population, statistical comparisons, statistical tests | Leave a comment

Catalog of Models

Posted on November 5, 2017 by statswithcats

Whether you know it or not, you deal with models every day. Your weather forecast comes from a meteorological model, usually several. Mannequins are used to display how fashions may look on you. Blueprints are drawn models of objects or structures to be built. Maps are models of the earth’s terrain. Examples are everywhere.

Models are representations of things, usually an ideal, a standard, or something desired. They can be true representations, approximate (or at least as good as practicable), or simplified, even cartoonish compared to what they represent. They can be about the same size, bigger, or most typically, smaller, whatever makes them easiest to manipulate. They can represent:

Physical objects that can be seen and touched
Processes that can be watched
Behaviors that can be observed
Conditions that can be monitored
Opinions that can be surveyed.

The models themselves do not have to be physical objects. They can be written, drawn, or consist of mathematical equations or computer programming. In fact, using equations and computer code can be much more flexible and less expensive than building a physical model.

Stats with Cats Models 10-23-2017

Classification of Models

There are many ways that models are classified, so this catalog isn’t unique. The models may be described with different terms or broken out to greater levels of detail. Furthermore, you can also create hybrid models. Examples include mash-ups of analytical and stochastic components used to analyze phenomena such as climate change and subatomic particle physics. Nevertheless, the catalog should give you some ideas for where you might start to develop your own model.

Physical Models

Your first exposure to a model was probably a physical model like a baby pacifier or a plush animal, and later, a doll or a toy car. From then, you’ve seen many more – from ant farms to anatomical models in school. You probably even built your own models with Legos, plastic model kits, or even a Halloween costume. They are all representations of something else.

Physical models aren’t used often for advanced applications because they are difficult and expensive to build and calibrate to a realistic experience. Flight simulators, hydrographic models of river systems, and reef aquariums are well known examples.

Conceptual Models

Models can also be expressed in words and pictures. These are used in virtually all fields to convey mental images of some mechanism, process, or other phenomenon that was or will be created. Blueprints, flow diagrams, geologic fence diagrams, and anatomical diagrams are all conceptual models. So are the textual descriptions that go with them. In fact, you should always start with a simple text model before you embark on building a complex physical or mathematical model.

Mathematical and Computer Models

Theoretical Models

Theoretical models are based on scientific laws and mathematical derivations. Both theoretical models and deterministic empirical models provide solutions that presume that there is no uncertainty. These solutions are termed exact (which does not necessarily imply correct). There is a single solution for given inputs.

Analytical Models

Analytical models are mathematical equations derived from scientific laws that produce exact solutions that apply everywhere. For example, F (force) = m (mass) times a (acceleration) and E (energy) = m (mass) times c² (speed of light squared) are analytical models. Probably, most concepts in classical physics can be modeled analytically.

Numerical Models

Numerical models are mathematical equations that have a time parameter. Numerical models are solved repeatedly, usually on a grid, to obtain solutions over time. This is sometimes called a Dynamic Model (as opposed to a Static Model) because it describes time-varying relationships.
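As one illustration of "solved repeatedly on a grid," here is a minimal Python sketch of heat diffusing along a one-dimensional bar, stepped forward in time with a simple explicit finite-difference scheme. The bar, its temperatures, and the function name are hypothetical; real numerical models are far more elaborate.

```python
def diffuse(temps, alpha=0.25, steps=100):
    """Explicit finite-difference time stepping of 1-D heat diffusion.

    temps holds values at equally spaced grid points; the two end values stay fixed.
    alpha = D * dt / dx**2 must be no larger than 0.5 for this scheme to be stable.
    """
    current = list(temps)
    for _ in range(steps):
        nxt = current[:]
        for i in range(1, len(current) - 1):
            # Each interior point moves toward the average of its neighbors
            nxt[i] = current[i] + alpha * (current[i - 1] - 2 * current[i] + current[i + 1])
        current = nxt
    return current

# A bar with 11 grid points: one end held at 100 degrees, the rest starting at 0
grid = [100.0] + [0.0] * 10
for steps in (1, 10, 100, 2000):
    print(steps, [round(t, 1) for t in diffuse(grid, steps=steps)])
```

Each pass through the outer loop is one time step, so the printed rows show the same grid at successively later times.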

Empirical Models

Empirical models can be deterministic, probabilistic, stochastic, or sometimes, a hybrid of the three. They are developed for specific situations from measured data. Empirical models differ from theoretical models in that the model is not necessarily fixed for all instances of its use. There may be multiple reasonable empirical models that can apply to a given situation.

Deterministic Models

Deterministic empirical models presume that a mathematical relationship exists between two or more measurable phenomena (as do theoretical models) that will allow the phenomena to be modeled without uncertainty (or at least, not much uncertainty, so that it can be ignored) under a given set of conditions. The difference is that the relationship isn’t unique or proven. There are usually assumptions. Biological growth and groundwater flow models are examples of deterministic empirical models.

Probability Models

Probability models are based on a set of events or conditions all occurring at once. In probability, it is called an intersection of events. Probability models are multiplicative because that is how intersection probabilities are combined. The most famous example of a probability model is the Drake equation, a summary of the factors affecting the likelihood that we might detect radio communication from intelligent extraterrestrial life.
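The multiplicative structure is easy to sketch in Python. The first function multiplies probabilities of events, which is only valid if the events are assumed to be independent. The second is the Drake equation in its usual form, N = R* × fp × ne × fl × fi × fc × L; the parameter names and every input value below are placeholders chosen for easy arithmetic, not estimates.

```python
from math import prod

def joint_probability(probabilities):
    """Probability that all events occur together, assuming they are independent."""
    return prod(probabilities)

def drake(star_formation_rate, frac_with_planets, habitable_per_system,
          frac_life, frac_intelligent, frac_communicating, lifetime_years):
    """Drake equation: expected number of detectable civilizations."""
    return prod([star_formation_rate, frac_with_planets, habitable_per_system,
                 frac_life, frac_intelligent, frac_communicating, lifetime_years])

print(joint_probability([0.5, 0.5, 0.5]))

# Placeholder inputs only
n_civilizations = drake(1.0, 0.5, 2, 0.5, 0.1, 0.1, 10000)
print(n_civilizations)
```

Because everything is multiplied, changing any one factor by a factor of ten changes the answer by a factor of ten, which is why such models are better for thought experiments than for precise predictions.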

Stochastic Models

Stochastic empirical models presume that changes in a phenomenon have a random component. The random component allows stochastic empirical models to provide solutions that incorporate uncertainty into the analysis. Stochastic models include lottery picks, weather, and many problems in the behavioral, economic, and business disciplines that are analyzed with statistical models.

Comparison Models

In statistical comparison models, the independent variable is a grouping-scale variable (one measured on a nominal scale). The dependent variable can be either grouping or continuous. Simple hypothesis tests include:

χ² tests that analyze cell frequencies on one or more grouping variables, and
t-tests and z-tests that analyze dependent variable means in one or two groups of a grouping variable.

Analysis of Variance (ANOVA) models compare dependent variable means for two or more groups of an independent grouping variable. Analysis of Covariance (ANCOVA) models compare dependent variable means for two or more groups of an independent grouping variable while controlling for one or more continuous variables. Multivariate ANOVA and ANCOVA compare two or more dependent variables using multiple independent variables. There are many more types of ANOVA model designs.

Classification Models

Classification and identification models also analyze groups.

Clustering models identify groups of similar cases based on continuous-scale variables. There need be no prior knowledge or expectation about the nature of the groups. There are several types of cluster analysis, including hierarchical clustering, K-Means clustering, two-step clustering, and block clustering. Often, the clusters or segments are used as inputs to subsequent analyses. Clustering models are also known as segmentation models.

Clustering models do not have a nominal-scale dependent variable, but most classification models do. Discriminant analysis models have a nominal-scale dependent variable and one or more continuous-scale independent variables. They are usually used to explain why the groups are different, based on the independent variables, so they often follow a cluster analysis. Logistic regression is analogous to linear regression but is based on a non-linear model and a binary or ordinal dependent variable instead of a continuous-scale variable. Often, models for calculating probabilities use a binary (0 or 1) dependent variable with logistic regression.

There are many analyses that produce decision trees, which look a bit like organization charts. C&R (Classification and Regression Trees) split a categorical dependent variable into its groups based on continuous or categorical-scale independent variables. All splits are binary. CHAID (Chi-square Automatic Interaction Detector) generates decision trees that can have more than two branches at a split. A Random Forest consists of a collection of simple tree predictors.

Explanation Models

Explanation models aim to explain associations within or between sets of variables. With explanation models, you select enough variables to address all the theoretical aspects of the phenomenon, even to the point of having some redundancy. As you build the model, you discover which variables are extraneous and can be eliminated.

Factor Analysis (FA) and Principal Components Analysis (PCA) are used to explore associations in a set of variables where there is no distinction between dependent and independent variables. The two types of statistical analysis:

Create new metrics, called factors or components, which explain almost the same amount of variation as the original variables.
Create fewer factors/components than the original variables so further analysis is simplified.
Require that the new factors/components be interpreted in terms of the original variables, but they often make more conceptual sense so subsequent analyses are more intuitive.
Produce factors/components that are statistically independent (uncorrelated) so they can be used in regression models to determine how important each is in explaining a dependent variable.

Canonical Correlation Analysis (CCA) is like PCA only there are two sets of variables. Pairs of components, one from each group, are created that explain independent aspects of the dataset.

Regression analysis is also used to build explanation models. In particular, regression using principal components as independent variables is popular because the components are uncorrelated and not subject to multicollinearity.

Prediction Models

Some models are created to predict new values of a dependent variable or forecast future values of a time-dependent variable. To be useful, a prediction model must use prediction variables that cost less to generate than the prediction is worth. So the predictor variables and their scales must be relatively inexpensive and easy to create or obtain. In prediction models, accuracy tends to come easy while precision is elusive. Prediction models usually keep only the variables that work best in making a prediction, and they may not necessarily make a lot of conceptual sense.

Regression is the most commonly used technique for creating prediction models. Transformations are used frequently. If a model includes one or more lagged values of the dependent variable among its predictors, it is called an autoregressive model.

Neural Networks is a predictive modeling technique inspired by the way biological nervous systems process information. The technique involves interconnected nodes or layers that apply predictor variables in different ways, linear and nonlinear, to all or some of the dependent variable values. Unlike most modeling techniques, neural networks can’t be articulated so they are not useful for explanation purposes.

Picking the Right Model

There are many ways to model a phenomenon. Experience helps you to judge which model might be most appropriate for the situation. If you need some guidance, follow these steps.

Step 1 – Start at the top of the Catalog of Models figure. Decide whether you want to create a physical, mathematical, or conceptual model. Whichever you choose, start by creating a brief conceptual model so you have a mental picture of what your ultimate goal is and can plan for how to get there.

If your goal is a physical or full-blown conceptual model, do the research you’ll need to identify appropriate materials and formats. But this blog is about mathematical models, so let’s start there.

Step 2 – If you want to select a type of mathematical model, start on the second line of the Catalog of Models figure and decide whether your phenomenon fits best with a theoretical or an empirical approach.

If there are scientific or mathematical laws that apply to your phenomenon, you’ll probably want to start with some type of theoretical model. If there is a component of time, particularly changes over time periods, you’ll probably want to try developing a numerical model. Otherwise, if a single solution is appropriate, try an analytical model.

Step 3 – If your phenomenon is more likely to require data collection and analysis to model, you’ll need an empirical model. An empirical model can be probabilistic, deterministic, or stochastic. Probability models are great tools for thought experiments. There are no wrong answers, only incomplete ones. Deterministic models are more of a challenge. There needs to be some foundation of science (natural, physical, environmental, behavioral, or other discipline), engineering, business rules, or other guidelines for what should go into the model. More often than not, deterministic models are overly complicated because there is no way to distinguish between components that are major factors versus those that are relatively inconsequential to the overall results. Both Probability and Deterministic models are often developed through panels of experts using some form of Delphi process.
Step 4 – If you need to develop a stochastic (statistical) model, go here to pick the right tool for the job.
Step 5 – Consider adding hybrid elements. Don’t feel constrained to only one type of component in building your model. For instance, maybe your statistical model would benefit from having deterministic, probability, or other types of terms in it. Calibrate your deterministic model using regression or another statistical method. Be creative.

Posted in Uncategorized | Tagged analytical, blogs, cats, data mining, deterministic, empirical, models, numeric, probabilistic, relationships, statistical, statistics, stats with cats, stochastic, theoretical | 2 Comments

How to Describe Numbers

Posted on September 10, 2017 by statswithcats

Say you wanted to describe someone you see on the street. You might characterize their sex, age, height, weight, build, complexion, face shape, hair, mouth and lips, eyes, nose, tattoos, scars, moles, and birthmarks. Then there’s clothing, behavior, and if you’re close enough, speech, odors, and personality. Your description might be different if you’re talking to a friend or a stranger, of the same or different sex and age. Those are a lot of characteristics and they’re sometimes hard to assess. Individual characteristics aren’t always relevant and can change over time. And yet, without even thinking about it, we describe people we see every day using these characteristics. We do it mentally to remember someone or overtly to describe a person to someone else. It becomes second nature because we do it all the time.

Most people don’t describe sets of numbers very often, though, so they don’t know how easy it actually is. You have to consider only a few characteristics, all of which are fairly easy to assess and will never change for the dataset. Once you learn how, it’s hardly a challenge to get it right, unlike describing the hot young guy who just robbed a bank wearing a clown costume.

What’s involved in describing a dataset? First, before considering any descriptive statistics, you have to assess two qualities.

Phenomenon and population or sample
Measurement scale

From this information, you’ll be able to determine what descriptive statistics to calculate.

Phenomenon and Population or Sample

This is a thinking exercise; there are no calculations.

First, determine what the numbers represent. What is the phenomenon they are related to? If there’s no context for the numbers, like it’s just a dataset for a homework problem, that’s fine too. But if you know something about the data, you might be able to judge whether your answer makes sense later when the calculations are done.

Next, think about the population from which the data were obtained. How is the population defined? Do you have all the possible measurements or entities? If not, you have a sample of the population, hopefully a sample that is a good representation of the population. This knowledge will help you judge whether your answer makes sense and will be consistent with other samples taken from the same population. Again, if there’s no context for the numbers, that’s fine. Now, all you have to decide is whether you want to describe the population or just the sample of the population for which you have measurements. If you’re not sure, assume you want to describe the population. All the fun stuff in statistics involves populations.

Measurement Scale

Scales of measurement express the phenomenon represented by the population. Simply put, scales are the ways that a set of numbers are related to each other. For example, the increments between scale values may all be identical, such as with heights and weights, or vary in size, such as with earthquake magnitudes and hurricane categories. The actual values of scales are called levels.

You have to understand the scale of measurement to describe data. There are a variety of types of measurement scales, but for describing a dataset you only need to pick from three categories:

Grouping Scales – Scales that define collections having no mathematical relationship to each other. The groups can represent categories, names, and other sets of associated attributes. These scales are also called nominal scales. They are described by counts and statistics based on counts, like percentages.
Ordered Scales – Scales that define measurement levels having some mathematical progression or order, commonly called ordinal scales. Data measured on an ordinal scale are represented by integers, usually positive. Counts and statistics based on medians and percentiles can be calculated for ordinal scales.
Continuous Scales – Scales that define a mathematical progression involving fractional levels, represented by numbers having decimal points after the integer. These scales may be called interval scales or ratio scales depending on their other properties. Any statistic can be calculated for data measured on continuous scales.

There are other scales of measurement but that’s all you’ll need at this point.

Descriptive Statistics

Now you can get on to describing a set of numbers. You’ll only need to consider four attributes – frequency, central tendency, dispersion, and shape.

Frequency refers to the number of times the level of a scale appears in a set of numbers. It is used mostly for nominal (grouping) scales and sometimes with ordinal scales. The level with the highest frequency is called the mode. Frequency is used most effectively to show how scale levels compare to each other, such as with percentages or in a histogram.
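Counting frequencies is simple enough to sketch in a few lines of Python. This is only an illustration; the coat-color data are made up for the example and are not from any real dataset.

```python
from collections import Counter

# Hypothetical nominal-scale data: coat colors of six shelter cats.
colors = ["black", "tabby", "black", "calico", "tabby", "black"]

# Frequency: how many times each level of the scale appears.
counts = Counter(colors)

# Mode: the level with the highest frequency.
mode, mode_count = counts.most_common(1)[0]

# Percentages make it easier to compare levels to each other.
percentages = {level: 100 * n / len(colors) for level, n in counts.items()}

print(counts)
print("mode:", mode)
print(percentages)
```

The same counts are what you would plot as the bars of a histogram.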

Central Tendency refers to where the middle of a set of numbers is. It is used mostly for continuous (interval or ratio) scales and often with ordinal scales. There are many statistics that may be used to describe where the center of a dataset is, the most popular of which are the median and the mean. The median is the exact center of a progression-scale dataset. There are exactly the same number of data values less than and greater than the median. You determine the median by sorting the values in the dataset and counting the values from the extremes until you find the center. The mean, or average, is the center of a progression-scale dataset that is determined by a calculation. There may not be an equal number of data values less than and greater than the mean. You determine the mean by adding all the values in the dataset and dividing that sum by the number of values. The mean or the median is used in most statistical testing to find differences in data populations.
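Here is a small Python sketch of the difference between the two, using made-up values with one unusually large number. It relies only on the standard `statistics` module.

```python
import statistics

# Hypothetical data with one large value that pulls the mean upward.
values = [2, 3, 3, 4, 5, 7, 25]

# Median: sort the values and take the one in the middle.
median = statistics.median(values)

# Mean: add all the values and divide by how many there are.
mean = statistics.mean(values)

# Count how many values fall on each side of the mean.
below = sum(v < mean for v in values)
above = sum(v > mean for v in values)

print("median:", median)
print("mean:", mean)
print("values below / above the mean:", below, "/", above)
```

Three values sit on each side of the median, but five values fall below the mean and only one falls above it.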

Dispersion refers to how spread out the data values are. It is used for continuous (interval or ratio) scales but only rarely with ordinal scales. There are many ways to describe data dispersion but the most popular is the standard deviation. You calculate the standard deviation by:

Subtracting the mean of a dataset from each value in the dataset
Squaring each subtracted value
Adding all the squared values
Dividing the sum of the squared values by the number of values in the dataset (if you’re describing only the sample you have) or by the number of values in the dataset minus 1 (if you’re using the sample to describe the population it came from).
Taking the square root of the result.

The standard deviation is used in statistical testing to find differences in data populations.
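The calculation can be sketched in Python as follows. The function name, its `estimate_population` argument, and the data are all invented for this illustration. Note that the last step takes a square root; without it, the result is the variance rather than the standard deviation.

```python
import math
import statistics

def standard_deviation(values, estimate_population=True):
    """Standard deviation computed step by step (illustrative helper)."""
    mean = sum(values) / len(values)
    # Subtract the mean from each value.
    deviations = [v - mean for v in values]
    # Square each difference and add up the squares.
    total = sum(d ** 2 for d in deviations)
    # Divide by n - 1 when the sample is used to describe its population,
    # or by n when only the values in hand are being described.
    divisor = len(values) - 1 if estimate_population else len(values)
    # Take the square root to get back to the original units.
    return math.sqrt(total / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

print(standard_deviation(data))                             # divides by n - 1
print(standard_deviation(data, estimate_population=False))  # divides by n
```

The standard library's `statistics.stdev` and `statistics.pstdev` give the same two answers, so in practice you would rarely write this by hand.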

Shape refers to the frequency of the values in a dataset at selected levels of the scale, most often depicted as a graph. For ordinal scales, the graph is usually a histogram. For continuous scales, the graph is usually a probability plot, although sometimes histograms are used. Shapes of continuous scale data can be compared to mathematical models (equations) of frequency distributions. It’s like comparing a person to some well-known celebrity; they’re not identical but are similar enough to provide a good comparison. There are dozens of such distribution models, but the most commonly used is the normal distribution. The normal distribution model has two parameters – the mean and the standard deviation.

There are many other statistics that can be used to describe datasets, but most of the time, those four attributes are all you need.

For example, a nominal-scale dataset would be described by providing counts or percentages of observations in each group. An ordinal-scale dataset would be described by providing counts or percentages for each level, the median and percentiles, and ideally, a histogram. A continuous-scale dataset would be described by providing the closest distribution model and estimates of its parameters, such as “normally distributed with a mean of 10 and a standard deviation of 2.” Continuous-scale datasets can be described so succinctly because the distribution-shape specification contains so much of the telling information.
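As a rough sketch, the choice of statistics by scale might look like this in Python. The `describe` function and its scale labels are hypothetical, and the continuous branch simply reports a mean and standard deviation; it does not check whether a normal model actually fits the data.

```python
import statistics
from collections import Counter

def describe(values, scale):
    """Pick descriptive statistics by measurement scale (illustrative only)."""
    if scale == "nominal":
        # Grouping scales: percentages of observations in each group.
        n = len(values)
        return {level: round(100 * c / n, 1) for level, c in Counter(values).items()}
    if scale == "ordinal":
        # Ordered scales: counts per level, the median, and percentiles.
        q1, _, q3 = statistics.quantiles(values, n=4)
        return {
            "counts": dict(Counter(values)),
            "median": statistics.median(values),
            "quartiles": (q1, q3),
        }
    if scale == "continuous":
        # Continuous scales: the parameters of a normal model.
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    raise ValueError(f"unknown scale: {scale}")

print(describe(["a", "a", "b", "c"], "nominal"))
print(describe([1, 2, 2, 3, 3, 3, 4, 5], "ordinal"))
print(describe([8, 10, 12], "continuous"))
```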

Now isn’t that a lot easier than describing that hot bank robber wearing a clown costume?

Posted in Uncategorized | 4 Comments

Visualizations versus Infographics

Posted on May 13, 2017 by statswithcats

Visualizations and infographics are both visual representations of data that are often confused. In fact, there is not a clear line of demarcation between the two. Both are informative. Both can be static or animated. Both require a knowledgeable person to create them.

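[Figure: two overlapping circles, one for visualizations and one for infographics, with Enhanced Visualizations in the overlap]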

Visualizations Explore

Data visualizations are created to make sense of data visually and to explore data interactively. Visualization is mostly automatic, generated through the use of data analysis software, to create graphs, plots, and charts. The visualizations can use the default settings of the software or involve Data Artistry and labeling (i.e., these Enhanced Visualizations fall in the intersection of the two circles in the figure). The processes used to create visualizations can be applied efficiently to almost any dataset. Visualizations tend to be more objective than infographics and better for allowing audiences to draw their own conclusions, although the audience needs to have some skills in data analysis. Data visualizations do not contain infographics.

Infographics Explain

Infographics are artistic displays intended to make a point using information. They are specific, elaborate, explanatory, and self-contained. Every infographic is unique and must be designed from scratch for visual appeal and overall reader comprehension. There is no software for automatically producing infographics the way there is for visualizations. Infographics are combinations of illustrations, images, text, and even visualizations designed for general audiences. Infographics are better than visualizations for guiding the conclusions of an audience but can be more subjective than visualizations.

	Visualization	Infographic
Objective	Analyze	Communicate
Audience	Some data analysis skills	General audience
Components	Points, lines, bars, and other data representations	Graphic design elements, text, visualizations
Source of Information	Raw data	Analyzed data and findings
Creation Tool	Data analysis software	Desktop publishing software
Replication	Easily reproducible with new data	Unique
Interactive or Static	Either	Static
Aesthetic Treatment	Not necessary	Essential
Interpretation	Left to the audience	Provided to the audience

Posted in Uncategorized | Tagged data analysis, data artistry, infographics, statistical analysis, statistical tests, statistics, stats with cats, visualizations, writing | 3 Comments

How to Analyze Text

Posted on February 12, 2017 by statswithcats

Statisticians love to analyze numbers, but what do they do when what they want to explore is unformatted text? It happens all the time. The text may come from open-ended responses on surveys, social networking sites, email, online reviews, public comments, notations (e.g., medical, customer relations), documents and text files, or even recorded and transcribed interactions. But before anything can happen, you have to accomplish three tasks:

Get the text into a spreadsheet or other software that you can use to manipulate it.
Break the text into analyzable fragments – letters, words, phrases, sentences, paragraphs, or whatever.
Assign properties to the text fragments

How you might complete these tasks depends on what you want to do and the software you have. Nonetheless, you’ll be surprised by how much you can do with just a spreadsheet and an internet connection if you have the time and focus. This article will show you how.

Approaches

There are several ways that you can analyze text. You can:

Count the occurrence of specific letters, words, or phrases, often summarized as Word Clouds. There are quite a few free web sites that will help you construct word clouds.
Categorize text by key themes, topics, or commonalities, called Text Mining.
Classify attitudes, emotions, and opinions of a source toward some topic, called Sentiment Analysis or opinion mining. There are many applications of sentiment analysis in business, marketing, customer management, political science, law, sociology, psychology, and communications.
Explore relationships between words using a Word Net. The relationships can reflect definitions or other commonalities.

Some of these analyses can be performed using free web apps; others require special software.

Specialized Software

Some text analytics can be performed manually, but it is a time-consuming process so having software can be crucial. Unfortunately, the biggest and best software is proprietary, like SAS and SPSS, and costs a lot. There are also free and low-cost alternatives, as well as free web sites that perform less sophisticated analyses. There are a lot of software options so there are probably a lot of people analyzing text. Let Google be your guide.

Manual Analyses

Even if you don’t have access to specialized software for text analyses, you can still perform two types of analyses with nothing more than a spreadsheet program and an internet connection. You can count the number of times that a letter, word, or phrase appears in a text passage. Word frequencies turn out to be relatively easy to produce, but once you have the counts, the analysis and interpretation may be a bit more challenging. You can also do simple topic analyses or sentiment analyses. Parsing the sentences or sentence fragments and analyzing them is straightforward but time-consuming, though the interpretation is usually easier.

Word Counts

If you are just looking for keywords or counting words for some diagnostic purpose, you’ll find that it’s not that difficult. Here’s how to do word counts.

Step 1 – Find the text you want to analyze.

This is usually easy except for there being so many choices. You have to start with an electronic file. If you have hard copy, you’ll have to scan it and correct the errors. If you have text from separate sources, you’ll want to aggregate them to make things easier. If you have text on a website, you can usually highlight it and copy it using <ctrl-C>. If the passage is long, you can use <ctrl-A> to select everything before copying it, but you’ll have to edit out the extraneous material. You can do these operations in most word processors.

Step 2 – Scrub the data

You should scrub the text to be sure you’ll be counting the correct things. Take out entries that aren’t part of the flow of the text, like footnotes and section numbers. Correct misspellings. Take out punctuation that might become associated with words, like em dashes.

Step 3 – Count the words.

The quickest way to count words is to go to an Internet site for that purpose. Just copy your scrubbed text, paste it into the box on the site, and press submit. You’ll get a column of words and their frequencies. Parse the numbers from the text and you’re ready to analyze the data. It’s a good idea to review the results of the counting to be sure no errors have crept into the process.

Another way to do this solely in a spreadsheet is to replace all the punctuation with blanks and then replace the blanks with paragraph marks. This will give you a column of words. Copy the column, remove the duplicates from the copy, and then use a formula to count how many times each unique word appears in the original column.
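If you happen to have Python handy, the same idea takes only a few lines. This is a minimal sketch, not a replacement for careful scrubbing: the `word_counts` function and the sample sentence are made up, and lowercasing the text so that "The" and "the" count as one word is an assumption of this example.

```python
import re
from collections import Counter

def word_counts(text):
    """Count words after replacing punctuation with blanks (illustrative)."""
    # Keep letters, digits, and apostrophes; everything else becomes a blank.
    cleaned = re.sub(r"[^\w'’]+", " ", text.lower())
    # Splitting on blanks gives the "column of words".
    words = cleaned.split()
    # Counter does the remove-duplicates-and-count step in one pass.
    return Counter(words)

sample = "The cat sat. The cat ran -- the dog sat!"
counts = word_counts(sample)

for word, n in counts.most_common():
    print(word, n)
```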

Once you have the counts, the analysis is up to you. You can compare word statistics from different sources or analyze word frequencies within a single source. The possibilities are endless. Interpretation is another matter. Here are some examples.

[Table: example word counts]

One thing you can do with word counts is to produce a word cloud. There are many web sites that will generate these graphics. My favorite is Wordle, but be advised, you have to use Internet Explorer for it to work. Here’s an example of a word cloud produced with Wordle.

[Figure: example word cloud produced with Wordle]

Text Mining

Topic or Sentiment Analyses are straightforward but more time consuming than word counts. Unless you are analyzing text for work or school, relax and turn on Netflix. This isn’t very sophisticated, but it’ll take a while and you’ll need frequent breaks to maintain your focus.

There are six steps.

Step 1 – Get the Data into a Spreadsheet

As with word counts, you have to get the text file into a text manager, preferably a spreadsheet. Highlight your text or use <ctrl A> and then <ctrl C> and <ctrl V>. You’ll need to parse any block text into sentences or whatever length fragment you want to analyze. You can usually do this by replacing periods with paragraph marks. Start with a small dataset, perhaps fewer than fifty fragments, until you get used to the process.
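For comparison, here is how the parsing step might be sketched in Python instead of a spreadsheet. The `split_fragments` function and the sample text are hypothetical, and splitting after question marks and exclamation points as well as periods is an assumption of this example.

```python
import re

def split_fragments(text):
    """Split block text into sentence-length fragments (illustrative)."""
    # Break wherever whitespace follows a period, question mark,
    # or exclamation point, keeping the punctuation with its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

block = "I remember rotary phones. TV had three channels! Did you?"
fragments = split_fragments(block)

# One fragment per row, like a single spreadsheet column.
for row in fragments:
    print(row)
```

Abbreviations such as "e.g." will be split in the wrong place, so review the rows just as you would in a spreadsheet.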

Step 2 – Scrub the Responses

Format the fragments into a single column with one fragment per row. Delete extraneous fragments. Don’t worry about misspellings and punctuation. If you make a mistake, <ctrl Z> will undo it.

Step 3 – Assign Descriptors

In a column next to the column with the fragments, enter your first descriptor. It can be a keyword, theme, sentiment, length, or whatever you want to analyze. Unless you have predetermined descriptors you are looking for, don’t worry too much about the descriptors you use. You’ll review and edit them in the next step.

Step 4 – Count the Fragments Assigned to Each Descriptor

When you count the fragments assigned to each descriptor, you’ll probably find a few descriptors with only a few fragments. Consider combining them with other descriptors. When you’re satisfied with the assignments, you might want to subdivide the descriptor groups with another set of descriptors.
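A short Python sketch of this review step is below. The descriptor labels, the `min_count` threshold, and both helper functions are invented for illustration; deciding which descriptors belong together is still a judgment you make by reading the fragments.

```python
from collections import Counter

def rare_descriptors(descriptors, min_count=2):
    """List descriptors assigned to fewer than min_count fragments."""
    counts = Counter(descriptors)
    return sorted(d for d, n in counts.items() if n < min_count)

def merge(descriptors, mapping):
    """Relabel descriptors using a hand-made mapping of old -> new names."""
    return [mapping.get(d, d) for d in descriptors]

# Hypothetical descriptor column, one entry per fragment.
descriptors = ["phones", "tv", "phones", "radio", "tv", "phones", "tv"]

print(Counter(descriptors))
print("rare:", rare_descriptors(descriptors))

# Suppose you decide "radio" belongs with "tv" under a broader theme.
merged = merge(descriptors, {"radio": "tv"})
print(Counter(merged))
```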

Step 5 – Repeat Steps 3 and 4

You can repeat the last two steps as many times as you feel is necessary. You can use these hierarchical descriptor groups to characterize subsets of the text, so avoid having too many or too few fragments in each descriptor group. When you’re done, your data set will look something like this.

[Figure: example spreadsheet of fragments and their descriptors]

If you have a predetermined set of descriptors, you can assign each one to a column of the spreadsheet and code them as 0 or 1 for presence or absence.
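The 0-or-1 coding can be sketched in Python like this. Everything here is hypothetical: the descriptors, the keyword lists, and the `code_fragment` function. Matching on keywords is an assumption of the example; in the manual approach described above, you decide each code by reading the fragment.

```python
# Hypothetical predetermined descriptors, each with a few trigger keywords.
keywords = {
    "phones": ["phone", "dial"],
    "tv": ["tv", "television"],
    "music": ["record", "cassette"],
}

def code_fragment(fragment, keywords):
    """Return 1 or 0 per descriptor for presence or absence in the fragment."""
    text = fragment.lower()
    return {d: int(any(k in text for k in kws)) for d, kws in keywords.items()}

fragments = [
    "Rotary phones and black-and-white TV",
    "Rewinding a cassette with a pencil",
]

# One row per fragment, one 0/1 column per descriptor.
rows = [code_fragment(f, keywords) for f in fragments]
for f, row in zip(fragments, rows):
    print(row, f)

# Column totals give the count of fragments for each descriptor.
totals = {d: sum(row[d] for row in rows) for d in keywords}
print(totals)
```

Simple substring matching can produce false hits, so spot-check the coded rows before counting.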

Step 6 – Analyze

Once you have built your data set, you can analyze it statistically by counts and percentages, or graphically using word clouds. Consider this example. On December 29, 2016, Tanya Lynn Dee asked the question on her Facebook page, “Without revealing your actual age, what [is] something you remember that if you told a younger person they wouldn’t understand?” There were over 1,000 responses (at the time I saw the post), which I copied and classified into common themes. The results are here.

To learn more about analyzing text for its sentiment, read Sentiment Analysis: nearly everything you need to know by MonkeyLearn.

So, try analyzing some text (and other things) at home. You won’t need parental supervision.

Posted in Uncategorized | Tagged data mining, keywords, sentiment analysis, stats with cats, text analysis, text mining, word cloud, word count, wordle, writing | 6 Comments

Top 50 Statistics Blogs And Websites on the Web

Posted on January 28, 2017 by statswithcats

Number 28

Reading Stats with Cats

Posted in Uncategorized | Tagged blogs, cats, statistics, stats with cats, writing | 1 Comment
