So let's imagine you're doing something fun... like parametric statistics. And suddenly you're worried about those darn assumptions! Is it normal? Homogeneous? Are there zeros?

In reality the normality assumption is probably not the biggest one (pay attention to that homogeneity of variance!), but it's still something to be aware of, and I thought I'd show an approach using QQ plots.

But first... what are some of these options for deciding if you're normal?

1) there are tests... (Shapiro-Wilk, Anderson-Darling, etc.),

2) make a QQ plot and eyeball it,

3) make a histogram and eyeball it...

....maybe there are more, but those are the three I can think of right now.

Since the second two involve "eye-balling" it, it's no wonder folks who are not trained as statisticians end up doing the tests! Often we're doing the stats because we imagine them to be objective, and therefore the thought of adding back in subjective "eyeballs" doesn't sit well.

I gotta say, I don't like the tests! They seem overly stringent... it can even be challenging to get genuinely normal data to pass them! And I just don't like the idea of testing assumptions without looking at the data, which tends to happen when you get in the habit of running tests.

So what to do instead? ...


Well... both histograms and QQ plots are great. In most cases, keep in mind that you probably want to use the residuals from a statistical model rather than your raw data when you're assessing normality. It's a bit complicated to explain and I'm not sure I fully understand it myself, but using residuals should not steer you wrong, while using raw data to test these assumptions... could.
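In practice that just means pulling the residuals out of a fitted model. A tiny illustrative example (made-up data, so this isn't the data from the post):

```r
# Illustrative only: simulate a simple regression, then check the
# residuals (not the raw response) for normality.
set.seed(1)
dat <- data.frame(x = runif(50))
dat$y <- 2 + 3 * dat$x + rnorm(50)

fit <- lm(y ~ x, data = dat)
res <- residuals(fit)   # assess normality on these, not on dat$y

qqnorm(res)
qqline(res)
```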

I like QQ plots and so I'm going to show a cool little function that makes them a bit more useful (at least I feel so).

**What are QQ plots?** QQ plots are a means of comparing two distributions. So as a test of normality, you can plot your data (or its distribution) against a normal distribution and see if they match. If the two distributions are similar, the points on a QQ plot will fall along the *y=x* line (unity). If they don't... well then they don't. And typically "real" data aren't going to fall right on that *y=x* line!

So how far away from that line can they be? Where is the transition from being "eh, pretty normal" to "nope...no way that's normal"? It is quite challenging to find any guidelines, which means we're left with a nice visual tool for testing normality, but little or no means of assessing how far from perfectly normal we can be and still be reasonable.
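For reference, making that basic data-vs-normal comparison in base R is just two lines (with made-up example data):

```r
# A quick base-R QQ plot against the normal distribution
set.seed(1)
x <- rnorm(30)   # example data
qqnorm(x)        # sample quantiles vs. theoretical normal quantiles
qqline(x)        # reference line the points should roughly follow
```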

"Well... how do we interpret that?"

Well... what if we made a QQ plot using data actually drawn from a normal distribution with the same variance and sample size as our data set, and then compared that "normal" QQ plot to the QQ plot drawn with our data? As it's randomly drawn from a normal distribution, with a presumably rather small sample size, the "normal" data won't fall perfectly on the *y=x* line, but it will give us a sense of where actual "normal" data might fall. That would be a useful comparison.

But what if we did that 8 times? Then we'd get a pretty good sense of what we might expect a normal QQ plot to look like. And if we can't identify our QQ plot from a sea of "normal" QQ plots, our data is probably sufficiently normal.

And actually I have a little R function that does just that... let's see it in action.


**First... here's the function.** To see how it works, let's create some data... say 50 data points, and plot it to see what it looks like.
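(The function itself didn't survive formatting here, so this is a minimal sketch of what a lineup-style `qqfunc` might look like based on the description in the post; the argument names, panel layout, and internals are my assumptions, not the original code.)

```r
# Sketch of a "lineup" QQ function: one panel shows the model's
# residuals, the other eight show truly normal draws with the same
# sample size and spread. Details are reconstructed, not original.
qqfunc <- function(mod) {
  res  <- residuals(mod)          # assess normality on the residuals
  n    <- length(res)
  s    <- sd(res)
  slot <- sample(1:9, 1)          # random panel for the real data

  op <- par(mfrow = c(3, 3))      # 3 x 3 grid of QQ plots
  on.exit(par(op))
  for (i in 1:9) {
    if (i == slot) {
      qqnorm(res, main = i)       # our data
      qqline(res)
    } else {
      fake <- rnorm(n, mean = 0, sd = s)  # same n and spread, truly normal
      qqnorm(fake, main = i)
      qqline(fake)
    }
  }
  cat("Your data are in plot #", slot, "\n")
  invisible(slot)
}
```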

Plot of the data.

For simplicity, let's just run a quick linear model using that data (named "mod") and look at the QQ plot (it's actually the second plot produced when you "plot" the model).
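The data-and-model steps might look something like this (the post's exact data aren't reproducible, so the seed and distribution here are made up):

```r
set.seed(42)                       # reproducibility; illustrative data only
y <- rnorm(50, mean = 10, sd = 2)  # 50 data points
plot(y)                            # plot of the data

mod <- lm(y ~ 1)                   # a quick (intercept-only) linear model named "mod"
plot(mod, which = 2)               # plot #2 of an lm object is the Normal Q-Q plot
```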

Hmmm...so the data fall on the line on the QQ plot, but then both ends tail off a bit. Is this ok? Does that mean the normality assumption is violated?

The *qqfunc* to the rescue. Just type qqfunc and apply it to the model you just ran! Simple.


And the plot output is shown below: a set of 9 QQ plots. One of them is ours; the other 8 are drawn randomly from a normal distribution with the same variance and sample size as our data. Can you pick out our QQ plot? In case you can't, the number of the plot featuring our data is printed to the R console, in this case #7.

So... I might actually run the function a couple of times, just to check against more "normal" QQ plots. While not bad, this data doesn't look great on the normality front (1, 6, and 9 look pretty good). I would use a bit of caution and check a few more runs of qqfunc. If you're able to pick your data out of the line-up without much trouble, you might want to try a transformation and rerun the function.

So there you have it... a means of using QQ plots to assess normality... don't have too much fun with it.
