thr3ads.net - R help - [R] normal distribution assumption for multi-level modelling [Apr 2012]


Cecile De Cat

2012-Apr-18 15:21 UTC

[R] normal distribution assumption for multi-level modelling

Hello,

I'm analysing reaction time data from a linguistic experiment (a variant of
a lexical decision task). To ascertain that the data was normally
distributed, I used shapiro.test for each participant (see commands
below), but only one out of 21 returns a p-value above 0.05.

> f = function(dfr) return(shapiro.test(dfr$Target.RTinv)$p.value)
> p = as.vector(by(newdat, newdat$Subject, f))
> names(p) = levels(newdat$Subject)
> names(p[p < 0.05])

Removing a few outliers per subject doesn't make a difference, and
"aggressive" removal of outliers (done by subject, for each of the 6
conditions) still results in non-normally distributed data by subject.

Does this invalidate any attempt at multi-level modelling?

Many thanks in advance for your help.

Cecile


Ben Bolker

2012-Apr-18 18:01 UTC


[R] normal distribution assumption for multi-level modelling

Cecile De Cat <c.decat <at> leeds.ac.uk> writes:
> I'm analysing reaction time data from a linguistic experiment (a variant of
> a lexical decision task). To ascertain that the data was normally
> distributed, I used shapiro.test for each participant (see commands
> below), but only one out of 21 returns a p-value above 0.05.
>
> > f = function(dfr) return(shapiro.test(dfr$Target.RTinv)$p.value)
> > p = as.vector(by(newdat, newdat$Subject, f))
> > names(p) = levels(newdat$Subject)
> > names(p[p < 0.05])
>
> Removing a few outliers per subject doesn't make a difference, and
> "aggressive" removal of outliers (done by subject, for each of the 6
> conditions) still results in non-normally distributed data by subject.
>
> Does this invalidate any attempt at multi-level modelling?

  I don't think so.

  1. You should be concerned about the normality of the *residuals* of
your response variable, i.e. the conditional distribution of your data
(or, if you only have categorical predictors, you could equivalently
look *within* the smallest sampling unit where you expect a constant
mean), not the marginal distribution of the data.

  2. Many statisticians would say you shouldn't be doing hypothesis
tests of normality for this purpose in any case: if you have little
data the tests have low power (so you won't detect non-normality),
while if you have a great deal the tests can be *too* powerful
(i.e. you detect significant deviations from normality that do not
actually compromise the inferences you would be making from your
analysis).  I don't have a great citation for this handy, but
one is listed below (Cherry 1998).

  3. You're not applying any multiple-comparisons correction, so
getting 1/20 (let alone 1/21) p-values < 0.05 is exactly
as expected if the null hypothesis were true.
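
Points 1 and 3 can be illustrated with a small simulation. This is only a
sketch on made-up data: the column names (Subject, Condition, y), the effect
sizes, and the use of a plain lm() fit in place of a mixed model are all
assumptions for illustration, not the poster's data or analysis. It shows
that the pooled (marginal) response can be far from normal while the
residuals are what the model actually makes assumptions about, and how
per-subject p-values could be adjusted with p.adjust() if tests are run at all.

```r
set.seed(1)

# Hypothetical stand-in data: 21 subjects x 6 conditions x 20 trials each
n_subj <- 21; n_cond <- 6; n_rep <- 20
dat <- expand.grid(rep = seq_len(n_rep),
                   Condition = factor(seq_len(n_cond)),
                   Subject = factor(seq_len(n_subj)))

subj_eff <- rnorm(n_subj, sd = 0.5)      # subject-level shifts (made up)
cond_eff <- c(-5, -3, -1, 1, 3, 5)       # widely separated condition means (made up)
dat$y <- cond_eff[as.integer(dat$Condition)] +
  subj_eff[as.integer(dat$Subject)] +
  rnorm(nrow(dat), sd = 0.3)             # normal errors by construction

# Marginal distribution: a mixture of six separated groups, so not normal
p_marginal <- shapiro.test(dat$y)$p.value

# Conditional distribution: residuals after removing subject and condition means
fit <- lm(y ~ Subject + Condition, data = dat)
res <- residuals(fit)

# Per-subject tests on the residuals, with a Holm correction for 21 comparisons
p_subj <- sapply(split(res, dat$Subject), function(r) shapiro.test(r)$p.value)
p_holm <- p.adjust(p_subj, method = "holm")

print(p_marginal)
print(sum(p_subj < 0.05))
print(sum(p_holm < 0.05))
```

For a real multi-level analysis the residuals would be taken from the fitted
mixed model instead of lm(); the logic of looking at residuals and not at the
raw response stays the same.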

  Follow-ups to r-sig-mixed-models <at> r-project.org, although
this issue (hypothesis testing as a way to validate the statistical
assumptions of a model) is not specific to mixed models.

@article{cherry_statistical_1998,
	title = {Statistical Tests in Publications of The Wildlife Society},
	volume = {26},
	issn = {0091-7648},
	url = {http://www.jstor.org/stable/3783574},
	number = {4},
	journal = {Wildlife Society Bulletin},
	author = {Cherry, Steve},
	month = dec,
	year = {1998},
	pages = {947--953}
}

Bert Gunter

2012-Apr-18 18:55 UTC


[R] normal distribution assumption for multi-level modelling

Cecile:

On Wed, Apr 18, 2012 at 8:21 AM, Cecile De Cat <c.decat at leeds.ac.uk> wrote:
> Hello,
>
> I'm analysing reaction time data from a linguistic experiment (a variant of
> a lexical decision task). To ascertain that the data was normally
> distributed, I used shapiro.test for each participant (see commands
> below), but only one out of 21 returns a p-value above 0.05.
>
>> f = function(dfr) return(shapiro.test(dfr$Target.RTinv)$p.value)
>> p = as.vector(by(newdat, newdat$Subject, f))
>> names(p) = levels(newdat$Subject)
>> names(p[p < 0.05])
>
> Removing a few outliers

!! Yikes!! I won't say "Don't do this." But I will say that this can
be a very dangerous and unscientific thing to do, leading to biased,
misleading results.

> per subject doesn't make a difference, and
> "aggressive" removal of outliers (done by subject, for each of the 6
> conditions) still results in non-normally distributed data by subject.
>
> Does this invalidate any attempt at multi-level modelling?

How can we possibly know without knowing in detail the objectives of
the investigation, the nature of the data, and the details of the
analysis you did??!

On general principles, normality is rarely of any real importance;
lack of independence (or, in general, non-adherence to the covariance
structures specified) usually is.  So "any attempt" seems too general
a claim to support. Indeed, a good graphical analysis -- often the
most scientifically informative thing to do anyway -- is almost always
a good thing to do.
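
As one possible form of such a graphical check, the sketch below draws a
normal QQ plot per subject. It is a minimal illustration, not a prescribed
procedure: the helper name qq_by_subject, the simulated vector res and the
grouping factor subject are all hypothetical, and in practice res would be
the residuals of the fitted model.

```r
set.seed(2)

# Hypothetical stand-in data: 6 subjects, 50 residuals each
subject <- factor(rep(paste0("S", 1:6), each = 50))
res <- rnorm(length(subject))

# Draw one normal QQ plot per subject on a shared page;
# returns the number of panels invisibly
qq_by_subject <- function(res, subject) {
  groups <- split(res, subject)
  n <- length(groups)
  cols <- ceiling(sqrt(n))
  old <- par(mfrow = c(ceiling(n / cols), cols), mar = c(4, 4, 2, 1))
  on.exit(par(old))                 # restore the previous layout on exit
  for (s in names(groups)) {
    qqnorm(groups[[s]], main = s)   # sample vs. theoretical normal quantiles
    qqline(groups[[s]])             # reference line through the quartiles
  }
  invisible(n)
}

# Only draw when run in an interactive session
if (interactive()) qq_by_subject(res, subject)
```

Systematic curvature or heavy tails in most panels says more about how the
data depart from normality than a column of p-values does.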

As this has little to do with R, you should follow up on a statistical
list, like stats.stackexchange.com .

-- Bert

> Many thanks in advance for your help.
>
> Cecile
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Cecile De Cat

2012-Apr-19 13:49 UTC


[R] normal distribution assumption for multi-level modelling

Thanks. I appreciate this isn't strictly an R question and will
pursue it on another list.

The procedure I followed was inspired by
@article{
	Author = {Baayen, R. Harald and Milin, Petar},
	Title = {Analysing Reaction Times},
	Journal = {International Journal of Psychological Research},
	Volume = {3},
	Number = {2},
	Pages = {12--28},
	Year = {2010}
}

Best,

Cecile


