thr3ads.net - R help - [R] Normality tests on groups of rows in a data frame, grouped based on content in other columns [Oct 2011]


Joel Fürstenberg-Hägg

2011-Oct-30 18:07 UTC

[R] Normality tests on groups of rows in a data frame, grouped based on content in other columns

Dear R users,

I have a data frame in the form below, on which I would like to make normality
tests on the values in the ExpressionLevel column.

> head(df)
  ID Plant Tissue Gene ExpressionLevel
1  1    p1     t1   g1          366.53
2  2    p1     t1   g2            0.57
3  3    p1     t1   g3           11.81
4  4    p1     t2   g1          498.43
5  5    p1     t2   g2            2.14
6  6    p1     t2   g3            7.85

I would like to make the tests on every group according to the content of the
Plant, Tissue and Gene columns.

My first problem is how to run a function for all these sub groups.
I first thought of making subsets:

group1 <- subset(df, Plant=="p1" & Tissue=="t1" & Gene=="g1")
group2 <- subset(df, Plant=="p1" & Tissue=="t1" & Gene=="g2")
group3 <- subset(df, Plant=="p1" & Tissue=="t1" & Gene=="g3")
group4 <- subset(df, Plant=="p1" & Tissue=="t2" & Gene=="g1")
group5 <- subset(df, Plant=="p1" & Tissue=="t2" & Gene=="g2")
group6 <- subset(df, Plant=="p1" & Tissue=="t2" & Gene=="g3")
etc...

But that would be very time consuming and I would like to be able to use the
code for other data frames...
I have also tried to store these in a list, which I am looping through, running
the tests, something like this:

library(nortest)  # provides pearson.test()

alist <- list(group1, group2, group3, group4, group5, group6)
for (i in alist)
{
  print(shapiro.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel))
  print(pearson.test(i$ExpressionLevel, adjust=FALSE))
}
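
One way to avoid writing the subsets by hand, sketched here under assumptions: base R's split() can build the list of groups directly from the three grouping columns. The example data below (five replicates per group, drawn with rlnorm()) is invented purely for illustration, and only shapiro.test() is run so that no extra package is needed.

```r
# Hypothetical example data: 1 plant x 2 tissues x 3 genes, 5 replicates each
set.seed(1)
df <- expand.grid(Rep = 1:5,
                  Gene = c("g1", "g2", "g3"),
                  Tissue = c("t1", "t2"),
                  Plant = "p1",
                  stringsAsFactors = FALSE)
df$ExpressionLevel <- rlnorm(nrow(df), meanlog = 2)

# One data frame per Plant/Tissue/Gene combination;
# drop = TRUE leaves out combinations that do not occur in the data
groups <- split(df, df[c("Plant", "Tissue", "Gene")], drop = TRUE)

# Loop over the groups by name so each result can be labelled
for (name in names(groups)) {
  cat("Group:", name, "\n")
  print(shapiro.test(groups[[name]]$ExpressionLevel))
}
```

The list names (for example "p1.t1.g1") are built from the grouping values, so the same code should carry over to other data frames with the same column names.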

But there must be an easier and more elegant way of doing this... I found the
example below at
http://stackoverflow.com/questions/4716152/why-do-r-objects-not-print-in-a-function-or-a-for-loop.
I think it might be used for printing the results, but I do not know how to
adapt it to my data frame, since there the functions are applied to several
columns rather than to certain rows in one column.

DF <- data.frame(A = rnorm(100), B = rlnorm(100))

obj2 <- lapply(DF, shapiro.test)

tab2 <- lapply(obj2, function(x) c(W = unname(x$statistic), p.value = x$p.value))
tab2 <- data.frame(do.call(rbind, tab2))
printCoefmat(tab2, has.Pvalue = TRUE)
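
A possible adaptation of the example above to grouped rows, as a sketch only: split the ExpressionLevel vector by the grouping columns, then apply the same lapply() and tabulation steps to the resulting list. The data frame here is made up (five log-normal replicates per group), and the object names vals, obj and tab are chosen just for this illustration.

```r
# Hypothetical data: 2 tissues x 3 genes for one plant, 5 replicates each
set.seed(42)
df <- data.frame(
  Plant  = "p1",
  Tissue = rep(c("t1", "t2"), each = 15),
  Gene   = rep(rep(c("g1", "g2", "g3"), each = 5), times = 2),
  ExpressionLevel = rlnorm(30)
)

# A named list of numeric vectors, one per Plant/Tissue/Gene combination
vals <- split(df$ExpressionLevel, df[c("Plant", "Tissue", "Gene")], drop = TRUE)

# Same pattern as the column-wise example, but over groups of rows
obj <- lapply(vals, shapiro.test)
tab <- lapply(obj, function(x) c(W = unname(x$statistic), p.value = x$p.value))
tab <- data.frame(do.call(rbind, tab))

printCoefmat(tab, has.Pvalue = TRUE)
```

Each row of tab is labelled with its group, so the printed table shows one W statistic and one p-value per Plant/Tissue/Gene combination.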

Finally, I have found several different functions for testing for normality, but
which one(s) should I choose? As far as I can see in the help files they only
differ in the minimum number of samples required.

Thanks in advance!

Kind regards,

Joel







Dennis Murphy

2011-Oct-30 20:11 UTC


[R] Normality tests on groups of rows in a data frame, grouped based on content in other columns

Hi:

Here are a few ways (untested, so caveat emptor):

# shapiro.test() returns an "htest" list, so extract the p-value to get
# one number per group (each group needs between 3 and 5000 observations)
sw_p <- function(x) shapiro.test(x)$p.value

# plyr package
library('plyr')
ddply(df, .(Plant, Tissue, Gene), summarise, ntest = sw_p(ExpressionLevel))

# data.table package
library('data.table')
dt <- data.table(df, key = c('Plant', 'Tissue', 'Gene'))
dt[, list(ntest = sw_p(ExpressionLevel)), by = key(dt)]

# aggregate() function
aggregate(ExpressionLevel ~ Plant + Tissue + Gene, data = df, FUN = sw_p)

# doBy package:
library('doBy')
summaryBy(ExpressionLevel ~ Plant + Tissue + Gene, data = df, FUN = sw_p)

There are others, too...
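
As one further base-R possibility, here is a minimal sketch that returns the group size, the W statistic and the p-value together, and yields NA instead of an error when a group falls outside the 3 to 5000 observations that shapiro.test() accepts. The helper name sw_summary and the example data are hypothetical, not part of the original question.

```r
# Hypothetical data: 2 tissues x 3 genes for one plant, 4 replicates each
set.seed(7)
df <- data.frame(
  Plant  = "p1",
  Tissue = rep(c("t1", "t2"), each = 12),
  Gene   = rep(rep(c("g1", "g2", "g3"), each = 4), times = 2),
  ExpressionLevel = rlnorm(24)
)

# Hypothetical helper: group size, W and p-value as one named vector
sw_summary <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # shapiro.test() only accepts sample sizes between 3 and 5000
  if (n < 3 || n > 5000) {
    return(c(n = n, W = NA, p.value = NA))
  }
  st <- shapiro.test(x)
  c(n = n, W = unname(st$statistic), p.value = st$p.value)
}

res <- aggregate(ExpressionLevel ~ Plant + Tissue + Gene,
                 data = df, FUN = sw_summary)

# aggregate() stores the three values as a matrix column;
# this flattens it into ordinary columns
res <- do.call(data.frame, res)
print(res)
```

The result has one row per Plant/Tissue/Gene combination, with the test values in columns prefixed by ExpressionLevel.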

HTH,
Dennis

