aolinto_r@bignet.com.br
2004-Apr-25 15:52 UTC
[R] nonparametric multiple sample comparison
Hello all,
Here goes one of my first functions.
I want to make a nonparametric multiple sample comparison with unequal sample
sizes (see Zar?s Biostatistical Analysis, 3rd. Ed., pg. 201 Example 10.11, pg.
288 Example 11.10). In the real world, I want to compare samples of fish
length captured with different fishing gears.
After using the Kruskal-Wallis test I want to check the differences between
each two samples.
I wrote the function multcomp (see below) and it?s working OK but I still have
some doubts. To use it one must have a two-column dataframe with groups and
measurements.
1) In line 20 (Results <- data.frame....) I create a dataframe to store the
final
results. It has a fixed row number but it would be better to have a variable
number, i.e., the number of combination of the groups taken 2 by 2 (4 groups =
5 combinations, 6 groups = 15 combinations). Which function in R returns the
number of combinations? I couldn?t find it!
2) In lines 35 to 39 I print the result tables. How can I make a
"clean"
print, without row numbers and quotation marks?
3) Also I?d like to calculate the critical values "q" and p-values but
I
couldn?t find the distribution indicated in Zar?s Table B.15
(App107) "Critical Values of Q for Nonparametric Multiple Comparison
Testing".
Is it possible to calculate these numbers in R?
Thanks in advance for any help or comments to improve this function.
Best regards,
Antonio Olinto
Sao Paulo Fisheries Institute
Brazil
multcomp <- function(DataSet) {
dat.multcomp <- DataSet
names(dat.multcomp) <- c("VarCat","VarNum")
attach(dat.multcomp)
dat.multcomp$Rank <- rank(VarNum)
attach(dat.multcomp)
RankList <- aggregate(Rank,list(Rank=Rank),FUN=length)
t <- length(RankList$Rank)
st <- 0
for (i in 1:t) if (RankList[i,2]>1) st<-st+(RankList[i,2]^3-RankList[i,2])
LevCat <- levels(dat.multcomp$VarCat)
NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length)
RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum)
MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean)
SampleSummary <- data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2])
names(SampleSummary)<-c("Samples","RSum","N","RMean")
SampleSummary <- SampleSummary[order(SampleSummary$RMean,decreasing=T),]
NCat <- length(LevCat)
N <- length(dat.multcomp$VarCat)
Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6))
names(Results) <-
c("Comparison","Difference","SE","Q")
l <- 1
for (i in 1:(NCat-1)) {
for (j in NCat:(i+1)) {
SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+
(1/SampleSummary[j,3])))
Dif <- SampleSummary[i,4]-SampleSummary[j,4]
Q=Dif/SE
Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1])
Results[l,2] <- round(Dif,4)
Results[l,3] <- round(SE,4)
Results[l,4] <- round(Q,4)
l <-l+1
}
}
print("Sample summary ranked by mean ranks")
print(SampleSummary)
print("")
print("Table of multiple comparisons")
print(Results)
}
==========
Data example taken from Zar (Biostatiscical Analysis):
dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0),
sep=" "))
1 7.68
1 7.69
1 7.7
1 7.7
1 7.72
1 7.73
1 7.73
1 7.76
2 7.71
2 7.73
2 7.74
2 7.74
2 7.78
2 7.78
2 7.8
2 7.81
3 7.74
3 7.75
3 7.77
3 7.78
3 7.8
3 7.81
3 7.84
4 7.71
4 7.71
4 7.74
4 7.79
4 7.81
4 7.85
4 7.87
4 7.91
kruskal.test(pH~Pound)
Kruskal-Wallis rank sum test
data
Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579
multcomp(dat.limno)
[1] "Sample summary ranked by mean ranks"
Samples RSum N RMean
3 3 145.0 7 20.71429
4 4 163.5 8 20.43750
2 2 132.5 8 16.56250
1 1 55.0 8 6.87500
[1] ""
[1] "Table of multiple comparisons"
Comparison Difference SE Q
1 3 vs 1 13.8393 4.6923 2.9493
2 3 vs 2 4.1518 4.6923 0.8848
3 3 vs 4 0.2768 4.6923 0.0590
4 4 vs 1 13.5625 4.5332 2.9918
5 4 vs 2 3.8750 4.5332 0.8548
6 2 vs 1 9.6875 4.5332 2.1370
-------------------------------------------------
WebMail Bignet - O seu provedor do litoral
www.bignet.com.br
I have not attempted to follow this in detail but a few comments
are interspersed within your discussion and the code:
<aolinto_r <at> bignet.com.br> writes:
:
: Hello all,
:
: Here goes one of my first functions.
:
: I want to make a nonparametric multiple sample comparison with unequal
sample
: sizes (see Zar伮抯 Biostatistical Analysis, 3rd. Ed., pg. 201 Example 10.11,
pg.
: 288 Example 11.10). In the real world, I want to compare samples of fish
: length captured with different fishing gears.
:
: After using the Kruskal-Wallis test I want to check the differences between
: each two samples.
:
: I wrote the function multcomp (see below) and it伮抯 working OK but I still
have
: some doubts. To use it one must have a two-column dataframe with groups and
: measurements.
:
: 1) In line 20 (Results <- data.frame伮....) I create a dataframe to store
the
final
: results. It has a fixed row number but it would be better to have a variable
: number, i.e., the number of combination of the groups taken 2 by 2 (4 groups
=
: 5 combinations, 6 groups = 15 combinations). Which function in R returns the
: number of combinations? I couldn伮抰 find it!
?choose
:
: 2) In lines 35 to 39 I print the result tables. How can I make a
"clean"
: print, without row numbers and quotation marks?
?cat
:
: 3) Also I伮抎 like to calculate the critical values "q" and p-values
but I
: couldn伮抰 find the distribution indicated in Zar伮抯 Table B.15
: (App107) "Critical Values of Q for Nonparametric Multiple Comparison
Testing".
: Is it possible to calculate these numbers in R?
:
: Thanks in advance for any help or comments to improve this function.
:
: Best regards,
:
: Antonio Olinto
: Sao Paulo Fisheries Institute
: Brazil
:
: multcomp <- function(DataSet) {
: dat.multcomp <- DataSet
You don't need to copy the DataSet. If you try to modify any object
within a function then R makes a copy of it automatically anyways.
: names(dat.multcomp) <- c("VarCat","VarNum")
: attach(dat.multcomp)
attach is generally frowned upon. Just call it a short name such as d so that
its easy to refer to. (Alternately wrap references to data frame in a with.)
I have assumed the data frame is called d in what follows.
: dat.multcomp$Rank <- rank(VarNum)
: attach(dat.multcomp)
: RankList <- aggregate(Rank,list(Rank=Rank),FUN=length)
: t <- length(RankList$Rank)
: st <- 0
: for (i in 1:t) if (RankList[i,2]>1)
st<-st+(RankList[i,2]^3-RankList[i,2])
The last two statements could be reduced to:
st <- sum(ifelse( RankList[,2]>1, RankList[,2]^3-RankList[,2], 0 )
: LevCat <- levels(dat.multcomp$VarCat)
: NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length)
: RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum)
: MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean)
: SampleSummary <- data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2])
:
names(SampleSummary)<-c("Samples","RSum","N","RMean")
If I have followed this correctly then the above 6 lines can be reduced to:
SampleSummary <- do.call( "rbind", by(d$Rank, d$VarCat,
function(x) c(Rsum=sum(x),N=length(x),RMean=mean(x))))
although that produces a matrix rather than a data frame and puts the
levels as the row rather than a column so you will have to adjust
what follows.
: SampleSummary <- SampleSummary[order(SampleSummary$RMean,decreasing=T),]
: NCat <- length(LevCat)
: N <- length(dat.multcomp$VarCat)
: Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6))
: names(Results) <-
c("Comparison","Difference","SE","Q")
: l <- 1
: for (i in 1:(NCat-1)) {
: for (j in NCat:(i+1)) {
: SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+
: (1/SampleSummary[j,3])))
: Dif <- SampleSummary[i,4]-SampleSummary[j,4]
: Q=Dif/SE
: Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1])
: Results[l,2] <- round(Dif,4)
: Results[l,3] <- round(SE,4)
: Results[l,4] <- round(Q,4)
: l <-l+1
: }
: }
: print("Sample summary ranked by mean ranks")
: print(SampleSummary)
: print("")
: print("Table of multiple comparisons")
: print(Results)
: }
:
: ==========:
: Data example taken from Zar (Biostatiscical Analysis):
:
: dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0),
sep=" "))
: 1 7.68
: 1 7.69
: 1 7.7
: 1 7.7
: 1 7.72
: 1 7.73
: 1 7.73
: 1 7.76
: 2 7.71
: 2 7.73
: 2 7.74
: 2 7.74
: 2 7.78
: 2 7.78
: 2 7.8
: 2 7.81
: 3 7.74
: 3 7.75
: 3 7.77
: 3 7.78
: 3 7.8
: 3 7.81
: 3 7.84
: 4 7.71
: 4 7.71
: 4 7.74
: 4 7.79
: 4 7.81
: 4 7.85
: 4 7.87
: 4 7.91
:
: kruskal.test(pH~Pound)
:
: Kruskal-Wallis rank sum test
:
: data
: Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579
:
: multcomp(dat.limno)
:
: [1] "Sample summary ranked by mean ranks"
: Samples RSum N RMean
: 3 3 145.0 7 20.71429
: 4 4 163.5 8 20.43750
: 2 2 132.5 8 16.56250
: 1 1 55.0 8 6.87500
: [1] ""
: [1] "Table of multiple comparisons"
: Comparison Difference SE Q
: 1 3 vs 1 13.8393 4.6923 2.9493
: 2 3 vs 2 4.1518 4.6923 0.8848
: 3 3 vs 4 0.2768 4.6923 0.0590
: 4 4 vs 1 13.5625 4.5332 2.9918
: 5 4 vs 2 3.8750 4.5332 0.8548
: 6 2 vs 1 9.6875 4.5332 2.1370
:
:
: -------------------------------------------------
: WebMail Bignet - O seu provedor do litoral
: www.bignet.com.br
:
: ______________________________________________
: R-help <at> stat.math.ethz.ch mailing list
: https://www.stat.math.ethz.ch/mailman/listinfo/r-help
: PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
:
:
Have you looked at the npmc package on CRAN? HTH, Andy> From: aolinto_r at bignet.com.br > > Hello all, > > Here goes one of my first functions. > > I want to make a nonparametric multiple sample comparison > with unequal sample > sizes (see Zar's Biostatistical Analysis, 3rd. Ed., pg. 201 > Example 10.11, pg. > 288 Example 11.10). In the real world, I want to compare > samples of fish > length captured with different fishing gears. > > After using the Kruskal-Wallis test I want to check the > differences between > each two samples. > > I wrote the function multcomp (see below) and it's working OK > but I still have > some doubts. To use it one must have a two-column dataframe > with groups and > measurements. > > 1) In line 20 (Results <- data.frame...) I create a dataframe > to store the final > results. It has a fixed row number but it would be better to > have a variable > number, i.e., the number of combination of the groups taken 2 > by 2 (4 groups = > 5 combinations, 6 groups = 15 combinations). Which function > in R returns the > number of combinations? I couldn't find it! > > 2) In lines 35 to 39 I print the result tables. How can I > make a "clean" > print, without row numbers and quotation marks? > > 3) Also I'd like to calculate the critical values "q" and > p-values but I > couldn't find the distribution indicated in Zar's Table B.15 > (App107) "Critical Values of Q for Nonparametric Multiple > Comparison Testing". > Is it possible to calculate these numbers in R? > > Thanks in advance for any help or comments to improve this function. > > Best regards, > > Antonio Olinto > Sao Paulo Fisheries Institute > Brazil > > multcomp <- function(DataSet) { > dat.multcomp <- DataSet > names(dat.multcomp) <- c("VarCat","VarNum") > attach(dat.multcomp) > dat.multcomp$Rank <- rank(VarNum) > attach(dat.multcomp) > RankList <- aggregate(Rank,list(Rank=Rank),FUN=length) > t <- length(RankList$Rank) > st <- 0 > for (i in 1:t) if (RankList[i,2]>1) > st<-st+(RankList[i,2]^3-RankList[i,2]) > LevCat <- levels(dat.multcomp$VarCat) > NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length) > RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum) > MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean) > SampleSummary <- > data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2]) > names(SampleSummary)<-c("Samples","RSum","N","RMean") > SampleSummary <- > SampleSummary[order(SampleSummary$RMean,decreasing=T),] > NCat <- length(LevCat) > N <- length(dat.multcomp$VarCat) > Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6)) > names(Results) <- c("Comparison","Difference","SE","Q") > l <- 1 > for (i in 1:(NCat-1)) { > for (j in NCat:(i+1)) { > SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+ > (1/SampleSummary[j,3]))) > Dif <- SampleSummary[i,4]-SampleSummary[j,4] > Q=Dif/SE > Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1]) > Results[l,2] <- round(Dif,4) > Results[l,3] <- round(SE,4) > Results[l,4] <- round(Q,4) > l <-l+1 > } > } > print("Sample summary ranked by mean ranks") > print(SampleSummary) > print("") > print("Table of multiple comparisons") > print(Results) > } > > ==========> > Data example taken from Zar (Biostatiscical Analysis): > > dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0), sep=" ")) > 1 7.68 > 1 7.69 > 1 7.7 > 1 7.7 > 1 7.72 > 1 7.73 > 1 7.73 > 1 7.76 > 2 7.71 > 2 7.73 > 2 7.74 > 2 7.74 > 2 7.78 > 2 7.78 > 2 7.8 > 2 7.81 > 3 7.74 > 3 7.75 > 3 7.77 > 3 7.78 > 3 7.8 > 3 7.81 > 3 7.84 > 4 7.71 > 4 7.71 > 4 7.74 > 4 7.79 > 4 7.81 > 4 7.85 > 4 7.87 > 4 7.91 > > kruskal.test(pH~Pound) > > Kruskal-Wallis rank sum test > > data > Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579 > > multcomp(dat.limno) > > [1] "Sample summary ranked by mean ranks" > Samples RSum N RMean > 3 3 145.0 7 20.71429 > 4 4 163.5 8 20.43750 > 2 2 132.5 8 16.56250 > 1 1 55.0 8 6.87500 > [1] "" > [1] "Table of multiple comparisons" > Comparison Difference SE Q > 1 3 vs 1 13.8393 4.6923 2.9493 > 2 3 vs 2 4.1518 4.6923 0.8848 > 3 3 vs 4 0.2768 4.6923 0.0590 > 4 4 vs 1 13.5625 4.5332 2.9918 > 5 4 vs 2 3.8750 4.5332 0.8548 > 6 2 vs 1 9.6875 4.5332 2.1370 > > > ------------------------------------------------- > WebMail Bignet - O seu provedor do litoral > www.bignet.com.br > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}