aolinto_r@bignet.com.br
2004-Apr-25 15:52 UTC
[R] nonparametric multiple sample comparison
Hello all, Here goes one of my first functions. I want to make a nonparametric multiple sample comparison with unequal sample sizes (see Zar?s Biostatistical Analysis, 3rd. Ed., pg. 201 Example 10.11, pg. 288 Example 11.10). In the real world, I want to compare samples of fish length captured with different fishing gears. After using the Kruskal-Wallis test I want to check the differences between each two samples. I wrote the function multcomp (see below) and it?s working OK but I still have some doubts. To use it one must have a two-column dataframe with groups and measurements. 1) In line 20 (Results <- data.frame....) I create a dataframe to store the final results. It has a fixed row number but it would be better to have a variable number, i.e., the number of combination of the groups taken 2 by 2 (4 groups = 5 combinations, 6 groups = 15 combinations). Which function in R returns the number of combinations? I couldn?t find it! 2) In lines 35 to 39 I print the result tables. How can I make a "clean" print, without row numbers and quotation marks? 3) Also I?d like to calculate the critical values "q" and p-values but I couldn?t find the distribution indicated in Zar?s Table B.15 (App107) "Critical Values of Q for Nonparametric Multiple Comparison Testing". Is it possible to calculate these numbers in R? Thanks in advance for any help or comments to improve this function. Best regards, Antonio Olinto Sao Paulo Fisheries Institute Brazil multcomp <- function(DataSet) { dat.multcomp <- DataSet names(dat.multcomp) <- c("VarCat","VarNum") attach(dat.multcomp) dat.multcomp$Rank <- rank(VarNum) attach(dat.multcomp) RankList <- aggregate(Rank,list(Rank=Rank),FUN=length) t <- length(RankList$Rank) st <- 0 for (i in 1:t) if (RankList[i,2]>1) st<-st+(RankList[i,2]^3-RankList[i,2]) LevCat <- levels(dat.multcomp$VarCat) NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length) RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum) MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean) SampleSummary <- data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2]) names(SampleSummary)<-c("Samples","RSum","N","RMean") SampleSummary <- SampleSummary[order(SampleSummary$RMean,decreasing=T),] NCat <- length(LevCat) N <- length(dat.multcomp$VarCat) Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6)) names(Results) <- c("Comparison","Difference","SE","Q") l <- 1 for (i in 1:(NCat-1)) { for (j in NCat:(i+1)) { SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+ (1/SampleSummary[j,3]))) Dif <- SampleSummary[i,4]-SampleSummary[j,4] Q=Dif/SE Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1]) Results[l,2] <- round(Dif,4) Results[l,3] <- round(SE,4) Results[l,4] <- round(Q,4) l <-l+1 } } print("Sample summary ranked by mean ranks") print(SampleSummary) print("") print("Table of multiple comparisons") print(Results) } ========== Data example taken from Zar (Biostatiscical Analysis): dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0), sep=" ")) 1 7.68 1 7.69 1 7.7 1 7.7 1 7.72 1 7.73 1 7.73 1 7.76 2 7.71 2 7.73 2 7.74 2 7.74 2 7.78 2 7.78 2 7.8 2 7.81 3 7.74 3 7.75 3 7.77 3 7.78 3 7.8 3 7.81 3 7.84 4 7.71 4 7.71 4 7.74 4 7.79 4 7.81 4 7.85 4 7.87 4 7.91 kruskal.test(pH~Pound) Kruskal-Wallis rank sum test data Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579 multcomp(dat.limno) [1] "Sample summary ranked by mean ranks" Samples RSum N RMean 3 3 145.0 7 20.71429 4 4 163.5 8 20.43750 2 2 132.5 8 16.56250 1 1 55.0 8 6.87500 [1] "" [1] "Table of multiple comparisons" Comparison Difference SE Q 1 3 vs 1 13.8393 4.6923 2.9493 2 3 vs 2 4.1518 4.6923 0.8848 3 3 vs 4 0.2768 4.6923 0.0590 4 4 vs 1 13.5625 4.5332 2.9918 5 4 vs 2 3.8750 4.5332 0.8548 6 2 vs 1 9.6875 4.5332 2.1370 ------------------------------------------------- WebMail Bignet - O seu provedor do litoral www.bignet.com.br
I have not attempted to follow this in detail but a few comments are interspersed within your discussion and the code: <aolinto_r <at> bignet.com.br> writes: : : Hello all, : : Here goes one of my first functions. : : I want to make a nonparametric multiple sample comparison with unequal sample : sizes (see Zar伮抯 Biostatistical Analysis, 3rd. Ed., pg. 201 Example 10.11, pg. : 288 Example 11.10). In the real world, I want to compare samples of fish : length captured with different fishing gears. : : After using the Kruskal-Wallis test I want to check the differences between : each two samples. : : I wrote the function multcomp (see below) and it伮抯 working OK but I still have : some doubts. To use it one must have a two-column dataframe with groups and : measurements. : : 1) In line 20 (Results <- data.frame伮....) I create a dataframe to store the final : results. It has a fixed row number but it would be better to have a variable : number, i.e., the number of combination of the groups taken 2 by 2 (4 groups = : 5 combinations, 6 groups = 15 combinations). Which function in R returns the : number of combinations? I couldn伮抰 find it! ?choose : : 2) In lines 35 to 39 I print the result tables. How can I make a "clean" : print, without row numbers and quotation marks? ?cat : : 3) Also I伮抎 like to calculate the critical values "q" and p-values but I : couldn伮抰 find the distribution indicated in Zar伮抯 Table B.15 : (App107) "Critical Values of Q for Nonparametric Multiple Comparison Testing". : Is it possible to calculate these numbers in R? : : Thanks in advance for any help or comments to improve this function. : : Best regards, : : Antonio Olinto : Sao Paulo Fisheries Institute : Brazil : : multcomp <- function(DataSet) { : dat.multcomp <- DataSet You don't need to copy the DataSet. If you try to modify any object within a function then R makes a copy of it automatically anyways. : names(dat.multcomp) <- c("VarCat","VarNum") : attach(dat.multcomp) attach is generally frowned upon. Just call it a short name such as d so that its easy to refer to. (Alternately wrap references to data frame in a with.) I have assumed the data frame is called d in what follows. : dat.multcomp$Rank <- rank(VarNum) : attach(dat.multcomp) : RankList <- aggregate(Rank,list(Rank=Rank),FUN=length) : t <- length(RankList$Rank) : st <- 0 : for (i in 1:t) if (RankList[i,2]>1) st<-st+(RankList[i,2]^3-RankList[i,2]) The last two statements could be reduced to: st <- sum(ifelse( RankList[,2]>1, RankList[,2]^3-RankList[,2], 0 ) : LevCat <- levels(dat.multcomp$VarCat) : NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length) : RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum) : MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean) : SampleSummary <- data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2]) : names(SampleSummary)<-c("Samples","RSum","N","RMean") If I have followed this correctly then the above 6 lines can be reduced to: SampleSummary <- do.call( "rbind", by(d$Rank, d$VarCat, function(x) c(Rsum=sum(x),N=length(x),RMean=mean(x)))) although that produces a matrix rather than a data frame and puts the levels as the row rather than a column so you will have to adjust what follows. : SampleSummary <- SampleSummary[order(SampleSummary$RMean,decreasing=T),] : NCat <- length(LevCat) : N <- length(dat.multcomp$VarCat) : Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6)) : names(Results) <- c("Comparison","Difference","SE","Q") : l <- 1 : for (i in 1:(NCat-1)) { : for (j in NCat:(i+1)) { : SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+ : (1/SampleSummary[j,3]))) : Dif <- SampleSummary[i,4]-SampleSummary[j,4] : Q=Dif/SE : Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1]) : Results[l,2] <- round(Dif,4) : Results[l,3] <- round(SE,4) : Results[l,4] <- round(Q,4) : l <-l+1 : } : } : print("Sample summary ranked by mean ranks") : print(SampleSummary) : print("") : print("Table of multiple comparisons") : print(Results) : } : : ==========: : Data example taken from Zar (Biostatiscical Analysis): : : dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0), sep=" ")) : 1 7.68 : 1 7.69 : 1 7.7 : 1 7.7 : 1 7.72 : 1 7.73 : 1 7.73 : 1 7.76 : 2 7.71 : 2 7.73 : 2 7.74 : 2 7.74 : 2 7.78 : 2 7.78 : 2 7.8 : 2 7.81 : 3 7.74 : 3 7.75 : 3 7.77 : 3 7.78 : 3 7.8 : 3 7.81 : 3 7.84 : 4 7.71 : 4 7.71 : 4 7.74 : 4 7.79 : 4 7.81 : 4 7.85 : 4 7.87 : 4 7.91 : : kruskal.test(pH~Pound) : : Kruskal-Wallis rank sum test : : data : Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579 : : multcomp(dat.limno) : : [1] "Sample summary ranked by mean ranks" : Samples RSum N RMean : 3 3 145.0 7 20.71429 : 4 4 163.5 8 20.43750 : 2 2 132.5 8 16.56250 : 1 1 55.0 8 6.87500 : [1] "" : [1] "Table of multiple comparisons" : Comparison Difference SE Q : 1 3 vs 1 13.8393 4.6923 2.9493 : 2 3 vs 2 4.1518 4.6923 0.8848 : 3 3 vs 4 0.2768 4.6923 0.0590 : 4 4 vs 1 13.5625 4.5332 2.9918 : 5 4 vs 2 3.8750 4.5332 0.8548 : 6 2 vs 1 9.6875 4.5332 2.1370 : : : ------------------------------------------------- : WebMail Bignet - O seu provedor do litoral : www.bignet.com.br : : ______________________________________________ : R-help <at> stat.math.ethz.ch mailing list : https://www.stat.math.ethz.ch/mailman/listinfo/r-help : PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html : :
Have you looked at the npmc package on CRAN? HTH, Andy> From: aolinto_r at bignet.com.br > > Hello all, > > Here goes one of my first functions. > > I want to make a nonparametric multiple sample comparison > with unequal sample > sizes (see Zar's Biostatistical Analysis, 3rd. Ed., pg. 201 > Example 10.11, pg. > 288 Example 11.10). In the real world, I want to compare > samples of fish > length captured with different fishing gears. > > After using the Kruskal-Wallis test I want to check the > differences between > each two samples. > > I wrote the function multcomp (see below) and it's working OK > but I still have > some doubts. To use it one must have a two-column dataframe > with groups and > measurements. > > 1) In line 20 (Results <- data.frame...) I create a dataframe > to store the final > results. It has a fixed row number but it would be better to > have a variable > number, i.e., the number of combination of the groups taken 2 > by 2 (4 groups = > 5 combinations, 6 groups = 15 combinations). Which function > in R returns the > number of combinations? I couldn't find it! > > 2) In lines 35 to 39 I print the result tables. How can I > make a "clean" > print, without row numbers and quotation marks? > > 3) Also I'd like to calculate the critical values "q" and > p-values but I > couldn't find the distribution indicated in Zar's Table B.15 > (App107) "Critical Values of Q for Nonparametric Multiple > Comparison Testing". > Is it possible to calculate these numbers in R? > > Thanks in advance for any help or comments to improve this function. > > Best regards, > > Antonio Olinto > Sao Paulo Fisheries Institute > Brazil > > multcomp <- function(DataSet) { > dat.multcomp <- DataSet > names(dat.multcomp) <- c("VarCat","VarNum") > attach(dat.multcomp) > dat.multcomp$Rank <- rank(VarNum) > attach(dat.multcomp) > RankList <- aggregate(Rank,list(Rank=Rank),FUN=length) > t <- length(RankList$Rank) > st <- 0 > for (i in 1:t) if (RankList[i,2]>1) > st<-st+(RankList[i,2]^3-RankList[i,2]) > LevCat <- levels(dat.multcomp$VarCat) > NLevCat <- aggregate(VarCat,list(LevCat=VarCat),FUN=length) > RLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=sum) > MLevCat <- aggregate(Rank,list(LevCat=VarCat),FUN=mean) > SampleSummary <- > data.frame(LevCat,RLevCat[,2],NLevCat[,2],MLevCat[,2]) > names(SampleSummary)<-c("Samples","RSum","N","RMean") > SampleSummary <- > SampleSummary[order(SampleSummary$RMean,decreasing=T),] > NCat <- length(LevCat) > N <- length(dat.multcomp$VarCat) > Results <- data.frame(rep(NA,6),rep(NA,6),rep(NA,6),rep(NA,6)) > names(Results) <- c("Comparison","Difference","SE","Q") > l <- 1 > for (i in 1:(NCat-1)) { > for (j in NCat:(i+1)) { > SE <- sqrt(((N*(N+1)/12)-(st/(12*(N-1))))*((1/SampleSummary[i,3])+ > (1/SampleSummary[j,3]))) > Dif <- SampleSummary[i,4]-SampleSummary[j,4] > Q=Dif/SE > Results[l,1] <- paste(SampleSummary[i,1],"vs",SampleSummary[j,1]) > Results[l,2] <- round(Dif,4) > Results[l,3] <- round(SE,4) > Results[l,4] <- round(Q,4) > l <-l+1 > } > } > print("Sample summary ranked by mean ranks") > print(SampleSummary) > print("") > print("Table of multiple comparisons") > print(Results) > } > > ==========> > Data example taken from Zar (Biostatiscical Analysis): > > dat.limno <- data.frame(scan(what = list (Pound=" ", pH=0), sep=" ")) > 1 7.68 > 1 7.69 > 1 7.7 > 1 7.7 > 1 7.72 > 1 7.73 > 1 7.73 > 1 7.76 > 2 7.71 > 2 7.73 > 2 7.74 > 2 7.74 > 2 7.78 > 2 7.78 > 2 7.8 > 2 7.81 > 3 7.74 > 3 7.75 > 3 7.77 > 3 7.78 > 3 7.8 > 3 7.81 > 3 7.84 > 4 7.71 > 4 7.71 > 4 7.74 > 4 7.79 > 4 7.81 > 4 7.85 > 4 7.87 > 4 7.91 > > kruskal.test(pH~Pound) > > Kruskal-Wallis rank sum test > > data > Kruskal-Wallis chi-squared = 11.9435, df = 3, p-value = 0.007579 > > multcomp(dat.limno) > > [1] "Sample summary ranked by mean ranks" > Samples RSum N RMean > 3 3 145.0 7 20.71429 > 4 4 163.5 8 20.43750 > 2 2 132.5 8 16.56250 > 1 1 55.0 8 6.87500 > [1] "" > [1] "Table of multiple comparisons" > Comparison Difference SE Q > 1 3 vs 1 13.8393 4.6923 2.9493 > 2 3 vs 2 4.1518 4.6923 0.8848 > 3 3 vs 4 0.2768 4.6923 0.0590 > 4 4 vs 1 13.5625 4.5332 2.9918 > 5 4 vs 2 3.8750 4.5332 0.8548 > 6 2 vs 1 9.6875 4.5332 2.1370 > > > ------------------------------------------------- > WebMail Bignet - O seu provedor do litoral > www.bignet.com.br > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}}