I've got a data frame that looks like this:

  subject  foo  bar
        2  1.7  3.2
        2  2.3  4.1
        3  7.6  2.3
        3  7.1  3.3
        3  7.3  2.3
        3  7.4  1.3
        5  6.2  6.1
        5  3.4  6.9
  ...

That is, I've got multiple rows per subject.  I need to compute
summaries within categories where the subject has the same number of
rows.  For example, subjects 2 and 5 both have two rows.  I need to
compute the mean for those four values of foo.  This looks like a good
candidate for index vectors, but I need some help.  I've tried
something like:

  table(data) -> tmp

and:

  tmp[tmp == 2]

and even:

  as.numeric(attr(tmp[tmp == 2],"names"))

to get a vector of subject numbers that have two rows in the original
data frame.  But I am getting stuck there.  I want some kind of
"is.member" function to use in a subsequent index vector expression,
like:

  i <- as.numeric(attr(tmp[tmp == 2],"names"))
  data[is.member($subject,i)]$foo

but there isn't an is.member() function.  Can someone please give me a
pointer on the canonical way to do this?

Thanks!

-- 
Russell Senior         ``The two chiefs turned to each other.
seniorr at aracnet.com   Bellison uncorked a flood of horrible
                         profanity, which, translated meant, `This is
                         extremely unusual.' ''
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
There's probably a better way, but ...

  apply(outer(subject,subject,FUN="=="),1,sum)

will give you a vector of the counts for each value of subject, so would be
2 2 4 4 4 4 2 2 ...
in your example.

You could add this as a column of the data frame and use gsummary to get the
summary statistics.

Ian.

> -----Original Message-----
> From: Russell Senior [mailto:seniorr at aracnet.com]
> Sent: Wednesday, 5 June 2002 10:31 AM
> To: R-help at stat.math.ethz.ch
> Subject: [R] hairy indexing problem
>
> [...]
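A minimal sketch of the outer() suggestion, run on the eight example rows from the question. The data frame name `d`, the column name `nrows`, and the use of tapply() in place of gsummary are assumptions made for illustration; they are not part of the original reply.

```r
# Example data from the original question (first eight rows only).
d <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# For each row, count how many rows share its subject.
# outer() builds an n x n logical matrix, so memory grows as n^2.
d$nrows <- apply(outer(d$subject, d$subject, FUN = "=="), 1, sum)
print(d$nrows)

# Mean of foo within each row-count class, using base R only.
foo_means <- tapply(d$foo, d$nrows, mean)
print(foo_means)
```

Because of the n x n intermediate matrix, this is probably only comfortable for data frames of modest size.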
> Can someone please give me a
> pointer on the canonical way to do this?

Canonical?  Would you settle for "it works for me"?  ;)

I suspect one of the gurus has a tidy, elegant way of doing this, but
here's how I'd do it instead (not being a guru).  Run-length encoding
works pretty well at things like this.

> d1 <- data.frame(subject=c(2,2,3,3,3,3,5,5),foo=c(1.7,2.3,7.6,7.1,7.3,7.4,6.2,3.4))
> d1
  subject foo
1       2 1.7
2       2 2.3
3       3 7.6
4       3 7.1
5       3 7.3
6       3 7.4
7       5 6.2
8       5 3.4
> d1.subj.rle <- rle(d1$subject[order(d1$subject)])
## make a vector of unique numbers of subjects
> n.subj <- unique(d1.subj.rle$lengths)
## now take means based on number of subjects.
> sapply(n.subj,function(x,...) {
+ mean(d1$foo[d1$subject %in% d1.subj.rle$values[d1.subj.rle$lengths == x]])})
[1] 3.40 7.35
## check the numbers
> mean(d1$foo[d1$subject == 2 | d1$subject == 5])
[1] 3.4
> mean(d1$foo[d1$subject == 3])
[1] 7.35
>

That could be a *lot* clearer inside the sapply function; maybe in v2.0
of my attempt at this ;)

Cheers

Jason
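One possible tidier take on the same idea, offered only as a sketch: it swaps rle() for table(), and uses %in% as the "is.member" test the question asked for. The helper name `mean_foo_for_count` is hypothetical.

```r
d1 <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                 foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# Rows per subject; names(counts) holds the subject ids as strings.
counts <- table(d1$subject)

# Hypothetical helper: mean of foo over all subjects having exactly n rows.
mean_foo_for_count <- function(n) {
  ids <- as.numeric(names(counts)[counts == n])  # subjects with n rows
  mean(d1$foo[d1$subject %in% ids])              # %in% plays the role of is.member
}

sizes  <- sort(unique(as.vector(counts)))        # distinct row counts
result <- sapply(sizes, mean_foo_for_count)
names(result) <- sizes
print(result)
```

Unlike the rle() version, this does not need the subjects to be sorted first, since table() does the counting.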
> -----Original Message-----
> From: Ian.Saunders at csiro.au [mailto:Ian.Saunders at csiro.au]
> Sent: Wednesday, June 05, 2002 2:32 PM
> To: seniorr at aracnet.com; R-help at stat.math.ethz.ch
> Subject: RE: [R] hairy indexing problem
>
> There's probably a better way, but ...
>
> apply(outer(subject,subject,FUN="=="),1,sum)
>
> will give you a vector of the counts for each value of subject, so would be
> 2 2 4 4 4 4 2 2 ...
> in your example.

[WNV]  I think there could be a better way.  Make sure subject is a factor:

	subject <- as.factor(data$subject)

and then your "replications class" factor is

	reps <- factor(table(subject)[subject])

The next step could be

	fooMeans <- tapply(data$foo, reps, mean)

> You could add this as a column of the data frame and use gsummary to get the
> summary statistics.

[WNV]  Yep, that too.  gsummary is part of the nlme package which has to be loaded.

> Ian.

[WNV]  Bill.
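The factor-based recipe can be tried on the example rows as follows. This is a sketch under assumptions: the integer codes of the factor are written out with as.integer() to make the indexing step explicit, as.vector() drops the table attributes before building `reps`, and the final sapply() over both columns (with the name `bothMeans`) is an illustrative extension, not part of the original message.

```r
data <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                   foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4),
                   bar     = c(3.2, 4.1, 2.3, 3.3, 2.3, 1.3, 6.1, 6.9))

subject <- as.factor(data$subject)

# Rows per subject, in the order of the factor levels.
counts <- table(subject)

# Look up each row's count through the factor's integer codes,
# then turn the counts into the "replications class" factor.
reps <- factor(as.vector(counts[as.integer(subject)]))

# Mean of foo within each replication class.
fooMeans <- tapply(data$foo, reps, mean)
print(fooMeans)

# Assumed extension: the same summary for several columns at once.
bothMeans <- sapply(data[c("foo", "bar")], function(col) tapply(col, reps, mean))
print(bothMeans)
```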
See ?tapply of course. E.g.

tapply(foo, subject, mean, na.rm=TRUE) #mean of foo per subj

best,
vito

----- Original Message -----
From: "Russell Senior" <seniorr at aracnet.com>
To: <R-help at stat.math.ethz.ch>
Sent: Wednesday, June 05, 2002 3:00 AM
Subject: [R] hairy indexing problem

> [...]
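Grouping by subject gives one mean per subject. To get one mean per row-count class, as the question asks, the grouping vector handed to tapply() would need to be each row's count instead. A sketch of both, where using ave() to build the counts is an assumption of this example rather than something proposed in the thread:

```r
d <- data.frame(subject = c(2, 2, 3, 3, 3, 3, 5, 5),
                foo     = c(1.7, 2.3, 7.6, 7.1, 7.3, 7.4, 6.2, 3.4))

# One mean per subject.
per_subject <- tapply(d$foo, d$subject, mean, na.rm = TRUE)
print(per_subject)

# Number of rows belonging to each row's subject.
nrows <- ave(seq_along(d$subject), d$subject, FUN = length)

# One mean per row-count class.
per_count <- tapply(d$foo, nrows, mean, na.rm = TRUE)
print(per_count)
```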
I would use function w.conti that calculates a weighted contingency
matrix. That is, given 2 vectors of categorical variables
(e.g., species and soil type) and a 3rd vector of a quantitative
variable (e.g., biomass), it calculates the sum of the quant. var. for each
pair (e.g., the total biomass for each species in each soil type).

With your data, as you just have one categorical variable,
just set the second one to a constant to calculate the sum of
foo for each subject:

> matriz<-cbind(sub,foo,bar)
> matriz
     sub foo bar
[1,]   2 1.7 3.2
[2,]   2 2.3 4.1
[3,]   3 7.6 2.3
[4,]   3 7.1 3.3
[5,]   3 7.3 2.3
[6,]   3 7.4 1.3
[7,]   5 6.2 6.1
[8,]   5 3.4 6.9
>
> a <- w.conti(matriz[,1],rep(1,nrow(matriz)),matriz[,2])
> a
   v2
v1     1
  2  4.0
  3 29.4
  5  9.6

Then, using the result of table you can calculate the mean
from the sum:

> a/as.vector(table(matriz[,1]))
   v2
v1     1
  2 2.00
  3 7.35
  5 4.80
>

From your question I understand that you want new subjects according
to their number of rows, so that subject 2 and 5 would become
a new subject:

> new.sub <- as.vector(table(matriz[,1]))
> new.sub
[1] 2 4 2
> new.sub <- rep(new.sub,new.sub)
> new.sub
[1] 2 2 4 4 4 4 2 2
> a <- w.conti(new.sub,rep(1,nrow(matriz)),matriz[,2])
> a
   v2
v1     1
  2 13.6
  4 29.4
> a/as.vector(table(new.sub))
   v2
v1     1
  2 3.40
  4 7.35
>

w.conti is simply:

function (v1,v2,z)
{
xtabs(z~v1+v2)
}

(I could use xtabs() directly, but I never remember that
expression, while w.conti is easier to remember)

Of course, if you always need the mean, just add the second step
to w.conti.

Agus

Dr. Agustin Lobo
Instituto de Ciencias de la Tierra (CSIC)
Lluis Sole Sabaris s/n
08028 Barcelona SPAIN
tel 34 93409 5410
fax 34 93411 0012
alobo at ija.csic.es

On 4 Jun 2002, Russell Senior wrote:

> [...]
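A sketch of the same sum-then-divide idea with xtabs() called directly. Note that rep(new.sub, new.sub) lines up with the rows only when the data are sorted by subject; the version below looks up each row's count by name instead, and the rows are deliberately shuffled to show that. The names `d`, `size`, `sums`, and `means` are assumptions of this example.

```r
# Same eight observations as in the question, in shuffled row order.
d <- data.frame(sub = c(5, 2, 3, 3, 2, 3, 5, 3),
                foo = c(6.2, 1.7, 7.6, 7.1, 2.3, 7.3, 3.4, 7.4))

# Rows per subject, then each row's own count looked up by subject name.
counts <- table(d$sub)
d$size <- as.vector(counts[as.character(d$sub)])

# Sum of foo per row-count class (a one-way weighted table).
sums <- xtabs(foo ~ size, data = d)

# Divide each sum by the number of rows in its class to get the mean.
means <- sums / as.vector(table(d$size))
print(means)
```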