Hi everybody,

I have data in the format of the example below, where a large number of indicator variables (coded 0/1) record traits of the same id across multiple rows. I need to represent the data in a one-row-per-id format. I see this as similar to converting from long to wide format; however, there is no time component here: the multiple rows are all characteristics observed at the same measurement occasion. So, really, I just need an individual sum for each variable (for a large number of variables), with all of these sums saved in the same row along with the id variable and other demographics (e.g., "location").

Here is the example data frame and the method I tried first:

d1 <- data.frame(id = c(1,1,1,2,2,2,2,3,3,4),
                 location = factor(c(rep(0,7), rep(1,3)), labels = c("A","B")),
                 var1 = as.logical(round(runif(10))),
                 var2 = as.logical(round(runif(10))),
                 var3 = as.logical(round(runif(10))))
d1
mysum <- function(x) aggregate(x, by = list(d1$id), sum)
d2 <- sapply(d1[3:5], mysum)  # columns 3:5 are var1-var3; column 2 (location) is a factor and cannot be summed
d2

Any help is appreciated!

Thanks!

Dan
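As a sketch of what the first attempt actually returns: the block below uses hypothetical fixed indicator values in place of runif() so the result is reproducible, and uses columns 3:5 (var1 to var3), because column 2, location, is a factor and sum() is not meaningful for factors. Since aggregate() returns a small data frame (columns Group.1 and x) for each indicator, sapply() simplifies the three results into a 2 x 3 matrix of lists rather than one row per id.

```r
# Hypothetical fixed data: same layout as d1, but the random runif()
# indicators are replaced by fixed values so the output is reproducible.
d1 <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4),
  location = factor(c(rep(0, 7), rep(1, 3)), labels = c("A", "B")),
  var1     = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  var2     = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  var3     = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# The first attempt: aggregate() per column, collected by sapply().
mysum <- function(x) aggregate(x, by = list(d1$id), sum)
d2 <- sapply(d1[3:5], mysum)

# d2 is a 2 x 3 matrix of lists: row "Group.1" holds the ids and
# row "x" holds the per-id sums, one column per indicator.
dim(d2)
d2[["x", "var2"]]
```

Each cell holds a whole vector (for example, d2[["x", "var2"]] is the vector of var2 sums for ids 1 to 4), which is why this layout is awkward to use directly.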
Hello,

If I understand it correctly, just change mysum to the following:

mysum <- function(x) tapply(x, d1$id, sum)

Hope this helps,

Rui Barradas

On 30-10-2013 11:07, Dan Abner wrote:
> Hi everybody,
> ...
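A minimal sketch of how the tapply() version fits together, assuming the same hypothetical fixed data as a stand-in for the runif() example. The indicator columns are picked by type (is.logical) rather than by position, which may scale better to a large number of columns; location is then attached from each id's first row, assuming it is constant within an id.

```r
# Hypothetical fixed data with the same layout as d1 (fixed values
# instead of runif() so the result is reproducible).
d1 <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4),
  location = factor(c(rep(0, 7), rep(1, 3)), labels = c("A", "B")),
  var1     = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  var2     = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  var3     = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Per-id sum of one column (the helper suggested above).
mysum <- function(x) tapply(x, d1$id, sum)

# Apply it only to the indicator columns, chosen by type; location is a
# factor, and sum() is not meaningful for factors.
ind  <- names(d1)[vapply(d1, is.logical, logical(1))]
sums <- sapply(d1[ind], mysum)  # matrix: one row per id, one column per indicator

# Convert to a data frame and add location from each id's first row.
first <- match(rownames(sums), as.character(d1$id))
d2 <- data.frame(id = d1$id[first], location = d1$location[first],
                 sums, row.names = NULL)
d2
```

Because tapply() returns a named vector for each column, sapply() can simplify the results into an ordinary numeric matrix whose row names are the ids.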
David Winsemius
2013-Oct-30 17:17 UTC
[R] Subtotals by id for a large number of columns
On Oct 30, 2013, at 4:07 AM, Dan Abner wrote:

> Hi everybody,
> ...
> Here is the example df and the method I used first:
>
> d1<-data.frame(id=c(1,1,1,2,2,2,2,3,3,4),location=factor(c(rep(0,7),rep(1,3)),
> labels=c("A","B")),var1=as.logical(round(runif(10))),
> var2=as.logical(round(runif(10))),var3=as.logical(round(runif(10))))
> d1

Perhaps.

> mysum <- aggregate(d1[-(1:2)], by = d1[1:2], sum)
> mysum
  id location var1 var2 var3
1  1        A    0    2    1
2  2        A    1    2    1
3  3        B    1    0    2
4  4        B    1    1    0

> [[alternative HTML version deleted]]

Please learn to use your mail client to post in plain text. (All of the free mailer services support plain text, so continuing to post in HTML is evidence of willful refusal to adhere to the posting guidelines.)

-- 
David Winsemius
Alameda, CA, USA
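The aggregate() call above selects columns by position. A sketch of two variants, assuming the same hypothetical fixed data: one selects the indicator and grouping columns by name or type, and the other uses the formula interface, where "." on the left-hand side stands for all columns not named on the right. Both group by id and location, which assumes location is constant within each id; otherwise an id would be split across several output rows.

```r
# Hypothetical fixed data (fixed values instead of runif()); the values are
# chosen so the sums match the output shown above.
d1 <- data.frame(
  id       = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 4),
  location = factor(c(rep(0, 7), rep(1, 3)), labels = c("A", "B")),
  var1     = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE),
  var2     = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  var3     = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Data-frame method: indicators chosen by type, grouping columns by name.
ind <- names(d1)[vapply(d1, is.logical, logical(1))]
a1  <- aggregate(d1[ind], by = d1[c("id", "location")], FUN = sum)

# Formula method: "." on the left means every column not on the right.
a2 <- aggregate(. ~ id + location, data = d1, FUN = sum)

a1
```

One difference to keep in mind: the formula method drops rows containing any NA by default (na.action = na.omit), while the data-frame method passes NAs through to sum(), so na.rm = TRUE would have to be supplied explicitly there.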