Hi,

When using the aggregate function to aggregate a data.frame by one or more grouping variables, I often have the problem that I want the mean for numeric variables but the unique value for factor variables.

So, for example, in this data frame:

    data <- data.frame(x = rnorm(10, 1, 2),
                       group = c(rep(1, 5), rep(2, 5)),
                       gender = c(rep('m', 5), rep('f', 5)))
    aggregate(data, by = list(data$group), FUN = mean)

I would like to have 'm' and 'f' in the gender column, not NA.

I see the problem that a group might not have a single unique factor level, but is there an alternative function that at least tries to do what I am aiming at?

That is:

"aggregate the data.frame by a list of grouping variables,
for numeric variables compute the mean,
for factor variables return the unique factor value"

Thanks!
Hi,

I hope this is what you meant.

    # data1: the data frame from the original post (called 'data' there)
    aggregate(. ~ group + gender, data = data1, mean)
    #  group gender         x
    #1     2      f  1.750686
    #2     1      m -1.074343

A.K.

----- Original Message -----
From: Martin Batholdy <batholdy at googlemail.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc:
Sent: Friday, January 11, 2013 10:07 AM
Subject: [R] aggregate data.frame based on column class

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Please see inline.

On Fri, Jan 11, 2013 at 10:07 AM, Martin Batholdy <batholdy at googlemail.com> wrote:
> That is;
>
> "aggregate the data.frame by a list of grouping variables,
> for numeric variables compute the mean,
> for factor variables return the unique factor value"

R is a language, so you just have to do the translation:

    mt <- function(x) {
      if (is.numeric(x)) {                 # if x is numeric
        return(mean(x))                    # compute the mean
      } else {                             # otherwise
        tab <- table(x)                    # tabulate x
        return(paste(paste(names(tab),     # and format it for display
                           tab, sep = ": "),
                     collapse = ", "))
      }
    }

    Dat <- data  # the data frame from the original post
    aggregate(Dat, by = list(Dat$group), FUN = mt)

Best,
Ista
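The function above reports a tabulation of each non-numeric column rather than a single value. A minimal sketch of a variant that returns the one distinct value when a group has exactly one, and NA otherwise, might look like this. The name `mean_or_unique` is hypothetical (it does not appear in the thread), and `set.seed()` is only there to make the sketch reproducible.

```r
# Hypothetical helper (not from the thread): mean for numeric columns,
# the single distinct value for anything else, NA if the value is not unique.
mean_or_unique <- function(x) {
  if (is.numeric(x)) {
    return(mean(x))
  }
  vals <- unique(as.character(x))  # works for factors and character vectors
  if (length(vals) == 1) vals else NA_character_
}

set.seed(1)  # assumption: fixed seed, only for reproducibility
data <- data.frame(x = rnorm(10, 1, 2),
                   group = c(rep(1, 5), rep(2, 5)),
                   gender = c(rep('m', 5), rep('f', 5)))

# Aggregate every column except the grouping variable itself;
# aggregate() applies FUN to each column separately within each group.
res <- aggregate(data[c("x", "gender")],
                 by = list(group = data$group),
                 FUN = mean_or_unique)
res
```

Because `aggregate()` calls the function column by column, the numeric and non-numeric branches never mix, so `x` stays numeric while `gender` comes back as text.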
Hi,

Maybe I misunderstood your question. You could do this:

    # assumes gender is a factor (as.numeric() then gives its integer codes)
    res <- aggregate(. ~ group, data = data1, mean)
    res$gender <- data1$gender[match(res$gender, as.numeric(data1$gender))]
    res
    #  group         x gender
    #1     1 -1.074343      m
    #2     2  1.750686      f

A.K.
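The lookup above goes through the factor's integer codes. A sketch of the same idea that does not depend on `gender` being a factor is to average only the numeric column and then fetch `gender` by matching on `group`. This assumes `gender` is constant within each group, as in the example data; `data1` follows the naming in the message above, and the seed is only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
data1 <- data.frame(x = rnorm(10, 1, 2),
                    group = c(rep(1, 5), rep(2, 5)),
                    gender = c(rep('m', 5), rep('f', 5)))

# Mean of the numeric column only, one row per group
res <- aggregate(x ~ group, data = data1, FUN = mean)

# match() returns the first row of each group in data1;
# use that row's gender as the group's value.
# Assumption: gender does not vary within a group.
res$gender <- data1$gender[match(res$group, data1$group)]
res
```

If a group could contain more than one gender, this silently picks the first one, so a uniqueness check per group would be needed before relying on the result.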