Mao Jianfeng
2009-Jun-09 01:56 UTC
[R] how to substitute missing values (NAs) by the group means
Dear R users,

I am asking for help on how to substitute missing values (NAs) with the mean of the group they belong to.

My dummy data frame is:

> df
       group traits
1  BSPy01-10     NA
2  BSPy01-10    7.3
3  BSPy01-10    7.3
4  BSPy01-11    5.3
5  BSPy01-11    5.4
6  BSPy01-11    5.6
7  BSPy01-11     NA
8  BSPy01-11     NA
9  BSPy01-11    4.8
10 BSPy01-12    8.1
11 BSPy01-12    6.0
12 BSPy01-12    6.0
13 BSPy01-13    6.1

I want to substitute each NA with the mean of the group to which it belongs. For example, the NA in the first record of traits should be replaced by the mean of "BSPy01-10".

I have tried to solve this problem using the doBy package, but I failed. I would welcome a correct solution, with or without doBy.

The commands I used and the output I got are as follows:

  library(doBy)
  df <- orderBy(~group, data=df)   # succeeded
  f1 <- function(x){m <- mean(x, na.ram=TRUE); x[is.na(x)] <- m; x}   # succeeded
  datatraits <- lapplyBy(traits~group, data=df, FUN=f1(traits))   # failed

Error: mean(x, na.ram = TRUE), can not find 'traits'.

Thanks in advance.

Sincerely,
Mao J-F
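Two details in the attempt above cause trouble independently of doBy. First, `na.ram` is a misspelling of `na.rm`; `mean()` silently absorbs the unknown argument through `...`, so the group mean stays NA. Second, `FUN=f1(traits)` calls `f1` immediately instead of passing the function, which is why R reports that it cannot find 'traits'. The following is only a base-R sketch of the same split-and-apply idea, not a doBy solution. It assumes the data frame is rebuilt as shown and is already sorted by group, and the name `filled` is purely illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1)
)

# na.rm (not na.ram) is the argument that makes mean() drop the NAs
f1 <- function(x) { m <- mean(x, na.rm = TRUE); x[is.na(x)] <- m; x }

# Pass the function itself (f1), not the result of calling it (f1(traits)).
# split() returns the pieces group by group, so this lines up with the rows
# only because df is sorted by group.
filled <- unlist(lapply(split(df$traits, df$group), f1), use.names = FALSE)
filled
```

The two missing values in "BSPy01-11" become 5.275, the mean of 5.3, 5.4, 5.6 and 4.8.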
Henrique Dallazuanna
2009-Jun-09 02:24 UTC
[R] how to substitute missing values (NAs) by the group means
Try this:

  df$traits[is.na(df$traits)] <- ave(df$traits, df$group, FUN = function(x) mean(x, na.rm = TRUE))[is.na(df$traits)]

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
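To see what the `ave()` call contributes, here is a small check on the data from the question. `ave()` returns a vector of the same length as its input, with each element replaced by its group's statistic and kept in the original row order, which is why it can be subset with the same `is.na()` mask. The names `grp_mean` and `out` are illustrative only.

```r
# Rebuild the example data from the question
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1)
)

# One group mean per row, aligned with df
grp_mean <- ave(df$traits, df$group, FUN = function(x) mean(x, na.rm = TRUE))

# Take the group mean where traits is missing, the observed value otherwise
out <- ifelse(is.na(df$traits), grp_mean, df$traits)
out
```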
David Winsemius
2009-Jun-09 02:39 UTC
[R] how to substitute missing values (NAs) by the group means
On Jun 8, 2009, at 9:56 PM, Mao Jianfeng wrote:

> I want to substitute each "NA" by the group mean of which the "NA" is
> belonging to. For example, substitute the first record of traits
> "NA" by the mean of "BSPy01-10".
>
> I have ever tried to solve this problem by using doBy package. But, I
> failed. I ask for the right solutions by using doBy package or not.

This should replace any NA by the mean within its group, and otherwise keep the non-NA value:

  as.numeric(apply(df, 1, function(x)
      ifelse(is.na(x[2]),
             tapply(df$traits, df$group, mean, na.rm=TRUE)[x[1]],
             x[2])))

  [1] 7.300 7.300 7.300 5.300 5.400 5.600 5.275 5.275 4.800 8.100 6.000 6.000 6.100

Whether that is the "right solution" depends on your artistic standards. If you accept that solution, you would execute:

  df$traits <- <the above expression>

Another approach, replacing only the NAs rather than the whole column:

  df[is.na(df$traits), "traits"] <-
      tapply(df$traits, df$group, mean, na.rm=TRUE)[ df[is.na(df$traits), "group"] ]

-- 
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Jorge Ivan Velez
2009-Jun-09 02:49 UTC
[R] how to substitute missing values (NAs) by the group means
Dear Mao,

Here is another way:

  yourdata$traits2 <- with(yourdata, do.call(c, tapply(traits, group, function(y){
      ym <- mean(y, na.rm=TRUE)
      y[is.na(y)] <- ym
      y
  })))

HTH,
Jorge
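One caveat when concatenating the `tapply()` pieces: they come back group by group, so the result lines up with the rows only when the data frame is already sorted by group (as it is in the question). Below is a sketch of an order-preserving variant using base R's `split()` and `unsplit()`, run on a deliberately shuffled copy of the data. The names `shuffled` and `fill_mean` are hypothetical and not part of the reply above.

```r
# Rebuild the example data from the question
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1)
)

# Replace the NAs in one group's values by that group's mean
fill_mean <- function(y) { y[is.na(y)] <- mean(y, na.rm = TRUE); y }

# Shuffle the rows so the groups are no longer contiguous
set.seed(1)
shuffled <- df[sample(nrow(df)), ]

# unsplit() puts each group's values back at their original row positions
shuffled$traits2 <- unsplit(
  lapply(split(shuffled$traits, shuffled$group), fill_mean),
  shuffled$group
)
shuffled
```

`ave()` performs the same split and reassembly internally, so it is equally safe on unsorted data.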
hadley wickham
2009-Jun-09 03:10 UTC
[R] how to substitute missing values (NAs) by the group means
On Mon, Jun 8, 2009 at 8:56 PM, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:

> I want to substitute each "NA" by the group mean of which the "NA" is
> belonging to. For example, substitute the first record of traits "NA" by the
> mean of "BSPy01-10".

Here's yet another way, using the plyr package, http://had.co.nz/

  library(plyr)
  impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
  ddply(df, ~ group, transform, traits = impute.mean(traits))

Or if you wanted to make it a little more generic:

  impute <- function(x, fun) {
    missing <- is.na(x)
    replace(x, missing, fun(x[!missing]))
  }
  ddply(df, ~ group, transform, traits = impute(traits, mean))
  ddply(df, ~ group, transform, traits = impute(traits, median))
  ddply(df, ~ group, transform, traits = impute(traits, min))

Hadley

-- 
http://had.co.nz/
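The generic `impute()` helper does not depend on plyr itself. As a sketch, assuming plyr is not available, the same function can be combined with base R's `ave()` to get a per-group fill. The column name `traits_med` is made up for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  group  = rep(c("BSPy01-10", "BSPy01-11", "BSPy01-12", "BSPy01-13"),
               times = c(3, 6, 3, 1)),
  traits = c(NA, 7.3, 7.3, 5.3, 5.4, 5.6, NA, NA, 4.8, 8.1, 6.0, 6.0, 6.1)
)

# Same helper as in the reply: fill missing values with fun() of the observed ones
impute <- function(x, fun) {
  missing <- is.na(x)
  replace(x, missing, fun(x[!missing]))
}

# ave() applies the helper within each group and keeps the original row order
df$traits_med <- ave(df$traits, df$group, FUN = function(x) impute(x, median))
df
```

With `median`, the two missing values in "BSPy01-11" become 5.35, the midpoint of 5.3 and 5.4.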