Dear all,

Apologies for this beginner's question. I have a variable Price, which is associated with factors Season and Crop, each of which has several levels. The Price variable contains missing values (NA), which I want to replace with the mean of the remaining (non-NA) Price values for the same Season-Crop combination of levels.

    Price  Crop   Season
    10     Rice   Summer
    12     Rice   Summer
    NA     Rice   Summer
    8      Rice   Winter
    9      Wheat  Summer

Price[is.na(Price)] gives me the missing values, and by(Price, list(Crop, Season), mean, na.rm = T) the values I want to impute. What I've not been able to figure out, by looking at by and the various incarnations of apply, is how to do the actual substitution.

Any help would be much appreciated.

Jan Smit
How about the following code?

    Price[is.na(Price)] <- mean(Price, na.rm = TRUE)

Note that this fills every NA with the overall mean of Price rather than the mean of its own Season-Crop group.

HTH

Manoj

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jan Smit
Sent: Wednesday, September 01, 2004 5:44 PM
To: R-help at stat.math.ethz.ch
Subject: [R] Imputing missing values

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hi Jan,

you could try the following:

    dat <- data.frame(Price = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                      Crop = c(rep("Rice", 5), rep("Wheat", 4)),
                      Season = c(rep("Summer", 3), rep("Winter", 4), rep("Summer", 2)))
    ######
    # sort by Season, then Crop, so that the rows line up with the
    # cell order of the tapply() result below
    dat <- dat[order(dat$Season, dat$Crop), ]
    dat$Price.imp <- unlist(tapply(dat$Price, list(dat$Crop, dat$Season), function(x) {
        mx <- mean(x, na.rm = TRUE)
        ifelse(is.na(x), mx, x)
    }))
    dat

However, you should be careful using this imputation technique, since it does not take into account the extra variability introduced by imputing new values into your data set. I don't know what analysis you are planning to do, but in any case I would recommend reading some standard references on missing values, e.g., Little, R. and Rubin, D. (2002). Statistical Analysis with Missing Data, New York: Wiley.

I hope this helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Jan Smit" <janpsmit at yahoo.co.uk>
To: <R-help at stat.math.ethz.ch>
Sent: Wednesday, September 01, 2004 10:43 AM
Subject: [R] Imputing missing values
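To illustrate the caveat about extra variability, here is a small sketch with made-up numbers (they are not from this thread). It suggests that filling NAs with the mean leaves the mean unchanged but shrinks the sample variance, because every imputed value sits exactly on the mean.

```r
# Made-up prices for a single group, with two missing values
price <- c(10, 12, NA, 8, NA)

# Mean imputation within this one group
imputed <- ifelse(is.na(price), mean(price, na.rm = TRUE), price)

var_observed <- var(price, na.rm = TRUE)  # variance of the 3 observed values
var_imputed  <- var(imputed)              # variance after imputation

c(observed = var_observed, imputed = var_imputed)
# observed = 4, imputed = 2
```

Standard errors computed from the imputed column would therefore tend to be too small, which is the reason for the pointer to the missing-data literature.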
Try this:

    newPrice <- unlist(tapply(Price, Crop:Season, function(x) {
        x[is.na(x)] <- mean(x, na.rm = TRUE)
        x
    }))

Crop and Season have to be factors for Crop:Season to work. Note that the result comes back grouped by Crop:Season, so sort the data by Crop and then Season first if newPrice has to line up with the original rows.
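If sorting the data is inconvenient, one alternative is base R's ave(), which applies a function within each group and returns the results in the original row order. The following is a minimal sketch using the five rows from the original question; the helper name impute_mean is made up for illustration.

```r
# Example data from the original question
Price  <- c(10, 12, NA, 8, 9)
Crop   <- factor(c("Rice", "Rice", "Rice", "Rice", "Wheat"))
Season <- factor(c("Summer", "Summer", "Summer", "Winter", "Summer"))

# Hypothetical helper: replace NAs in x by the mean of the non-missing values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# ave() splits Price by the Crop-Season combinations, applies the helper
# to each group, and puts the results back in the original positions
newPrice <- ave(Price, Crop, Season, FUN = impute_mean)
newPrice
# 10 12 11  8  9
```

A group in which every Price is missing has no mean to borrow, so its entries come back as NaN and would need separate handling.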