Dear all,

Apologies for this beginner's question. I have a variable Price, which is associated with factors Season and Crop, each of which have several levels. The Price variable contains missing values (NA), which I want to substitute by the mean of the remaining (non-NA) Price values of the same Season-Crop combination of levels.

Price Crop  Season
10    Rice  Summer
12    Rice  Summer
NA    Rice  Summer
8     Rice  Winter
9     Wheat Summer

Price[is.na(Price)] gives me the missing values, and by(Price, list(Crop, Season), mean, na.rm = T) the values I want to impute. What I've not been able to figure out, by looking at by and the various incarnations of apply, is how to do the actual substitution.

Any help would be much appreciated.

Jan Smit
How about the following code?

# replace every NA with the overall mean of the observed prices
# (note: one mean for all rows, not one per Season-Crop group)
Price[is.na(Price)] <- mean(Price, na.rm = TRUE)

HTH

Manoj

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jan Smit
Sent: Wednesday, September 01, 2004 5:44 PM
To: R-help at stat.math.ethz.ch
Subject: [R] Imputing missing values

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Hi Jan,
you could try the following:
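dat <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                  Season = c(rep("Summer", 3), rep("Winter", 4),
                             rep("Summer", 2)))
######
# sort by Season, then Crop, so that the rows are in the same order
# in which unlist() returns the tapply() cells (column-major)
dat <- dat[order(dat$Season, dat$Crop), ]
dat$Price.imp <- unlist(tapply(dat$Price, list(dat$Crop, dat$Season),
                               function(x) {
                                   mx <- mean(x, na.rm = TRUE)  # group mean
                                   ifelse(is.na(x), mx, x)      # fill the NAs
                               }))
dat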
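The sort-then-unlist step only works because the rows are put in the same order as the tapply() cells. A minimal sketch of an order-independent alternative, assuming the same kind of example data (copied here so the block is self-contained), could use ave() from base R, which applies a function within each group and returns the values in the original row order. The helper name impute_group_mean is made up for this illustration.

```r
# Example data (self-contained copy; no sorting required)
dat <- data.frame(Price  = c(10, 12, NA, 8, 7, 9, NA, 9, NA),
                  Crop   = c(rep("Rice", 5), rep("Wheat", 4)),
                  Season = c(rep("Summer", 3), rep("Winter", 4),
                             rep("Summer", 2)))

# Hypothetical helper: replace the NAs in x by the mean of the non-NA values
impute_group_mean <- function(x) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
}

# ave() calls the helper once per Crop-Season group and keeps the row order
dat$Price.imp <- ave(dat$Price, dat$Crop, dat$Season, FUN = impute_group_mean)
dat
```

Note that a group whose prices are all NA has no mean to borrow, so its entries would come back as NaN rather than a number.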
However, you should be careful with this imputation technique, since
it does not take into account the extra variability introduced by
imputing new values into your data set. I don't know what analysis you
are planning to do, but in any case I would recommend reading some
standard references on missing values, e.g., Little, R. and Rubin, D.
(2002). Statistical Analysis with Missing Data, New York: Wiley.
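To illustrate the caution above, here is a small sketch with invented numbers (the vector price below is made up, not the poster's data). Filling the NAs with the mean leaves the mean unchanged but shrinks the sample standard deviation, because the imputed values add no spread while the sample size grows.

```r
# Invented example vector with two missing values
price <- c(10, 12, NA, 8, 7, 9, NA)

observed <- price[!is.na(price)]          # the five observed values
imputed  <- price
imputed[is.na(imputed)] <- mean(observed) # mean imputation

# The means agree, but the standard deviation of the imputed vector
# is smaller than that of the observed values
c(mean_observed = mean(observed), mean_imputed = mean(imputed))
c(sd_observed = sd(observed), sd_imputed = sd(imputed))
```

Standard errors computed from the imputed vector as if all seven values had been observed would therefore tend to be too small.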
I hope this helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Doctoral Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm
Try this:
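# sapply() has no grouping argument, so split Price by the
# Crop-Season combination, fill each group, and put the values
# back in their original order with unsplit()
grp <- interaction(Crop, Season)
newPrice <- unsplit(lapply(split(Price, grp),
                           function(x) {
                               x[is.na(x)] <- mean(x, na.rm = TRUE)
                               x
                           }), grp)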