Hi there,

I am looking for some help replacing missing values in R with the row mean. This is survey data, and I am trying to impute the missing answers in each set of questions separately, using the mean of the scores for the other questions within that set.

I have a dataset that looks like this:

    ID  A1  A2  A3  B1  B2  B3  C1  C2  C3  C4
    b    4   5  NA   2  NA   4   5   1   3  NA
    c    4   5   1  NA   3   4   5   1   3   2
    d   NA   5   1   1  NA   4   5   1   3   2
    e    4   5   4   5  NA   4   5   1   3   2

I want to replace any NAs in columns A1:A3 with the row mean for those columns only. So for ID=b, I want the NA in A3[ID=b] to be (4+5)/2 = 4.5, the average of the values in A1 and A2 for that row. The same goes for columns B1:B3: I want the NA in B2[ID=b] to be the mean of B1 and B3 in row ID=b, so that B2[ID=b] becomes (2+4)/2 = 3. Likewise for C1:C4: I want C4[ID=b] to become (5+1+3)/3 = 3, the mean of C1:C3.

Then I want to move on to row ID=c, do the same thing, and so on.

Can anybody help me do this? I have tried using rowMeans and subsetting but can't figure out the right code.

Thanks so much.
Zahra
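A quick way to check the target values for row b is to take the mean of the answered questions in each set. The sketch below is illustrative only: the names `row_b` and `groups` are assumptions introduced here, and it handles a single row rather than the whole data frame.

```r
# Row b from the question as a named numeric vector (NA = missing answer)
row_b <- c(A1 = 4, A2 = 5, A3 = NA, B1 = 2, B2 = NA, B3 = 4,
           C1 = 5, C2 = 1, C3 = 3, C4 = NA)

# Column names belonging to each question set (illustrative grouping)
groups <- list(A = c("A1", "A2", "A3"),
               B = c("B1", "B2", "B3"),
               C = c("C1", "C2", "C3", "C4"))

# Mean of the answered questions in each set, ignoring the NAs
group_means <- sapply(groups, function(cols) mean(row_b[cols], na.rm = TRUE))
print(group_means)  # A = 4.5, B = 3, C = 3
```

These are exactly the values that should replace A3, B2 and C4 in row b.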
Hi Zahra,
I can't think of an "apply" function that will do this, but:

    Zdf<-read.table(text="ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
    b 4 5 NA 2 NA 4 5 1 3 NA
    c 4 5 1 NA 3 4 5 1 3 2
    d NA 5 1 1 NA 4 5 1 3 2
    e 4 5 4 5 NA 4 5 1 3 2",header=TRUE)
    Zdf
    replace_NAs<-function(x,group_lab=c("A","B","C")) {
     for(lab in group_lab) {
      indices<-grep(lab,names(x),fixed=TRUE)
      na_indices<-is.na(x[indices])
      if(any(indices))
       x[indices][na_indices]<-rowMeans(x[indices],na.rm=TRUE)
     }
     return(x)
    }
    for(row in 1:dim(Zdf)[1]) Zdf[row,]<-replace_NAs(Zdf[row,])
    Zdf

Jim

On Tue, Nov 3, 2015 at 6:49 AM, Zahra via R-help <r-help at r-project.org> wrote:
Hi again,
Small typo in line 5 of replace_NAs(): any(indices) should be any(na_indices). The corrected function is:

    replace_NAs<-function(x,group_lab=c("A","B","C")) {
     for(lab in group_lab) {
      indices<-grep(lab,names(x),fixed=TRUE)  # columns of this question set
      na_indices<-is.na(x[indices])           # which answers are missing
      if(any(na_indices))                     # only impute if something is missing
       x[indices][na_indices]<-rowMeans(x[indices],na.rm=TRUE)
     }
     return(x)
    }

Jim
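Jim's approach loops over rows and calls replace_NAs() on each one. As an alternative that is not from the thread, the same within-set imputation can be vectorised one question set at a time: compute rowMeans(..., na.rm = TRUE) for the set, locate the missing cells with which(is.na(m), arr.ind = TRUE), and fill each one with its own row's mean. This is a minimal sketch assuming set columns are named as a letter prefix followed by digits (A1, B2, C4, ...); impute_group_means() is a hypothetical helper name.

```r
# Hypothetical helper: replace each NA with the mean of the answered
# questions in the same row and the same question set.
impute_group_means <- function(df, prefixes = c("A", "B", "C")) {
  for (p in prefixes) {
    # Columns named <prefix><digits>, e.g. A1, A2, A3 (assumed naming scheme)
    cols <- grep(paste0("^", p, "[0-9]+$"), names(df))
    m <- as.matrix(df[cols])
    # Per-row mean over the non-missing answers in this set
    means <- rowMeans(m, na.rm = TRUE)
    # (row, col) positions of the missing answers
    miss <- which(is.na(m), arr.ind = TRUE)
    # Each missing cell takes the mean of its own row
    m[miss] <- means[miss[, "row"]]
    df[cols] <- as.data.frame(m)
  }
  df
}

Zdf <- read.table(text = "ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
b 4 5 NA 2 NA 4 5 1 3 NA
c 4 5 1 NA 3 4 5 1 3 2
d NA 5 1 1 NA 4 5 1 3 2
e 4 5 4 5 NA 4 5 1 3 2", header = TRUE)

Zdf_imputed <- impute_group_means(Zdf)
print(Zdf_imputed)
```

One caveat that applies to both versions: if every answer in a set is missing for some row, rowMeans() returns NaN for that row, so those cells end up NaN rather than a usable score.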
Excerpts from Zahra via R-help's message of 2015-11-02 17:49:01 -0200:

Use something like

    df[which(df$ID == 'b'), col_selection][is.na(df[which(df$ID == 'b'), col_selection])] <- fmean(...)

where fmean depends on the column selection (Axx, Byy, etc.) and on the row ID itself (so consider passing the left-hand side of the assignment in its entirety). I would use:

    fmean <- function(row, col_selection) {
      # homework for you here
    }

Best Regards,
--
Marco Arthur @ (M)arco Creatives
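Marco leaves the body of fmean() as an exercise. One possible way to fill it in, offered as a hypothetical sketch rather than Marco's own solution: treat `row` as a one-row data frame and `col_selection` as the column names of one question set, return the mean of the non-missing answers, and assign it to the NA cells of that row and set. The `sets` list and the loop over rows are assumptions added here for illustration.

```r
# Hypothetical completion of fmean(): mean of the answered questions in
# col_selection for a single row (a one-row data frame)
fmean <- function(row, col_selection) {
  mean(unlist(row[col_selection]), na.rm = TRUE)
}

df <- read.table(text = "ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
b 4 5 NA 2 NA 4 5 1 3 NA
c 4 5 1 NA 3 4 5 1 3 2
d NA 5 1 1 NA 4 5 1 3 2
e 4 5 4 5 NA 4 5 1 3 2", header = TRUE)

# Illustrative grouping of columns into question sets
sets <- list(A = c("A1", "A2", "A3"),
             B = c("B1", "B2", "B3"),
             C = c("C1", "C2", "C3", "C4"))

for (i in seq_len(nrow(df))) {
  for (cols in sets) {
    miss <- is.na(df[i, cols])  # 1 x k logical matrix for this row and set
    if (any(miss)) {
      # Fill only the missing cells with the mean of the answered ones
      df[i, cols][miss] <- fmean(df[i, ], cols)
    }
  }
}
print(df)
```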