thr3ads.net - R help - [R] replace NA's with row means for specific columns [Nov 2015]


Zahra

2015-Nov-02 19:49 UTC

[R] replace NA's with row means for specific columns

Hi there,

I am looking for some help replacing missing values in R with the row mean. This
is survey data and I am trying to impute values for missing variables in each
set of questions separately using the mean of the scores for the other questions
within that set.

I have a dataset that looks like this:

ID   A1   A2   A3   B1   B2   B3   C1   C2   C3   C4
b     4    5   NA    2   NA    4    5    1    3   NA
c     4    5    1   NA    3    4    5    1    3    2
d    NA    5    1    1   NA    4    5    1    3    2
e     4    5    4    5   NA    4    5    1    3    2


I want to replace any NAs in columns A1:A3 with the row mean for those
columns only. So for ID=b, I want the NA in A3[ID=b] to be (4+5)/2, which is the
average of the values in A1 and A2 for that row.
The same goes for columns B1:B3: I want the NA in B2[ID=b] to be the mean of the
values of B1 and B3 in row ID=b, so that B2[ID=b] becomes 3, which is (2+4)/2.
Likewise in C1:C4, I want C4[ID=b] to become (5+1+3)/3, which is the mean of C1:C3.

Then I want to go to row ID=c and do the same thing and so on.
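Since the question mentions rowMeans and subsetting, here is a minimal vectorized sketch of that idea (not from the thread). It assumes each question set can be recognised by a column-name prefix (a letter followed by digits, as in A1, B2, C4) and computes rowMeans() once per set instead of looping over rows. The helper name impute_group_means is hypothetical.

```r
# Rebuild the example data (same values as the table above).
dat <- data.frame(
  ID = c("b", "c", "d", "e"),
  A1 = c(4, 4, NA, 4), A2 = c(5, 5, 5, 5), A3 = c(NA, 1, 1, 4),
  B1 = c(2, NA, 1, 5), B2 = c(NA, 3, NA, NA), B3 = c(4, 4, 4, 4),
  C1 = c(5, 5, 5, 5), C2 = c(1, 1, 1, 1), C3 = c(3, 3, 3, 3),
  C4 = c(NA, 2, 2, 2)
)

# Hypothetical helper: for each prefix, fill the NAs in that set's columns
# with the row mean of the set's non-missing values.
impute_group_means <- function(df, prefixes = c("A", "B", "C")) {
  for (p in prefixes) {
    # Assumption: set members are named <prefix><digits>, e.g. A1, A2, A3
    cols <- grep(paste0("^", p, "[0-9]+$"), names(df))
    m <- as.matrix(df[cols])
    row_means <- rowMeans(m, na.rm = TRUE)     # one mean per row, NAs skipped
    miss <- which(is.na(m), arr.ind = TRUE)    # (row, col) of every NA
    m[miss] <- row_means[miss[, "row"]]        # each NA gets its own row's mean
    df[cols] <- as.data.frame(m)
  }
  df
}

filled <- impute_group_means(dat)
filled
```

A row whose values in a set are all NA would receive NaN, because rowMeans() with na.rm = TRUE returns NaN when nothing is left to average.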

Can anybody help me do this? I have tried using rowMeans and subsetting but
can't figure out the right code to do it.

Thanks so much.
Zahra

Jim Lemon

2015-Nov-02 23:26 UTC


Hi Zahra,
I can't think of an "apply" function that will do this, but:

Zdf<-read.table(text="ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
b  4  5 NA  2 NA  4  5  1  3 NA
c  4  5  1 NA  3  4  5  1  3  2
d NA  5  1  1 NA  4  5  1  3  2
e  4  5  4  5 NA  4  5  1  3  2",
header=TRUE)

Zdf

replace_NAs<-function(x,group_lab=c("A","B","C"))
{
 for(lab in group_lab) {
  indices<-grep(lab,names(x),fixed=TRUE)
  na_indices<-is.na(x[indices])
  if(any(indices))
   x[indices][na_indices]<-rowMeans(x[indices],na.rm=TRUE)
 }
 return(x)
}

for(row in 1:dim(Zdf)[1]) Zdf[row,]<-replace_NAs(Zdf[row,])

Zdf

Jim
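For comparison, the per-set filling can also be written with apply(). The sketch below is not part of the original reply. It fills each set row by row with apply(..., 1, ...), and because apply() returns one column per row, the result is transposed with t(). It assumes every set has at least two columns (otherwise apply() returns a plain vector), and fill_row is a hypothetical name.

```r
Zdf <- read.table(text = "ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
b  4  5 NA  2 NA  4  5  1  3 NA
c  4  5  1 NA  3  4  5  1  3  2
d NA  5  1  1 NA  4  5  1  3  2
e  4  5  4  5 NA  4  5  1  3  2", header = TRUE)

# Hypothetical helper: replace the NAs in one row's set by the mean of the rest
fill_row <- function(v) {
  v[is.na(v)] <- mean(v, na.rm = TRUE)
  v
}

for (p in c("A", "B", "C")) {
  cols <- grep(paste0("^", p), names(Zdf))        # columns of this set
  # apply() over rows returns a (columns x rows) matrix, hence t()
  Zdf[cols] <- t(apply(Zdf[cols], 1, fill_row))
}

Zdf
```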


Jim Lemon

2015-Nov-02 23:33 UTC


Hi again,
Small typo in line 5: the test should be any(na_indices), not any(indices),
so the function should read:

replace_NAs<-function(x,group_lab=c("A","B","C"))
{
 for(lab in group_lab) {
  indices<-grep(lab,names(x),fixed=TRUE)
  na_indices<-is.na(x[indices])
  if(any(na_indices))
   x[indices][na_indices]<-rowMeans(x[indices],na.rm=TRUE)
 }
 return(x)
}

Jim
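The difference between the two tests shows up on a single row that has no missing A values. any(indices) only asks whether the set has any matching columns (the integer positions are coerced to TRUE), while any(na_indices) asks whether any of the set's values are actually missing. The one-row data frame below is an illustrative assumption.

```r
# One row with no NAs in the A set
x <- data.frame(ID = "c", A1 = 4, A2 = 5, A3 = 1)

indices <- grep("A", names(x), fixed = TRUE)  # column positions 2, 3, 4
na_indices <- is.na(x[indices])               # 1 x 3 logical matrix, all FALSE

any(indices)     # TRUE: the set exists
any(na_indices)  # FALSE: nothing to replace
```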


Marco

2015-Nov-12 16:19 UTC

Use something like

df[df$ID == 'b', cols][is.na(df[df$ID == 'b', cols])] <- fmean(...)

where cols is one set of columns (A1:A3, B1:B3, etc.) and fmean depends on
that column selection and on the row ID itself (so consider passing the
left-hand side of the assignment in its entirety). I would use:

fmean <- function(row, col_selection) {
  # homework for you here
}
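One way the fmean homework could be completed, as a sketch only: fmean returns the mean of the non-missing values in the selected columns of one row, and a loop over IDs and column sets assigns it to that row's NAs. The groups list, the loop, and this body of fmean are assumptions for illustration, not Marco's code.

```r
# Hypothetical fmean: mean of the non-missing values of the selected
# columns in a one-row data frame
fmean <- function(row, col_selection) {
  mean(unlist(row[col_selection]), na.rm = TRUE)
}

df <- read.table(text = "ID A1 A2 A3 B1 B2 B3 C1 C2 C3 C4
b  4  5 NA  2 NA  4  5  1  3 NA
c  4  5  1 NA  3  4  5  1  3  2
d NA  5  1  1 NA  4  5  1  3  2
e  4  5  4  5 NA  4  5  1  3  2", header = TRUE)

# Assumed column sets
groups <- list(A = c("A1", "A2", "A3"),
               B = c("B1", "B2", "B3"),
               C = c("C1", "C2", "C3", "C4"))

for (id in df$ID) {
  r <- which(df$ID == id)
  for (cols in groups) {
    miss <- cols[is.na(unlist(df[r, cols]))]  # missing columns in this set
    if (length(miss) > 0)
      df[r, miss] <- fmean(df[r, ], cols)     # same mean for every NA in the set
  }
}

df
```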

Best Regards,

-- 
Marco Arthur @ (M)arco Creatives
