thr3ads.net - R help - [R] replace Na values with the mean of the column which contains them [Jul 2013]

iza.ch1

2013-Jul-29 16:39 UTC

[R] replace Na values with the mean of the column which contains them

Hi everyone

I have a problem replacing the NA values with the mean of the column that
contains them. If I replace the NAs with the mean of the remaining values in
the column, the mean of the whole column stays the same as if I had omitted
the NA values. I have the following data:

de
             [,1]        [,2]       [,3]
 [1,]          NA -0.26928087 -0.1192078
 [2,]          NA  1.20925752  0.9325334
 [3,]          NA  0.38012008 -1.8927164
 [4,]          NA -0.41778861  1.4330507
 [5,]          NA -0.49677462  0.2892706
 [6,]          NA -0.13248754  1.3976522
 [7,]          NA -0.54179054  0.2295291
 [8,]          NA  0.35788624 -0.5009389
 [9,]  0.27500571 -0.41467591 -0.3426560
[10,] -3.07568579 -0.59234248 -0.8439027
[11,] -0.42240954  0.73642396 -0.4971999
[12,] -0.26901731 -0.06768044 -1.6127122
[13,]  0.01766284 -0.40321968 -0.6508823
[14,] -0.80999580 -1.52283305  1.4729576
[15,]  0.20805934  0.25974308 -1.6093478
[16,]  0.03036708 -0.04013730  0.1686006

and I wrote this code:
de[which(is.na(de))] <- sapply(seq_len(ncol(de)), function(i) {mean(de[,i], na.rm=TRUE)})

The result I get is:
             [,1]        [,2]       [,3]
 [1,] -0.50575168 -0.26928087 -0.1192078
 [2,] -0.12222376  1.20925752  0.9325334
 [3,] -0.13412312  0.38012008 -1.8927164
 [4,] -0.50575168 -0.41778861  1.4330507
 [5,] -0.12222376 -0.49677462  0.2892706
 [6,] -0.13412312 -0.13248754  1.3976522
 [7,] -0.50575168 -0.54179054  0.2295291
 [8,] -0.12222376  0.35788624 -0.5009389
 [9,]  0.27500571 -0.41467591 -0.3426560
[10,] -3.07568579 -0.59234248 -0.8439027
[11,] -0.42240954  0.73642396 -0.4971999
[12,] -0.26901731 -0.06768044 -1.6127122
[13,]  0.01766284 -0.40321968 -0.6508823
[14,] -0.80999580 -1.52283305  1.4729576
[15,]  0.20805934  0.25974308 -1.6093478
[16,]  0.03036708 -0.04013730  0.1686006

It has replaced the first NA in the first column with the mean of the first
column (-0.505...), the second NA with the mean of the second column, and so on.
I want the result to look like this:
             [,1]        [,2]       [,3]
 [1,] -0.50575168 -0.26928087 -0.1192078
 [2,] -0.50575168  1.20925752  0.9325334
 [3,] -0.50575168  0.38012008 -1.8927164
 [4,] -0.50575168 -0.41778861  1.4330507
 [5,] -0.50575168 -0.49677462  0.2892706
 [6,] -0.50575168 -0.13248754  1.3976522
 [7,] -0.50575168 -0.54179054  0.2295291
 [8,] -0.50575168  0.35788624 -0.5009389
 [9,]  0.27500571 -0.41467591 -0.3426560
[10,] -3.07568579 -0.59234248 -0.8439027
[11,] -0.42240954  0.73642396 -0.4971999
[12,] -0.26901731 -0.06768044 -1.6127122
[13,]  0.01766284 -0.40321968 -0.6508823
[14,] -0.80999580 -1.52283305  1.4729576
[15,]  0.20805934  0.25974308 -1.6093478
[16,]  0.03036708 -0.04013730  0.1686006

Thanks in advance

Berend Hasselman

2013-Jul-29 17:27 UTC

This seems to do what you want:

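library(plyr)
# For each column (margin 2), replace its NAs with that column's mean.
# aaply() returns the columns as rows here, hence the transpose.
de.res <- t(aaply(de, 2, .fun = function(x) {x[which(is.na(x))] <- mean(x, na.rm = TRUE); x}))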
dimnames(de.res) <- NULL


Berend

John Fox

2013-Jul-29 17:29 UTC

Dear iza.ch1,

I hesitate to say this, because mean imputation is such a bad idea, but it's
easy to do what you want with a loop, rather than puzzling over a
"cleverer" way to accomplish the task. Here's an example using the
Freedman data set in the car package:
> colSums(is.na(Freedman))
population   nonwhite    density      crime 
        10          0         10          0 
> means <- colMeans(Freedman, na.rm=TRUE)
> for (j in 1:ncol(Freedman)){
+     Freedman[is.na(Freedman[, j]), j] <- means[j]
+ }
> colSums(is.na(Freedman))
population   nonwhite    density      crime 
         0          0          0          0 
> colMeans(Freedman)
population   nonwhite    density      crime 
1135.99000   10.80273  765.67000 2714.08182 
> means
population   nonwhite    density      crime 
1135.99000   10.80273  765.67000 2714.08182 

Now you should probably think about whether you really want to do this...
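
One reason for caution can be shown in a few lines. This is only a minimal sketch on a made-up vector (the values are not from the thread): filling NAs with the mean leaves the mean unchanged, as the transcript above shows, but it shrinks the standard deviation, because every imputed value sits exactly at the mean.

```r
# Small illustrative vector (made-up values, not from the thread).
x <- c(NA, NA, NA, 2, 4, 6, 8)

# Mean imputation: fill each NA with the mean of the observed values.
x_imp <- x
x_imp[is.na(x_imp)] <- mean(x, na.rm = TRUE)

mean(x, na.rm = TRUE)   # mean of the observed values
mean(x_imp)             # same mean after imputation
sd(x, na.rm = TRUE)     # spread of the observed values
sd(x_imp)               # smaller: the imputed values add no spread
```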

Best,
 John

Jorge I Velez

2013-Jul-29 17:32 UTC

Consider the following:

# Replace the NAs in a vector with the mean of its non-missing values
f <- function(x){
  m <- mean(x, na.rm = TRUE)
  x[is.na(x)] <- m
  x
}

apply(de, 2, f)
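
To see what this does, here is a minimal sketch on a small made-up matrix `m` standing in for `de` (the matrix and the name `filled` are assumptions for illustration). `apply()` with margin 2 hands each column to `f` in turn and binds the returned vectors back together as columns.

```r
# Hypothetical 4 x 2 matrix standing in for `de` (filled column by column).
m <- matrix(c(NA, 1, 3, NA,
              2, NA, 4, 6), ncol = 2)

# Same helper as above: fill the NAs of one vector with that vector's mean.
f <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

filled <- apply(m, 2, f)   # one call to f per column

filled
anyNA(filled)                                            # FALSE: no NAs remain
all.equal(colMeans(filled), colMeans(m, na.rm = TRUE))   # column means unchanged
```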

HTH,
Jorge.-


Berend Hasselman

2013-Jul-29 17:33 UTC

or this:

apply(de,2, function(x) {x[which(is.na(x))] <- mean(x,na.rm=TRUE);x})


Berend

arun

2013-Jul-29 17:57 UTC

Hi,

de<- structure(c(NA, NA, NA, NA, NA, NA, NA, NA, 0.27500571, -3.07568579, 
-0.42240954, -0.26901731, 0.01766284, -0.8099958, 0.20805934, 
0.03036708, -0.26928087, 1.20925752, 0.38012008, -0.41778861, 
-0.49677462, -0.13248754, -0.54179054, 0.35788624, -0.41467591, 
-0.59234248, 0.73642396, -0.06768044, -0.40321968, -1.52283305, 
0.25974308, -0.0401373, -0.1192078, 0.9325334, -1.8927164, 1.4330507, 
0.2892706, 1.3976522, 0.2295291, -0.5009389, -0.342656, -0.8439027, 
-0.4971999, -1.6127122, -0.6508823, 1.4729576, -1.6093478, 0.1686006
), .Dim = c(16L, 3L))


Your code should be:
sapply(seq_len(ncol(de)), function(i) {
  de[, i][is.na(de[, i])] <- mean(de[, i], na.rm = TRUE)  # fill column i's NAs with its mean
  de[, i]
})
A.K.
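
The original assignment goes wrong because `de[which(is.na(de))]` selects the NA cells in column-major order, while the right-hand side holds only one mean per column, so those few means are recycled across the NA cells. A minimal sketch of one way to keep the single-assignment form is to look up each NA cell's column with `col()`. The matrix `m` and the names `na_cells` and `col_means` are made up for illustration.

```r
# Hypothetical 4 x 2 matrix standing in for `de` (filled column by column).
m <- matrix(c(NA, NA, 1, 3,
              2, NA, 4, 6), ncol = 2)

na_cells  <- is.na(m)                    # logical matrix marking the NA cells
col_means <- colMeans(m, na.rm = TRUE)   # one mean per column

# col(m) gives the column index of every cell, so col(m)[na_cells] picks,
# for each NA cell, the column whose mean it should receive.
m[na_cells] <- col_means[col(m)[na_cells]]
m
```

Because the replacement vector now has exactly one entry per NA cell, nothing is recycled.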




David Winsemius

2013-Jul-29 18:59 UTC

Why not replace the NAs with random values drawn from a normal distribution that
has the same mean and standard deviation as the existing data?

set.seed(123)
# the first argument to rnorm() specifies the number of random values
de[,1][is.na(de[,1])] <- rnorm(sum(is.na(de[,1])),
                               mean(de[,1],na.rm=TRUE), sd(de[,1],na.rm=TRUE))
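
The same idea can be applied to every column at once. This is only a sketch under stated assumptions: the helper name `impute_rnorm` and the matrix `m` are made up for illustration, and each column is assumed to have at least two observed values so that `sd()` is defined.

```r
# Hypothetical helper: fill the NAs of one vector with normal draws whose
# mean and sd parameters are taken from the observed values.
impute_rnorm <- function(x) {
  miss <- is.na(x)
  x[miss] <- rnorm(sum(miss), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
  x
}

set.seed(123)   # make the random draws reproducible
m <- matrix(c(NA, NA, 1, 3, 5,
              2, NA, 4, 6, 8), ncol = 2)   # made-up stand-in for `de`

filled <- apply(m, 2, impute_rnorm)   # one call per column
filled
```

The sample mean and standard deviation of a filled column will generally differ a little from those of the observed values, since the imputed values are random draws rather than fixed constants.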

-- 
David.
David Winsemius
Alameda, CA, USA
