thr3ads.net - R help - [R] replacing missing values with row average [Feb 2011]


Daniel M.

2011-Feb-27 23:25 UTC

[R] replacing missing values with row average

Hello, 

I have a dataset that I read from an external file using
data <- read.csv("my file location"), which gives me a data frame:

    > is(data)
    [1] "data.frame" "list"       "oldClass"   "vector"

I have also converted it into a matrix and tried to apply my code, but
it didn't work.

Anyway, suppose I have the following data.


    data <- as.data.frame(matrix(rnorm(100), nrow = 10))

And let's put some missing values

    data[sample(1:10, 3), sample(1:10, 3)] <- NA

I want to replace all NAs with the row averages or column averages of my matrix.

I tried (with my original data matrix):

    data[is.na(data)] <- rowMeans(data, na.rm = TRUE)

but got this error message:

    Error in rowMeans(data, na.rm = TRUE) : 'x' must be numeric

Then I converted the data:

    data <- as.matrix(data)
    data <- as.numeric(data)

and applied my code again:

     data[is.na(data)] <- rowMeans(data, na.rm = TRUE)

This gave the error message:

    Error in rowMeans(data, na.rm = TRUE) :
      'x' must be an array of at least two dimensions

Then I tried to convert it into an array, but the errors continued.

I also tried the code

    data[is.na(data)] <- apply(data, 1, mean)

but that still didn't work.

Can anyone please help me fix this?

Thank you very much

Daniel


      

Joshua Wiley

2011-Feb-28 00:14 UTC


Hi Daniel,

If your data is stored in a matrix, the following should work (and be
fairly efficient):

#############
dat <- matrix(rnorm(100), nrow = 10)
dat[sample(1:10, 3), sample(1:10, 3)] <- NA
## create an index of missing values
index <- which(is.na(dat), arr.ind = TRUE)
## calculate the row means and "duplicate" them to assign to
## appropriate cells
dat[index] <- rowMeans(dat, na.rm = TRUE)[index[, "row"]]

## for documentation see
?which # particularly the arr.ind argument
?"[" # for extraction or selecting a subset to overwrite
#############

The only reason this does not work as-is with data frames is how they
are indexed/subset: dat[index] does not work.  The required
modification is probably fairly minimal, but if you are happy to use a
matrix, then it's a moot issue.
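
For a data frame, one possible modification is sketched below. It is only a sketch and assumes every column is numeric: the data frame is round-tripped through a matrix so that the two-column index from which(..., arr.ind = TRUE) can be used, and a column-mean variant is filled in column by column. The names df, df_row and df_col, and the seed, are made up for this illustration.

```r
## Sketch only: assumes every column of the data frame is numeric.
set.seed(1)  # arbitrary seed so the example is reproducible
df <- as.data.frame(matrix(rnorm(100), nrow = 10))
df[sample(1:10, 3), sample(1:10, 3)] <- NA

## Row means are computed once, before anything is overwritten
rm_row <- rowMeans(df, na.rm = TRUE)

## Round trip through a matrix so that two-column matrix indexing is available
m <- as.matrix(df)
index <- which(is.na(m), arr.ind = TRUE)
m[index] <- rm_row[index[, "row"]]
df_row <- as.data.frame(m)

## Column-mean variant: fill each column's NAs with that column's mean
df_col <- df
for (j in seq_along(df_col)) {
  miss <- is.na(df_col[[j]])
  df_col[[j]][miss] <- mean(df_col[[j]], na.rm = TRUE)
}
```

If the data frame also holds character or factor columns, as.matrix() would return a character matrix, so those columns would have to be set aside first.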

HTH,

Josh



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Bert Gunter

2011-Feb-28 01:02 UTC


Warning: This is not a helpful answer. Actually, it's a question: Why
do you want to do this? Replacing missing values with row or column
averages and then analyzing the data as if the missing values were not
there is a dangerous thing to do: it can produce biased estimates and
understate the true error, likely resulting in biased inference. Of
course, this depends on the specifics (how many are missing and
where).

R has a lot of built-in capabilities for handling missing values. I
agree: it's not easy stuff. Nor do you necessarily need to get that
complicated: Maybe your scheme is perfectly adequate for your
situation. I just wanted to caution you to think about this carefully if
you aren't aware of the possible problems and haven't already done so.
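
To make the "understate the true error" point concrete, here is a small simulated illustration. It is a sketch, not an analysis of the original data: the sample size, the 30% missingness rate and the seed are arbitrary assumptions, and the values are made missing completely at random.

```r
## Illustration with simulated data; sizes and missingness rate are arbitrary.
set.seed(42)
x <- rnorm(200, mean = 10, sd = 2)
x_obs <- x
x_obs[sample(length(x), 60)] <- NA   # 60 values missing completely at random

## Mean imputation: every NA becomes the mean of the observed values
x_imp <- x_obs
x_imp[is.na(x_imp)] <- mean(x_obs, na.rm = TRUE)

sd_obs <- sd(x_obs, na.rm = TRUE)    # spread of the values actually observed
sd_imp <- sd(x_imp)                  # smaller: imputed values add no variation

## Naive standard error of the mean, before and after imputation
se_obs <- sd_obs / sqrt(sum(!is.na(x_obs)))
se_imp <- sd_imp / sqrt(length(x_imp))
c(sd_obs = sd_obs, sd_imp = sd_imp, se_obs = se_obs, se_imp = se_imp)
```

The imputed series has the same mean as the observed values but a smaller standard deviation and a smaller naive standard error, even though no information was added. When the values are not missing at random, the estimates themselves can be biased as well.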

-- Bert





-- 
Bert Gunter
Genentech Nonclinical Biostatistics
