Hello,
I have a large dataset containing many NaN values, and I want to compress it
so I don't run into memory issues later.
Using the sparseMatrix machinery in the Matrix package, plus some fiddling
around, I have successfully reduced the 'size' of my data (as measured by
object.size()). However, NaN values appear all over my data, while zeros,
although important, occur very infrequently. So I turn the NaNs into zeros
and the zeros into very small numbers. I don't like changing the zeros into
small numbers, because that misrepresents the data. I know this is a
judgement call on my part, based on the impact these non-zero 'zeros' will
have on my analysis.
My question is: Do I have any other option? Is there a better solution for
this issue?
Here is a small example:
# make sample data
library(Matrix)
M <- Matrix(10 + 1:28, 4, 7)
M2 <- cbind(-1, M)        # cBind() is deprecated; cbind() now dispatches on Matrix objects
M2[, c(2, 4:6)] <- 0
M2[, c(3, 4, 5)] <- NaN   # the original chained assignment, written out step by step
M2[c(3, 4), ] <- NaN
M2[1:2, 2] <- NaN
M3 <- M2
# my 'fiddling' to make the sparse representation save space
M3[M3 == 0] <- 1e-08      # turn zeros into small values
M3[is.nan(M3)] <- 0       # turn NaNs into zeros
# saving space
sM <- as(M3, "sparseMatrix")
Note that this is just a sample of what I am doing. With much more data,
this approach reduces the object.size(); in this tiny example it actually
increases it, because the data are so small.
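To show the effect at a more realistic scale, here is a rough sketch; the
2000 x 500 dimensions and ~90% NaN density are made up for illustration and
are not my real data:

# hypothetical larger example: a mostly-NaN dense matrix vs. its sparse form
library(Matrix)
set.seed(1)
big <- matrix(rnorm(2000 * 500), 2000, 500)
big[sample(length(big), 0.9 * length(big))] <- NaN  # make ~90% of entries NaN
big2 <- big
big2[big2 == 0] <- 1e-08     # rnorm() gives no exact zeros; kept for parity with the recipe
big2[is.nan(big2)] <- 0      # NaNs become the structural zeros of the sparse matrix
sBig <- Matrix(big2, sparse = TRUE)
object.size(big)   # dense storage: every cell stored
object.size(sBig)  # sparse storage: only the ~10% non-zero cells stored

With the NaNs mapped to zeros, the sparse matrix stores only the surviving
non-zero entries, so the size difference grows with the fraction of NaNs.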
What I know about Matrix:
http://cran.r-project.org/web/packages/Matrix/vignettes/Intro2Matrix.pdf
Thanks,
Ben