On Mon, 19 Oct 2009, Jun Chen wrote:
> Dear list,
> I am working with microarray SNP data. My code runs fine on a small
> dataset, but with 45181 SNPs for 3081 animals, R cannot allocate
> 1000Mb of memory when I import the data.
>
> The command is:
>
> m <- matrix(scan("D:/SNPdata.txt"), ncol = nmarkers, byrow = TRUE)
>
> The error message is:
> Error: cannot allocate vector of size 1000.0 Mb
It says you don't have enough memory. Stored as floating point numbers, the
SNPs will take up about 1Gb, which is quite a lot -- more than you can
conveniently analyze in a 32-bit version of R[*]. You probably have more than
1Gb of memory, but R does need to make copies of things.
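A quick back-of-the-envelope check with the numbers you give shows where that
figure comes from (the second line previews the storage trick mentioned below):

   45181 * 3081 * 8 / 2^20    ## ~1062 Mb: the full matrix as doubles
   45181 * 3081 * 1 / 2^20    ## ~133 Mb if each genotype took one byte

and scan() plus matrix() will hold at least two objects of that size at once,
which is why the allocation fails even on a machine with a couple of Gb.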
In my experience with SNP data, there are two strategies: storing the data more
efficiently (1 byte/SNP), as the Bioconductor package snpMatrix does, or reading
in just part of the data at a time (which is what I have usually done). My
approach is to read the data in chunks and store it in a netCDF file with the
ncdf package, and then at analysis time to read data as needed from netCDF.
This also works well for parallel processing -- many R sessions can read
efficiently from the same netCDF file.
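Something along these lines (an untested sketch, assuming your file has one
animal per line with nmarkers genotype values each; the chunk size, the output
name SNPdata.nc and the variable names are just placeholders):

   library(ncdf)

   nmarkers <- 45181
   nanimals <- 3081
   chunk    <- 100      ## animals read per pass; tune to your memory

   ## define the netCDF dimensions and a single genotype variable
   snpdim  <- dim.def.ncdf("snp",    "index", 1:nmarkers)
   animdim <- dim.def.ncdf("animal", "index", 1:nanimals)
   geno    <- var.def.ncdf("genotype", "count", list(snpdim, animdim),
                           missval = -1, prec = "single")
   nc <- create.ncdf("SNPdata.nc", list(geno))

   ## read a block of animals at a time and write it straight to the
   ## file, so the full 45181 x 3081 matrix never sits in memory
   con  <- file("D:/SNPdata.txt", open = "r")
   done <- 0
   while (done < nanimals) {
       n <- min(chunk, nanimals - done)
       m <- matrix(scan(con, nlines = n, quiet = TRUE),
                   ncol = nmarkers, byrow = TRUE)
       put.var.ncdf(nc, geno, t(m),
                    start = c(1, done + 1), count = c(nmarkers, n))
       done <- done + n
   }
   close(con)
   close.ncdf(nc)

At analysis time you open the file and pull out only what you need, e.g. all
SNPs for the first 10 animals:

   nc <- open.ncdf("SNPdata.nc")
   g  <- get.var.ncdf(nc, "genotype", start = c(1, 1),
                      count = c(nmarkers, 10))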
[*] You didn't provide the requested information about your system, but
"D:" looks like Windows.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle