Hello,

In my script I have one list of 1,132,533 vectors (each vector contains 381 elements).

When I use "write" to save this list in a flat text file (I unlist my list, separate by tabs, and set ncol to 381), I end up with a file of 1,132,535 lines (2 additional lines). I checked back: my R list does not have those two additional items before writing.

With awk, I checked whether any lines were not made of 381 fields: there were two, separated by around 400k lines.

I made sub-files, using those "incomplete" lines as boundaries. My files are very close in size: about 1.9 GB each (1971841853 B and 1972614897 B respectively). It feels like a 32-bit / 64-bit issue.

My R version is this:

./Rscript -e 'sessionInfo()$platform'
[1] "x86_64-unknown-linux-gnu (64-bit)"

Somewhere, on reaching 1.9 GB, something is changing my tabs into unwanted line breaks...

Any idea what might cause this, and whether it looks solvable in R?

Cheers,
--Maxime
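A minimal R sketch of the write-and-check procedure described above, assuming a list of numeric vectors. The object names `vecs`, `out`, `n_fields` and `bad_lines` are hypothetical, a 5-vector list stands in for the real 1,132,533-vector one, and `count.fields()` (from the utils package) is used here as an in-R substitute for the awk field count.

```r
# Hypothetical stand-in for the real data: 5 numeric vectors of 381 elements
vecs <- replicate(5, runif(381), simplify = FALSE)
out <- tempfile(fileext = ".tsv")

# The write() call as described: unlist, tab-separated, 381 columns per line
write(unlist(vecs), file = out, ncolumns = 381, sep = "\t")

# In-R analogue of the awk check: count tab-separated fields on every line
# (quote = "" and comment.char = "" so no character is treated specially)
n_fields <- count.fields(out, sep = "\t", quote = "", comment.char = "")
bad_lines <- which(n_fields != 381)

length(n_fields)  # non-blank lines in the file; should equal length(vecs)
bad_lines         # indices of lines without exactly 381 fields
```

On a file this small no line should be flagged; the check only becomes interesting on output of the size described in the question.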
Stefan Evert (Mailing Lists)
2014-Sep-17 13:39 UTC
[R] R "write" strange behavior in huge file
You probably told R to write out the file as a single long line, with fields separated alternately by 380 TABs and one newline; that's what the ncol argument does (write is just a small wrapper around cat()). cat() doesn't print lines that are longer than 2 GiB, so it will insert an extra \n after every 2 GiB of data. (IIRC, this is because in the C code, fill=FALSE is replaced by fill=INT_MAX or so.)

The only way around this limitation that I can think of is to write a wrapper function that breaks up the matrix or list of vectors into smaller chunks and appends them separately to the output file. I'm planning to add such a function to one of my packages, so I'd be interested if somebody has a better solution.

Best,
Stefan

On 16 Sep 2014, at 18:54, Maxime Vallee <ValleeM at iarc.fr> wrote:

> In my script I have one list of 1,132,533 vectors (each vector contains
> 381 elements).
> [...]
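A minimal sketch of the chunked-writing workaround suggested above. The function name `write_chunked`, its `path` and `chunk_size` arguments, and the default of 10,000 vectors per chunk are all hypothetical; any chunk size should do as long as each chunk stays far below 2 GiB of text. As a deliberate substitution, each chunk is formatted with `paste()` and written with `writeLines()` on one open connection rather than passed to `write()`. `writeLines()` terminates every line with a newline, so consecutive chunks cannot run together, and only one chunk's worth of pasted strings is held in memory at a time. One consequence is that numbers go through `as.character()` (up to 15 significant digits), whereas `write()`/`cat()` print 7 significant digits by default.

```r
# Hypothetical helper sketching the chunked workaround: writes a list of
# vectors as one tab-separated line per vector, one chunk at a time.
write_chunked <- function(vecs, path, sep = "\t", chunk_size = 10000L) {
  con <- file(path, open = "w")   # create (or truncate) the output file
  on.exit(close(con))             # close it even if an error occurs
  n <- length(vecs)
  if (n == 0L) return(invisible(path))
  for (s in seq.int(1L, n, by = chunk_size)) {
    e <- min(s + chunk_size - 1L, n)
    # One string per vector: its elements joined by 'sep'
    lines <- vapply(vecs[s:e], paste, character(1), collapse = sep)
    # writeLines() appends "\n" to every string, including the last one
    writeLines(lines, con)
  }
  invisible(path)
}

# Usage on a small hypothetical list: 25 vectors written in chunks of 10
vecs <- replicate(25, runif(381), simplify = FALSE)
out <- tempfile(fileext = ".tsv")
write_chunked(vecs, out, chunk_size = 10L)
n_fields <- count.fields(out, sep = "\t", quote = "", comment.char = "")
```

Keeping a single connection open avoids reopening the file for every chunk; calling `write(..., append = TRUE)` per chunk would follow the suggestion more literally, but then the line termination at chunk boundaries depends on how `cat()` ends its output.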