Serguei Sokol
2018-Apr-19 09:47 UTC
[Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements
On 19/04/2018 at 09:30, Tomas Kalibera wrote:
> On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
>> On 18/04/2018 5:08 PM, Tousey, Colton wrote:
>>> Hello,
>>>
>>> I want to report a bug in R that is limiting my capabilities to
>>> export a matrix with write.csv or write.table with over
>>> 2,147,483,648 elements (C's int limit). I found this bug already
>>> reported about before:
>>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However,
>>> there appears to be no solution or fixes in upcoming R version
>>> releases.
>>>
>>> The error message is coming from the writetable part of the utils
>>> package in the io.c source
>>> code(https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
>>>     /* quick integrity check */
>>>     if(XLENGTH(x) != (R_len_t)nr * nc)
>>>         error(_("corrupt matrix -- dims not not match length"));
>>>
>>> The issue is that nr*nc is an integer and the size of my matrix, 2.8
>>> billion elements, exceeds C's limit, so the check forces the code to
>>> fail.
>>
>> Yes, looks like a typo: R_len_t is an int, and that's how nr was
>> declared. It should be R_xlen_t, which is bigger on machines that
>> support big vectors.
>>
>> I haven't tested the change; there may be something else in that
>> function that assumes short vectors.
> Indeed, I think the function won't work for long vectors because of
> EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be
> changed, including their signatures

That would be a definite fix, but before such a deep rewrite is undertaken, maybe the following small fix (in addition to "(R_xlen_t)nr * nc") would be sufficient for cases where nr and nc are in int range but their product can reach the long vector limit:

replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                         &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0, quote_col[j], qmethod,
                         &strBuf, sdec);

Serguei
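For readers following the arithmetic, here is a minimal standalone C sketch of the overflow discussed in the message above. It does not use R's headers: the two typedefs are stand-ins (R declares R_len_t as int and, on builds with long vector support, R_xlen_t as ptrdiff_t; int64_t is used here only to keep the sketch portable), and the function names matrix_length, fits_in_int and element_index are hypothetical, not part of io.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-ins for R's typedefs (assumption: not R's actual headers). */
typedef int     R_len_t;   /* R declares this as int */
typedef int64_t R_xlen_t;  /* R uses ptrdiff_t on long-vector builds */

/* Number of matrix elements, computed in the wide type. Casting the
   first operand widens the whole multiplication, so the product of two
   non-negative int dimensions cannot overflow. */
static R_xlen_t matrix_length(R_len_t nr, R_len_t nc)
{
    assert(nr >= 0 && nc >= 0);
    return (R_xlen_t)nr * nc;
}

/* Returns 1 if nr * nc is representable as an int, that is, if a check
   of the form (R_len_t)nr * nc could be meaningful at all. */
static int fits_in_int(R_len_t nr, R_len_t nc)
{
    return matrix_length(nr, nc) <= (R_xlen_t)INT_MAX;
}

/* Column-major offset of element (i, j). The cast sits on the
   multiplication: in "(R_xlen_t)i + j*nr" the product j*nr would still
   be evaluated in int before being widened. */
static R_xlen_t element_index(R_len_t i, R_len_t j, R_len_t nr)
{
    return i + (R_xlen_t)j * nr;
}
```

With dimensions such as 70000 x 40000 (2.8 billion elements, as in the report), matrix_length returns the full count while fits_in_int returns 0, which is why the original comparison against XLENGTH(x) could not succeed for such a matrix.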
Tomas Kalibera
2018-Apr-19 10:15 UTC
[Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements
On 04/19/2018 11:47 AM, Serguei Sokol wrote:
> That would be a definite fix but before such deep rewriting is
> undertaken may the following small fix (in addition to "(R_xlen_t)nr *
> nc") will be sufficient for cases where nr and nc are in int range but
> their product can reach long vector limit:
>
> replace
>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
>                          &strBuf, sdec);
> by
>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
>                          quote_col[j], qmethod,
>                          &strBuf, sdec);

Unfortunately we can't do that: x is a matrix of an atomic vector type. VECTOR_ELT takes elements of a generic vector, so it cannot be applied to "x". But even if we extracted a single element from "x" (e.g. via a type switch), we would not be able to pass it to EncodeElement0, which expects a full atomic vector (that is, including its header). Instead we would have to call functions like EncodeInteger, EncodeReal0, etc. on the individual elements, which is then the same as changing EncodeElement0 or implementing a new version of it.

This does not seem that hard to fix; it is just not as trivial as changing the cast.

Tomas
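The type-switch approach described above can be illustrated with a small standalone C sketch. This is not R's implementation and does not call EncodeInteger or EncodeReal0: the atomic_matrix struct, the elt_type enum, the xlen_t typedef and encode_element are all hypothetical stand-ins, and snprintf replaces R's own formatting routines. It only shows the shape of the idea: switch on the storage type and format one element addressed by a wide index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins, not R's real data structures. */
typedef int64_t xlen_t;                 /* plays the role of R_xlen_t */
typedef enum { ELT_INT, ELT_REAL } elt_type;

typedef struct {
    elt_type    type;   /* storage type of the elements */
    const void *data;   /* int[] or double[], column-major */
} atomic_matrix;

/* Format element (i, j) of x into buf by switching on the storage type.
   The offset is computed in the wide type, so nr and nc may each fit in
   an int even when the offset does not. Returns what snprintf returns,
   or -1 for an unknown type. */
static int encode_element(const atomic_matrix *x, int i, int j, int nr,
                          char *buf, size_t buflen)
{
    assert(x != NULL && buf != NULL);
    xlen_t idx = i + (xlen_t)j * nr;
    switch (x->type) {
    case ELT_INT:
        return snprintf(buf, buflen, "%d", ((const int *)x->data)[idx]);
    case ELT_REAL:
        return snprintf(buf, buflen, "%.15g",
                        ((const double *)x->data)[idx]);
    }
    return -1;
}
```

A caller would invoke encode_element once per cell inside the row and column loops, reusing one buffer. The real encoders also handle NA values, quoting and the decimal separator, which this sketch leaves out.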
Serguei Sokol
2018-Apr-19 11:29 UTC
[Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements
On 19/04/2018 at 12:15, Tomas Kalibera wrote:
> On 04/19/2018 11:47 AM, Serguei Sokol wrote:
>> replace
>>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
>>                          &strBuf, sdec);
>> by
>>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
>>                          quote_col[j], qmethod,
>>                          &strBuf, sdec);
>
> Unfortunately we can't do that, x is a matrix of an atomic vector
> type. VECTOR_ELT is taking elements of a generic vector, so it cannot
> be applied to "x". But even if we extracted a single element from "x"
> (e.g. via a type-switch etc), we would not be able to pass it to
> EncodeElement0 which expects a full atomic vector (that is, including
> its header). Instead we would have to call functions like
> EncodeInteger, EncodeReal0, etc on the individual elements. Which is
> then the same as changing EncodeElement0 or implementing a new version
> of it. This does not seem that hard to fix, just is not as trivial as
> changing the cast..

Thanks, Tomas, for this detailed explanation.

I would also like to report a problem with the list. It must be compromised in some way because, besides Tomas' response, I have received five or six (so far) dating spam messages, all of them coming from two addresses: Kristina Oliynik <kristinaoliynik604324 at kw.taluss.com> and Samantha Smith <samanthasmith317260 at kw.fefty.com>.

Serguei.
Tomas Kalibera
2018-May-22 13:13 UTC
[Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements
Fixed in R-devel 74754.

Tomas