thr3ads.net - R devel - [Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements [Apr 2018]

Serguei Sokol

2018-Apr-19 09:47 UTC

[Rd] R Bug: write.table for matrix of more than 2, 147, 483, 648 elements

On 19/04/2018 at 09:30, Tomas Kalibera wrote:
> On 04/19/2018 02:06 AM, Duncan Murdoch wrote:
>> On 18/04/2018 5:08 PM, Tousey, Colton wrote:
>>> Hello,
>>>
>>> I want to report a bug in R that is limiting my capabilities to
>>> export a matrix with write.csv or write.table with over
>>> 2,147,483,648 elements (C's int limit). I found this bug already
>>> reported about before:
>>> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17182. However,
>>> there appears to be no solution or fixes in upcoming R version
>>> releases.
>>>
>>> The error message is coming from the writetable part of the utils
>>> package in the io.c source code
>>> (https://svn.r-project.org/R/trunk/src/library/utils/src/io.c):
>>>     /* quick integrity check */
>>>     if(XLENGTH(x) != (R_len_t)nr * nc)
>>>         error(_("corrupt matrix -- dims not not match length"));
>>>
>>> The issue is that nr*nc is an integer and the size of my matrix, 2.8
>>> billion elements, exceeds C's limit, so the check forces the code to
>>> fail.
>>
>> Yes, looks like a typo: R_len_t is an int, and that's how nr was
>> declared. It should be R_xlen_t, which is bigger on machines that
>> support big vectors.
>>
>> I haven't tested the change; there may be something else in that
>> function that assumes short vectors.
> Indeed, I think the function won't work for long vectors because of
> EncodeElement2 and EncodeElement0. EncodeElement2/0 would have to be
> changed, including their signatures
That would be a definitive fix, but before such a deep rewrite is
undertaken, maybe the following small fix (in addition to
"(R_xlen_t)nr * nc") would be sufficient for cases where nr and nc are
in int range but their product can reach the long vector limit:

replace
    tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
                         &strBuf, sdec);
by
    tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
                         quote_col[j], qmethod, &strBuf, sdec);

Serguei

Tomas Kalibera

2018-Apr-19 10:15 UTC

On 04/19/2018 11:47 AM, Serguei Sokol wrote:
> That would be a definitive fix, but before such a deep rewrite is
> undertaken, maybe the following small fix (in addition to
> "(R_xlen_t)nr * nc") would be sufficient for cases where nr and nc are
> in int range but their product can reach the long vector limit:
>
> replace
>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
>                          &strBuf, sdec);
> by
>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
>                          quote_col[j], qmethod, &strBuf, sdec);
Unfortunately we can't do that: x is a matrix of an atomic vector type.
VECTOR_ELT takes elements of a generic vector, so it cannot be applied
to "x". But even if we extracted a single element from "x" (e.g. via a
type-switch etc.), we would not be able to pass it to EncodeElement0,
which expects a full atomic vector (that is, including its header).
Instead we would have to call functions like EncodeInteger,
EncodeReal0, etc. on the individual elements, which is then the same as
changing EncodeElement0 or implementing a new version of it. This does
not seem that hard to fix; it is just not as trivial as changing the
cast.

Tomas

Serguei Sokol

2018-Apr-19 11:29 UTC

On 19/04/2018 at 12:15, Tomas Kalibera wrote:
> On 04/19/2018 11:47 AM, Serguei Sokol wrote:
>>
>> replace
>>     tmp = EncodeElement2(x, i + j*nr, quote_col[j], qmethod,
>>                          &strBuf, sdec);
>> by
>>     tmp = EncodeElement2(VECTOR_ELT(x, (R_xlen_t)i + j*nr), 0,
>>                          quote_col[j], qmethod, &strBuf, sdec);
>
> Unfortunately we can't do that: x is a matrix of an atomic vector
> type. VECTOR_ELT takes elements of a generic vector, so it cannot be
> applied to "x". But even if we extracted a single element from "x"
> (e.g. via a type-switch etc.), we would not be able to pass it to
> EncodeElement0, which expects a full atomic vector (that is, including
> its header). Instead we would have to call functions like
> EncodeInteger, EncodeReal0, etc. on the individual elements, which is
> then the same as changing EncodeElement0 or implementing a new version
> of it. This does not seem that hard to fix; it is just not as trivial
> as changing the cast.
Thanks, Tomas, for this detailed explanation.

I would also like to report a problem with the list. It must be
corrupted in some way, because besides Tomas's response I have received
five or six (so far) dating spam messages. All of them come from two
addresses: Kristina Oliynik <kristinaoliynik604324 at kw.taluss.com>
and Samantha Smith <samanthasmith317260 at kw.fefty.com>.

Serguei.

Tomas Kalibera

2018-May-22 13:13 UTC

Fixed in R-devel 74754.
Tomas
