Robert Zimbardo
2017-Jul-03 08:57 UTC
[R] R memory limits on table(x, y) (and bigtabulate)
I have two character vectors x and y that have the following characteristics:

length(x) # same as
length(y) # 872099

length(unique(x)) # 47740
length(unique(y)) # 52478

I need to crosstabulate them, which would lead to a table with

47740*52478 # 2505299720

cells, which is more than

2^31 # 2147483648

cells, which seems to be R's limit, because I am getting the error message

Error in table(x, y) : attempt to make a table with >= 2^31 elements

Two questions:

- is this really R's limit, even on a 64-bit machine? It seems like it
(given <https://stat.ethz.ch/R-manual/R-devel/library/base/html/Memory-limits.html>
and <http://www.win-vector.com/blog/2015/06/r-in-a-64-bit-world/>), but
I just want to make sure I understood that right;
- I thought I could handle this with the package bigtabulate, but whenever I run

xy.tab <- bigtable(data.frame(x, y), ccols=1:2)

R crashes as follows:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted

Any idea what I am doing wrong with bigtabulate? Thanks for your
consideration.
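One way around the dense-table limit (a sketch, not something proposed in this thread) is to count only the observed (x, y) pairs rather than allocating a cell for every possible pair, e.g. with a sparse cross-table via xtabs(..., sparse = TRUE), which needs the Matrix package. Toy vectors stand in for the real 872099-element data:

```r
library(Matrix)  # recommended package; provides the sparse matrix classes

## Toy stand-ins for the real character vectors:
set.seed(1)
x <- sample(paste0("a", 1:50), 1000, replace = TRUE)
y <- sample(paste0("b", 1:40), 1000, replace = TRUE)

## sparse = TRUE stores only the observed cells (at most length(x) of them),
## so the 47740 x 52478 case never allocates ~2.5e9 dense cells.
xy.tab <- xtabs(~ x + y, sparse = TRUE)

dim(xy.tab)  # 50 x 40 here; would be 47740 x 52478 for the real data
sum(xy.tab)  # 1000, one count per observation
```

With 872099 observations the sparse table holds at most 872099 nonzero cells, far below the 2^31 limit, whatever the number of levels.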
Sorry, I don't know enough to give you trustworthy answers, but I can
say that crashes due to (or linked to) packages should usually be
reported to the package maintainer, who can be found with the
maintainer() function. That person may not monitor this list.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius
2017-Jul-03 15:58 UTC
[R] R memory limits on table(x, y) (and bigtabulate)
Yes. Table and matrix size limits are set by R's maximum integer size,
which is fixed at what can be represented in 4 bytes
(.Machine$integer.max, i.e. 2^31 - 1).

David

Sent from my iPhone
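The arithmetic behind the error, using the figures from the original post (a sketch added for reference, not part of the reply):

```r
## Figures from the original post:
n.x <- 47740                   # length(unique(x))
n.y <- 52478                   # length(unique(y))
cells <- n.x * n.y             # 2505299720 cells in the dense cross-table

## R's maximum integer: the largest value representable in 4 bytes.
limit <- .Machine$integer.max  # 2147483647, i.e. 2^31 - 1

cells > limit                  # TRUE -- hence "table with >= 2^31 elements"
```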