g.russell at eos-solutions.com
2009-Nov-16 13:10 UTC
[Rd] R crash with intToUtf8 on huge vectors (PR#14068)
Full_Name: George Russell
Version: 2.10.0
OS: Windows XP Professional Version 2002 Service Pack 2
Submission from: (NULL) (217.111.3.131)

Typing the following command into R --vanilla causes R to crash:

k <- intToUtf8(rep(1e3,1e7))

This is the output of sessionInfo():

R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] RODBC_1.3-1

Many thanks for your help and best wishes,

George Russell
Duncan Murdoch
2009-Nov-16 21:08 UTC
[Rd] R crash with intToUtf8 on huge vectors (PR#14068)
On 11/16/2009 8:10 AM, g.russell at eos-solutions.com wrote:
> Full_Name: George Russell
> Version: 2.10.0
> OS: Windows XP Professional Version 2002 Service Pack 2
> Submission from: (NULL) (217.111.3.131)
>
> Typing the following command into R --vanilla causes R to crash:
>
> k <- intToUtf8(rep(1e3,1e7))

Thanks, I see this in R-patched and R-devel. Will try to track it down.

> This is the output of sessionInfo():
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] RODBC_1.3-1

I didn't have RODBC present, and was working in an English_United States.1252 locale.

Duncan Murdoch

> Many thanks for your help and best wishes,
>
> George Russell
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
Prof Brian Ripley
2009-Nov-17 16:20 UTC
[Rd] R crash with intToUtf8 on huge vectors (PR#14068)
Basically you have exceeded a resource limit, and Windows has not handled that gracefully (other OSes do in your example). You are trying to create a single 20MB string, and no one envisaged anyone wanting to do that (nor that Windows would not fail gracefully, although generically that comes as no real surprise).

We'll change the method to cope with very large strings (more slowly), but perhaps you could explain the real-world problem that needs 20MB strings to be produced from integer representations of Unicode code points?

On Mon, 16 Nov 2009, g.russell at eos-solutions.com wrote:

> Full_Name: George Russell
> Version: 2.10.0
> OS: Windows XP Professional Version 2002 Service Pack 2
> Submission from: (NULL) (217.111.3.131)
>
> Typing the following command into R --vanilla causes R to crash:
>
> k <- intToUtf8(rep(1e3,1e7))
>
> This is the output of sessionInfo():
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] RODBC_1.3-1
>
> Many thanks for your help and best wishes,
>
> George Russell

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
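The 20-megabyte figure follows from the UTF-8 encoding of the reported input. Code point 1000 (U+03E8) falls in the two-byte UTF-8 range, and the report repeats it 1e7 times. The sketch below only illustrates that arithmetic; the variable names are illustrative and nothing here is code from the thread.

```r
# Code point 1000 (U+03E8) lies in the two-byte UTF-8 range U+0080..U+07FF.
one_char <- intToUtf8(1000L)

# Bytes needed to encode that single character in UTF-8.
bytes_per_char <- nchar(one_char, type = "bytes")

# The report repeats the code point 1e7 times.
n_code_points <- 1e7
total_bytes <- n_code_points * bytes_per_char

# Size of the resulting string in (decimal) megabytes.
total_megabytes <- total_bytes / 1e6
```

Converting a single character this way is safe on any build, since the crash reported above only involves very long inputs.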
Duncan Murdoch
2009-Nov-17 16:30 UTC
[Rd] R crash with intToUtf8 on huge vectors (PR#14068)
On 11/16/2009 8:10 AM, g.russell at eos-solutions.com wrote:
> Full_Name: George Russell
> Version: 2.10.0
> OS: Windows XP Professional Version 2002 Service Pack 2
> Submission from: (NULL) (217.111.3.131)
>
> Typing the following command into R --vanilla causes R to crash:
>
> k <- intToUtf8(rep(1e3,1e7))

Brian Ripley has tracked this one down and a fix should appear shortly. The intToUtf8 code was written for reasonably small conversions, and it overflowed the stack when it tried to produce a 20 megabyte string there.

Do you have a real application that works with such large strings?

Duncan Murdoch

> This is the output of sessionInfo():
> R version 2.10.0 (2009-10-26)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] RODBC_1.3-1
>
> Many thanks for your help and best wishes,
>
> George Russell
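Since the overflow is tied to the size of a single conversion, one possible workaround on an affected build is to convert the vector in pieces and join the results. The following is a minimal sketch under stated assumptions: the helper name int_to_utf8_chunked and the default chunk size are hypothetical and do not come from the thread, and no chunk size has been verified here as safe on R 2.10.0 for Windows.

```r
# Hypothetical helper (not part of R): convert a long vector of code points
# piece by piece, so that no single intToUtf8() call builds a huge string.
# The default chunk size is an arbitrary assumption, not a tested limit.
int_to_utf8_chunked <- function(x, chunk_size = 1e5) {
  if (length(x) == 0) return("")
  # Start index of each chunk.
  starts <- seq(1, length(x), by = chunk_size)
  # Convert each chunk to one string.
  pieces <- vapply(starts, function(s) {
    e <- min(s + chunk_size - 1, length(x))
    intToUtf8(x[s:e])
  }, character(1))
  # Join the per-chunk strings into a single string.
  paste(pieces, collapse = "")
}

# Smaller than the reported input, to keep the example quick.
k <- int_to_utf8_chunked(rep(1e3, 1e6))
nchar(k, type = "bytes")
```

The final string is just as large as before, so this only sidesteps the limit inside intToUtf8 itself; it does not reduce the memory needed to hold the result.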