g.russell at eos-solutions.com
2009-Dec-10 08:45 UTC
[Rd] Antwort: Re: Crash with Unicode and sub (PR#14114)
I don't know about the technicalities, but Peter Dalgaard said the offending code also causes R to come to a stop using SUSE + WINE. Is it possible to run that lot on top of valgrind? Of course, it will probably take all day ...

If not, I have a clue which might help. The problem seems to lie in the "sub" routine. In the original report I used

-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
v %in% ""
-- cut here --

I've tried reducing this a bit more. Replacing intToUtf8 with a direct assignment writing out the string with Unicode escapes seems to make no difference. The %in% can be replaced with "match", leaving the following:

-- cut here --
gctorture()
u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
v <- rep(u,1e2)
v <- sub(" ","",v)
match(v,"")
-- cut here --

This also crashes R-2.10.0 and R-2.10.1 RC (2009-12-06 r50684). The sub line is essential, so far as I can see; without it we don't get the crash. If we add "perl = TRUE" this seems to make no difference (there is still a crash). If instead we use "fixed = TRUE", the result is strange and differs for R-2.10.0 and R-2.10.1 RC. This is especially strange, because in an unbugged R, the result of v returned from sub should be the same either with fixed = TRUE or perl = TRUE.

R-2.10.0 pauses several seconds, then produces the enigmatic output:

> match(v,"")
  [1] 00 00 06 9d 78 9c cd 54 5d 4f 83 30 14 2d ec 9b a9 33 99 2f fe 89 65 1a e3
 [26] c3 de 8c 26 be 38 7d d5 c7 4a af 0c 57 ca 42 cb 8c bf dc 18 93 61 29 1d 83
 [51] 8e 7d c4 18 23 49 a1 f4 de 9e 7b 4e ef 81 47 07 21 64 23 5b 3e ec 1a 42 56
 [76] 4b de ea b6 5c b3 e4 e8 c8 d1 f2 80 41 e4 bb 32 7e 5c 58 ae f3 49 f8 66 a6
[101] ce b0 3b c5 1e c8 69 31 b5 15 80 98 84 84 cb e9 22 db 61 27 1b 53 4a 80 0d
[126] 2f 0a e3 99 9c f4 91 ba 4a 41 67 8e 69 0c d7 14 73 ae d1 cc 8c 0e f7 3d 86
[151] 45 1c 81 41 be 19 3e bf 82 2b 4c 40 ee 07 33 0a 0f 8c be a7 6f 3a 62 ad 68
[176] af e8 12 78 c1 31 15 c8 aa 9d 9a a1 5c 89 dd 57 cb eb 27 da 14 38 f2 20 dd
[201] 5c 45 ba c1 70 00 9b 14 35 dc 4c 6e 49 4d 51 e6 8e d3 4d 95 54 aa f1 19 90
[226] 32 21 27 29 73 e8 26 bf 53 d6 32 71 0a 4e da 01 51 49 e3 84 48 77 ce 81 dc
[251] 64 2d 19 ab 0d 7b 33 42 9f 99 42 64 af a7 a8 5e 9d 0f ce 86 83 a1 d9 c1 a5
[276] 7f d0 97 86 69 16 a2 dd 54 90 a6 93 21 1f 25 ba c2 d2 54 68 25 c8 31 49 d6
[301] ae ee 9f 2a ba d4 96 a6 89 03 60 22 83 2b 3b 17 53 3a ce 79 17 3e 96 b5 d3
[326] ea ea b4 3b 9f 8b 9f b9 a5 cd a7 40 41 84 4c a9 ce dd dd 49 fe c6 3e 07 7f
[351] 54 e7 7f d9 f4 50 77 5c 19 a9 e0 b9 5e b2 dd 60 79 6c 13 ad 1e 17 98 11 1c
[376] 91 db e5 5f 7e df 0f e7 43 57 c9 7d 01 6c 3e 1a 5d 0c 2f ab 99 6e a9 a8 88
[401] 57 1c 35 5a 7c 03 73 22 e4 b1

R-2.10.1 RC produces the following equally enigmatic output:

> match(v,"")
NULL
Fehler: 'getEncChar' muss für CHARSXP aufgerufen werden
[Error: 'getEncChar' must be called on a CHARSXP]

So my provisional guess is that the bug is somewhere in the part of the internal code for sub which is invoked whatever the value of fixed or perl. It is strange, though, that it makes a difference whether you specify fixed = TRUE or not.

George Russell

Prof Brian Ripley <ripley@stats.ox.ac.uk> wrote on 10.12.2009 08:00:36:

> It seems (from the debugger output) that this is corruption in the R
> memory allocation routines.
> Such things can usually be tracked down
> via valgrind and a valgrind-instrumented build of R, but I cannot
> trigger this on any system with valgrind. I've tried 64- and 32-bit
> versions, and Latin-1 locales as well as UTF-8.
>
> So I am inclining to think this is Windows-specific. One thing that
> is specific to Windows is UCS-2 (16-bit) wide characters, which might
> be the issue. But we simply don't have the tools on Windows that we
> do on other platforms.
>
> On Wed, 9 Dec 2009, g.russell@eos-solutions.com wrote:
>
> > Hello Peter,
> >
> > I have now installed R-2.10.1 RC (sessionInfo() says "R version 2.10.1 RC
> > (2009-12-06 r50684)", the rest I believe is as before). The following code
> > always brings R --vanilla down (with a crash, not a normal exit):
> > -- cut here --
> > gctorture()
> > u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
> > v <- rep(u,1e2)
> > v <- sub(" ","",v)
> > v %in% ""
> > q()
> > -- cut here --
> >
> > I've tried this several times now, with different effects. Sometimes R
> > crashes after 'v %in% ""'. Sometimes it survives that command, but crashes
> > during the q(). I have also had the error message "Fehler in match(x,
> > table, nomatch = 0L) > 0L : Vergleich (6) ist nur für atomare und
> > Listentypen möglich" [Error in match(x, table, nomatch = 0L) > 0L :
> > comparison (6) is possible only for atomic and list types] from that
> > command (the match seems to be the problem); when I type q() R still
> > crashes.
> >
> > Best wishes,
> >
> > George Russell | KG EOS Holding GmbH & Co
> >
> > Peter Dalgaard <P.Dalgaard@biostat.ku.dk> wrote on 08.12.2009 11:24:50:
> >
> >> g.russell@eos-solutions.com wrote:
> >>> Full_Name: George Russell
> >>> Version: 2.10.0
> >>> OS: Windows XP Version 2002 SP 2
> >>> Submission from: (NULL) (217.111.3.131)
> >>>
> >>>
> >>> The following typed into R --vanilla induces a crash:
> >>> -- cut here --
> >>> gctorture()
> >>> u <- intToUtf8(c(rep(1e3,1e2),32,c(rep(1e3,1e2))))
> >>> v <- rep(u,1e2)
> >>> v <- sub(" ","",v)
> >>> v %in% ""
> >>> -- cut here --
> >>>
> >>> sessionInfo() says:
> >>>
> >>> -- cut here --
> >>> R version 2.10.0 (2009-10-26)
> >>> i386-pc-mingw32
> >>>
> >>> locale:
> >>> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
> >>> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> >>> [5] LC_TIME=German_Germany.1252
> >>>
> >>> attached base packages:
> >>> [1] stats graphics grDevices datasets utils methods base
> >>> -- cut here --
> >>>
> >>> I apologise for not testing this with R-2.10.1 but as far as I can
> >>> see there are only source releases available so far, which I am not
> >>> able to compile.
> >>>
> >>
> >> 2.10.1 RC is available now. Please check. It does seem to be
> >> reproducible in the Windows version, or at least it takes a very long
> >> time, but that means running under Wine on SUSE for me.
> >> I don't see the
> >> effect with the Linux build.
> >>
> >> --
> >>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
> >>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
> >>  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
> >> ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)              FAX: (+45) 35327907
> >>
> >
> > ______________________________________________
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
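
The expectation stated in the message above (that sub should return the same v whether it is called with the defaults, with perl = TRUE, or with fixed = TRUE) can be checked on any given build with something like the following. This is only a sketch of the expected behaviour, not a diagnosis or a fix; check_sub_variants and its arguments are made-up names, and torture = TRUE switches on gctorture as in the report, which may be very slow and, on an affected build, may crash R.

```r
# Sketch: compare the three sub() code paths on the strings from the report.
# check_sub_variants() is a hypothetical helper, not part of R.
check_sub_variants <- function(n_chars = 100L, n_copies = 100L, torture = FALSE) {
  if (torture) {
    old <- gctorture(TRUE)               # force a GC at every allocation
    on.exit(gctorture(old), add = TRUE)  # restore the previous setting on exit
  }
  # n_chars copies of U+03E8, one space, n_chars copies of U+03E8
  u <- intToUtf8(c(rep(1000L, n_chars), 32L, rep(1000L, n_chars)))
  v <- rep(u, n_copies)
  # What sub(" ", "", v) should produce: the same string without the space
  expected <- intToUtf8(rep(1000L, 2L * n_chars))

  results <- list(
    default = sub(" ", "", v),
    perl    = sub(" ", "", v, perl = TRUE),
    fixed   = sub(" ", "", v, fixed = TRUE)
  )
  list(
    all_character      = all(vapply(results, is.character, logical(1))),
    all_equal_expected = all(vapply(results, function(r) all(r == expected),
                                    logical(1))),
    match_result       = match(results$default, "")
  )
}

res <- check_sub_variants()
res$all_character
res$all_equal_expected
all(is.na(res$match_result))
```

On a build without the problem all three values should be TRUE: none of the strings is empty, so match(v, "") should be a vector of NA rather than raw bytes, NULL, or an error.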
Peter Dalgaard
2009-Dec-10 14:15 UTC
[Rd] Antwort: Re: Crash with Unicode and sub (PR#14114)
g.russell at eos-solutions.com wrote:
> SSBkb24ndCBrbm93IGFib3V0IHRoZSB0ZWNobmljYWxpdGllcywgYnV0IFBldGVyIERhbGdhYXJk
> IHNhaWQgdGhlIA0Kb2ZmZW5kaW5nIGNvZGUgYWxzbyBjYXVzZXMgUiB0byBjb21lIHRvIGEgc3Rv
> cCB1c2luZyBTVVNFICsgV0lORS4gSXMgaXQgDQpwb3NzaWJsZSB0byBydW4gdGhhdCBsb3Qgb24g

[...Argh!, Jitterbug must die....]

For those who cannot read base64 coded mails by eye, these are the contents (an unmangled version reached r-devel, but probably not r-bugs):

[... decoded text of George Russell's message of 2009-Dec-10 08:45 UTC, including its quoted history, as posted above ...]

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
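
A side note on the raw vector that R-2.10.0 printed in the fixed = TRUE case. Nobody in the thread interprets it, so what follows is only one possible reading. The bytes 78 9c at positions 5 and 6 form a valid zlib stream header under RFC 1950, and the four bytes before them can be read as a big-endian length. A length prefix followed by zlib data resembles, as far as I know, the way R stores compressed serialized objects, which would fit match() ending up looking at unrelated memory, i.e. the corruption Brian Ripley suspects. The sketch below only does the arithmetic on the first six bytes; the variable names are mine.

```r
# First six bytes of the raw vector printed by R-2.10.0 (copied from the report).
head_bytes <- as.raw(c(0x00, 0x00, 0x06, 0x9d, 0x78, 0x9c))

# Read bytes 1-4 as a big-endian unsigned integer.
be_length <- sum(as.integer(head_bytes[1:4]) * 256^(3:0))

# RFC 1950: a zlib stream starts with two bytes CMF and FLG, where the low
# nibble of CMF is 8 (deflate) and CMF * 256 + FLG is a multiple of 31.
cmf <- as.integer(head_bytes[5])
flg <- as.integer(head_bytes[6])
looks_like_zlib <- (cmf %% 16L == 8L) && ((cmf * 256L + flg) %% 31L == 0L)

be_length        # 1693
looks_like_zlib  # TRUE
```

This does not show where the bytes came from; it only shows that they are not random-looking garbage and not character data.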