Full_Name: Stefan Raberger
Version: 2.8.1
OS: Windows XP
Submission from: (NULL) (213.185.163.242)

Hi there,

I recently noticed some strange behaviour of the command "type.convert", depending on the startup mode used. There also seems to be different behaviour on different PCs (all running the same OS and the same version of R).

On PC1:
When I start R in SDI mode (RGui --no-save --no-restore --no-site-file --no-init-file --no-environ) and try to convert, the result is

> type.convert("§")
[1] NA

If I use MDI mode (RGui --no-save --no-restore --no-site-file --no-init-file --no-environ --no-Rconsole) instead, the result is

> type.convert("§")
[1] §
Levels: §

On PC2 it's exactly the other way round (SDI: §, MDI: NA), on PC3 the result is always NA, independent of the startup mode used, and on PC4 it's always §.

What's the result I should expect R to return, and why is it different in so many cases?

Any help is much appreciated!
Regards, Stefan
s.raberger at innovest.at wrote:
> What's the result I should expect R to return, and why is it different in so
> many cases?

Which locale does R think it is in in the four cases? (Sys.setlocale("LC_CTYPE"), I think).

Might well not be a bug (so please don't file it as one).

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
William Dunlap wrote:
> You may have to use
>    (unsigned int)(unsigned char)*s++
> instead of just
>    (unsigned int)*s++
> to avoid the sign extension.

Thanks again,

I probably won't be doing the change since I don't have a Windows build environment around, and I'm a bit superstitious about fixing bugs that I cannot see...

Let me just filter this information into the bug repository for now.

-pd

> Bill Dunlap
> TIBCO Software Inc - Spotfire Division
> wdunlap tibco.com
>
>> -----Original Message-----
>> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
>> Sent: Friday, April 10, 2009 1:41 PM
>> To: William Dunlap
>> Cc: r-devel at r-project.org
>> Subject: Re: [Rd] type.convert (PR#13646)
>>
>> William Dunlap wrote:
>>> I can reproduce the difference that Stefan saw, depending
>>> on whether or not I start Rgui with the flags
>>>    --no-environ --no-Rconsole
>>> I think it boils down to the isBlankString() function.
>>> For the string "\247" it returns 1 when those flags are
>>> not present and 0 when they are. isBlankString does use
>>> some locale-specific functions:
>>>
>>>    Rboolean isBlankString(const char *s)
>>>    {
>>>    #ifdef SUPPORT_MBCS
>>>        if(mbcslocale) {
>>>            wchar_t wc; int used; mbstate_t mb_st;
>>>            mbs_init(&mb_st);
>>>            while( (used = Mbrtowc(&wc, s, MB_CUR_MAX, &mb_st)) ) {
>>>                if(!iswspace(wc)) return FALSE;
>>>                s += used;
>>>            }
>>>        } else
>>>    #endif
>>>            while (*s)
>>>                if (!isspace((int)*s++)) return FALSE;
>>>        return TRUE;
>>>    }
>>>
>>> I was using R 2.8.1, downloaded precompiled from CRAN, on Windows
>>> XP SP3. The outputs of sessionInfo() and Sys.getenv() are the same
>>> in both sessions. 'Process Explorer' shows that the 2 sessions
>>> have the same dll's opened.
>>
>> Thanks for that analysis Bill!
>>
>> Stefan was in "German_Austria.1252" which I don't think is multibyte,
>> so only the else-clause should be relevant, pointing the finger rather
>> squarely at isspace(). Googling indicates that others have been caught
>> out by signed/unsigned char issues there. Should this possibly rather
>> read
>>
>>    if (!isspace((unsigned int)*s++)) return FALSE;
>>
>> ??
>>
>>> > sessionInfo()
>>> R version 2.8.1 (2008-12-22)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> I did the test with a dll compiled from
>>>
>>>    #include <R.h>
>>>    #include <R_ext/Utils.h>
>>>
>>>    void test_isBlankString(char **s, int *res)
>>>    {
>>>        *res = isBlankString(*s) ;
>>>    }
>>>
>>> and called by .C("test_isBlankString","\247",-1L)
>>>
>>> I don't see the difference while running a version of 2.9.0(devel)
>>> compiled locally on 11 March 2009 (from svn rev 48116).
>>>
>>> Bill Dunlap
>>> TIBCO Software Inc - Spotfire Division
>>> wdunlap tibco.com
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces at r-project.org
>>>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Peter Dalgaard
>>>> Sent: Friday, April 10, 2009 2:03 AM
>>>> To: Raberger, Stefan
>>>> Cc: R-bugs at r-project.org; r-devel at stat.math.ethz.ch
>>>> Subject: Re: [Rd] type.convert (PR#13646)
>>>>
>>>> Raberger, Stefan wrote:
>>>>> Hi Peter,
>>>>>
>>>>> each of the four PCs actually has the same locale setting:
>>>>>
>>>>> > Sys.setlocale("LC_CTYPE")
>>>>> [1] "German_Austria.1252"
>>>>>
>>>>> (all the other settings returned by invoking Sys.getlocale() are
>>>>> identical as well).
>>>>>
>>>>> Just to be sure (because it's displayed incorrectly in my browser
>>>>> on the bugtracking page): the character inside the type.convert
>>>>> function ought to be a "section"-sign (HTML Code &sect; or &#167;,
>>>>> in R "\247", and not a dot ".").
>>>>
>>>> I saw it correctly. It's "\302\247" in UTF8 locales, which is of
>>>> course the reason I suspected locale settings, but I can't seem to
>>>> trigger the NA behaviour.
>>>>
>>>> I'm at a loss here, but some ideas:
>>>>
>>>> In the cases where it returns NA, what type is it? (I.e.
>>>> storage.mode(type.convert(....)))
>>>>
>>>> What do you get from
>>>>
>>>> > charToRaw("§")
>>>> [1] c2 a7
>>>>
>>>> (a7, presumably, but better check).
>>>>
>>>> -p
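The sign-extension point raised in this exchange can be checked in isolation. What follows is a minimal sketch and not code from R: the helper names widen_direct, widen_via_unsigned_char and check_widening are made up for illustration, and signed char is used explicitly so that the result does not depend on whether plain char happens to be signed on a given compiler. It assumes 8-bit bytes.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: convert a signed 8-bit value straight to
   unsigned int. A negative value wraps modulo UINT_MAX + 1, so -89
   ends up just below UINT_MAX. */
unsigned int widen_direct(signed char c)
{
    return (unsigned int)c;
}

/* Hypothetical helper: go through unsigned char first, as suggested in
   the thread. The value is reduced modulo 256 before widening, so the
   result always lies in 0..255. */
unsigned int widen_via_unsigned_char(signed char c)
{
    return (unsigned int)(unsigned char)c;
}

/* Self-check using the byte "\247" (0xA7), which is -89 as a signed char. */
void check_widening(void)
{
    assert(widen_direct((signed char)-89) == UINT_MAX - 88u);
    assert(widen_via_unsigned_char((signed char)-89) == 167u);
}
```

With a 32-bit unsigned int, UINT_MAX - 88 is 4294967207. The C standard only defines the is*() functions for arguments that are EOF or representable as unsigned char, so only the second conversion produces a value that is safe to hand to isspace().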
Using the (unsigned int)(unsigned char) in isspace() resolved the problem in my Windows build. I put some Rprintf statements into isBlankString and for type.convert("\247") it printed

   *s=-89 (4294967207 if unsigned)
   8=isspace(*s)
   8=isspace((unsigned int)*s)
   0=isspace((unsigned int)(unsigned char)*s)

I think the 8 is the value of a random bit of memory.

When I converted S+ to use full 8-bit characters I ran into the same problem. The is<class> macros in <ctype.h> all take unsigned int argument and if char was signed you had to do the double cast to avoid sign extension. Whoever designed the interface either didn't worry about 8-bit characters or had chars that were unsigned by default.

It doesn't look like any of the isspace calls in R do this double casting.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: Peter Dalgaard [mailto:p.dalgaard at biostat.ku.dk]
> Sent: Friday, April 10, 2009 2:50 PM
> To: William Dunlap
> Cc: R-bugs at r-project.org; Raberger, Stefan
> Subject: Re: [Rd] type.convert (PR#13646)
>
> William Dunlap wrote:
> > You may have to use
> >    (unsigned int)(unsigned char)*s++
> > instead of just
> >    (unsigned int)*s++
> > to avoid the sign extension.
>
> Thanks again,
>
> I probably won't be doing the change since I don't have a Windows build
> environment around, and I'm a bit superstitious about fixing bugs that I
> cannot see...
>
> Let me just filter this information into the bug repository for now.
>
> -pd
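For reference, here is a sketch of what the single-byte branch of isBlankString() might look like with the double cast applied. It is not the change made in R (the thread does not show one). The names is_blank_string_sketch and check_is_blank_string_sketch are hypothetical, the multibyte branch is left out, and int stands in for Rboolean so that the fragment compiles without R's headers.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for the non-multibyte branch of isBlankString().
   Returns 1 if every byte of s is whitespace in the current locale
   (an empty string counts as blank), 0 otherwise. */
int is_blank_string_sketch(const char *s)
{
    while (*s) {
        /* Convert to unsigned char first so that bytes >= 0x80 reach
           isspace() as 128..255 instead of as negative values. */
        if (!isspace((unsigned int)(unsigned char)*s++)) return 0;
    }
    return 1;
}

/* Self-check in the default "C" locale, where only space, \t, \n, \v,
   \f and \r count as whitespace. */
void check_is_blank_string_sketch(void)
{
    assert(is_blank_string_sketch("\247") == 0);   /* section sign byte */
    assert(is_blank_string_sketch(" \t\n") == 1);
    assert(is_blank_string_sketch("") == 1);
}
```

In the default "C" locale the byte "\247" is never whitespace, so the sketch reports the string as non-blank regardless of whether plain char is signed. Whether a Windows-1252 locale would classify it differently is not something this sketch tests.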