Stuart Ambler
2014-Nov-22 19:59 UTC
[Rd] R string comparisons may vary with platform (plain text)
A colleague's R program behaved differently when I ran it, and we thought we traced it probably to different results from string comparisons as below, with different R versions. However, the platforms also differed. A friend ran it on a few machines and found that the comparison behavior didn't correlate with R version, but rather with platform.

I wonder if you've seen this. If it's not some setting I'm unaware of, maybe someone should look into it. Sorry I haven't taken the time to read the source code myself.

Thanks,
Stuart

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Platform: x86_64-unknown-linux-gnu (64-bit)

Sys.getlocale()
[1] "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"

"-1" > "1"
[1] TRUE

"-1" < "1"
[1] FALSE

"1" < "-1"
[1] TRUE

"1" < "-"
[1] FALSE

Vs.

R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Platform: x86_64-redhat-linux-gnu (64-bit)

Sys.getlocale()
[1] "LC_CTYPE=en_US.utf8;LC_NUMERIC=C;LC_TIME=en_US.utf8;LC_COLLATE=en_US.utf8;LC_MONETARY=en_US.utf8;LC_MESSAGES=en_US.utf8;LC_PAPER=en_US.utf8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.utf8;LC_IDENTIFICATION=C"

"-1" > "1"
[1] FALSE

"-1" < "1"
[1] TRUE

"1" < "-1"
[1] FALSE

"1" < "-"
[1] FALSE
Duncan Murdoch
2014-Nov-22 20:42 UTC
[Rd] R string comparisons may vary with platform (plain text)
On 22/11/2014, 2:59 PM, Stuart Ambler wrote:
> A colleague's R program behaved differently when I ran it, and we thought
> we traced it probably to different results from string comparisons as
> below, with different R versions. However the platforms also differed. A
> friend ran it on a few machines and found that the comparison behavior
> didn't correlate with R version, but rather with platform.
>
> I wonder if you've seen this. If it's not some setting I'm unaware of,
> maybe someone should look into it. Sorry I haven't taken the time to read
> the source code myself.

Looks like a collation order issue. See ?Comparison.

Duncan Murdoch
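To make the collation point concrete, here is a minimal sketch, not a definitive fix: it evaluates the same comparisons with LC_COLLATE temporarily set to "C", where strings are compared by byte value and the result should not depend on the platform's collation tables. The helper name `compare_in_c_locale` is hypothetical; `Sys.getlocale()` and `Sys.setlocale()` are base R.

```r
# Hypothetical helper (not part of R): test a < b with LC_COLLATE set to "C",
# then restore whatever collation locale was active before the call.
compare_in_c_locale <- function(a, b) {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)
  Sys.setlocale("LC_COLLATE", "C")
  a < b
}

# In the C locale "-" (0x2D) sorts before "1" (0x31).
compare_in_c_locale("-1", "1")   # TRUE
compare_in_c_locale("1", "-1")   # FALSE
```

Setting the locale once at the top of a script with `Sys.setlocale("LC_COLLATE", "C")` has the same effect for the rest of the session; the wrapper only limits the change to a single comparison.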
Stuart Ambler
2014-Nov-22 20:49 UTC
[Rd] R string comparisons may vary with platform (plain text)
You mean where it says that some platforms may not respect the locale (I assume, though don't know, that en_US.UTF-8 and en_US.utf8 would be the same)? But I gather that the general problem has been looked into and is difficult to solve; thanks.

On 11/22/14, 12:42 PM, "Duncan Murdoch" <murdoch.duncan at gmail.com> wrote:
> Looks like a collation order issue. See ?Comparison.
>
> Duncan Murdoch
Henrik Bengtsson
2014-Nov-23 00:05 UTC
[Rd] R string comparisons may vary with platform (plain text)
On Sat, Nov 22, 2014 at 12:42 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Looks like a collation order issue. See ?Comparison.

With the oddity that both platforms use what look like similar locales:

LC_COLLATE=en_US.UTF-8
LC_COLLATE=en_US.utf8

/Henrik
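Since the two locale names look equivalent yet the results differ, one possible way to sidestep the platform's collation tables when ordering strings is `method = "radix"`, which the `sort`/`order` help pages describe as using C-locale ordering for character vectors. This is only a sketch under two assumptions: the vector `x` is illustrative, and the R version is recent enough to support radix sorting of character vectors (it is not available in the 3.0.x and 3.1.x releases discussed in this thread).

```r
# Illustrative input; radix ordering compares by byte value (C locale),
# independent of the LC_COLLATE setting.
x <- c("1", "-1", "-", "10")
sorted <- sort(x, method = "radix")
print(sorted)   # "-"  "-1" "1"  "10"
```

The default `sort(x)` on the same vector may give a different order depending on LC_COLLATE, which is the behavior reported at the top of the thread.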