Kirill Müller
2016-May-09 14:07 UTC
[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings
Hi

I think the following behavior is a regression from R 3.2.5:

> match(iconv( c("\u00f8", "A"), from = "UTF8", to = "latin1" ), "\u00f8")
[1]  1 NA
> match(iconv( c("\u00f8"), from = "UTF8", to = "latin1" ), "\u00f8")
[1] NA
> match(iconv( c("\u00f8"), from = "UTF8", to = "latin1" ), "\u00f8", incomparables = NA)
[1] 1

I'm seeing this in R 3.3.0 on both Windows and Ubuntu 15.10.

The specific behavior makes me think this is related to the following NEWS entry:

match(x, table) is faster (sometimes by an order of magnitude) when x is of length one and incomparables is unchanged (PR#16491).

Best regards

Kirill
Peter Haverty
2016-May-09 16:47 UTC
[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings
Dear Kirill,

You are correct, that is a new bug introduced in PR16491. The appropriate fix and regression tests have been added via PR16885, which has been merged into trunk. I believe that means the fix will be released with R 3.3.1.

I checked your example and the second "match" now properly returns 1 with the patched code.

Please have a look at

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
http://developer.r-project.org/blosxom.cgi/R-devel/NEWS

Thank you for your report. I hope the benefits of this speedup will eventually outweigh this unfortunate bug in my PR16491.

Regards,
Pete

____________________
Peter M. Haverty, Ph.D.
Martin Maechler
2016-May-10 06:40 UTC
[Rd] Regression in match() in R 3.3.0 when matching strings with different character encodings
>>>>> Peter Haverty <haverty.peter at gene.com>
>>>>>     on Mon, 9 May 2016 09:47:48 -0700 writes:

    > Dear Kirill,
    > You are correct, that is a new bug introduced in PR16491. The appropriate
    > fix and regression tests have been added via PR16885, which has been merged
    > into trunk. I believe that means the fix will be released with R 3.3.1.

Yes, definitely.

Kirill, as you seem to use code which does trigger the bug, you may want to switch to using 'R-patched', i.e.,

  > R.version.string
  [1] "R version 3.3.0 Patched (2016-05-09 r70591)"

(where the subversion revision must be >= 70591)

    > I checked your example and the second "match" now properly returns 1 with
    > the patched code.

    > Please have a look at

    > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=16885
    > http://developer.r-project.org/blosxom.cgi/R-devel/NEWS

    > Thank you for your report. I hope the benefits of this speedup will
    > eventually outweigh this unfortunate bug in my PR16491.

I'm pretty sure that your hope will be fulfilled.

    > Regards,
    > Pete

    > ____________________
    > Peter M. Haverty, Ph.D.

Martin Maechler, ETH Zurich
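
The revision check suggested above can also be done programmatically. The following is only a minimal sketch under stated assumptions: the helper names `match_regression_likely` and `match_bug_present` are hypothetical (not part of R), the version test assumes that only R 3.3.0 builds with a subversion revision below 70591 are affected, and `R.version[["svn rev"]]` is assumed to hold the revision as a character string (it may be "unknown" in some builds).

```r
# Hypothetical helper: is this R build likely affected by the match() regression?
# Assumption: only R 3.3.0 builds with svn revision < 70591 are affected.
match_regression_likely <- function(rv = R.version) {
  version <- paste(rv[["major"]], rv[["minor"]], sep = ".")
  if (version != "3.3.0") return(FALSE)
  # "svn rev" is stored as a string; a non-numeric value yields NA
  rev <- suppressWarnings(as.integer(rv[["svn rev"]]))
  if (is.na(rev)) return(NA)
  rev < 70591L
}

# Hypothetical behavioural check, based on the example from the report:
# a length-one latin1 string should match its UTF-8 equivalent.
# Assumption: iconv() supports UTF-8 -> latin1 on this platform.
match_bug_present <- function() {
  x <- iconv("\u00f8", from = "UTF-8", to = "latin1")
  !identical(match(x, "\u00f8"), 1L)
}
```

Calling `match_regression_likely()` with no arguments inspects the running session. The behavioural check is arguably the more direct test, since it does not depend on the version bookkeeping being right.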