Mikael Jagan
2023-Nov-22 18:14 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
FWIW, a user on Stack Overflow just reported the same issue with list.files
running R 4.3.z on Windows. They do not observe the issue running R-devel,
with Tomas' patch (r84960). It is still the case that their file names did
not exceed 260 wide characters.

    https://stackoverflow.com/q/77527167/12685768

Mikael

On 2023-08-17 6:00 am, r-devel-request at r-project.org wrote:
> Message: 5
> Date: Wed, 16 Aug 2023 16:00:13 +0200
> From: Tomas Kalibera <tomas.kalibera at gmail.com>
> To: Ivan Krylov <krylov.r00t at gmail.com>
> Cc: "r-devel at r-project.org" <r-devel at r-project.org>
> Subject: Re: [Rd] R-4.3 version list.files function could not work
>     correctly in chinese
> Message-ID: <21e91609-85b2-103b-8e23-12eadff62784 at gmail.com>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
>
> On 8/16/23 13:22, Ivan Krylov wrote:
>> On Wed, 16 Aug 2023 09:42:09 +0200
>> Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>>
>>> Fixed in R-devel (84960). Please let me know if you see any problem
>>> with the fix.
>> Thank you for implementing the fix! I gave ??? the link to the
>> GitHub Action build of the r84960 installer.
> Thanks and thanks for looking at the change.
>> I'm worried that ??? was seeing FindNextFileA fail for a different
>> reason (all the examples given at the Capital of Statistics forum
>> seemed to use less than 256/4 = 64 characters per file name...), but
>> maybe this won't reappear with the switch to FindNextFileW. If this
>> keeps happening, it might be worth producing a warning when
>> FindNextFileW() fails with an unexpected GetLastError() value.
> I've added a warning to R-devel when list.files() on Windows stops
> listing a directory due to an error.
>
> There is probably not more we can do unless there is a revised bug
> report of the original problem.
>
>> fs::dir_fs() uses NtQueryDirectoryFile() and WideCharToMultiByte()
>> instead of FindNextFileW() and wcstombs(), but maybe this shouldn't
>> matter. In particular, both list.files() and fs::dir_fs() would fail
>> given a file name that cannot be represented in UTF-8 (invalid UTF-16
>> surrogate pairs?)
> Right, R only support file names that are valid strings, this assumption
> is present at many places in the code, so it is fine/consistent to be
> here as well. The choice of opendir/readdir in R was probably motivated
> by minimization of platform-specific code.
>
> Best
> Tomas
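The two size concerns in the quoted exchange (the "256/4 = 64" estimate and names that cannot be represented in UTF-8) can be checked by hand. What follows is a minimal Python sketch of that arithmetic, not the code used by R or fs. It assumes a 256-byte name buffer, which is only the figure behind the estimate above, and the helper names `utf8_len`, `fits_in_buffer` and `utf16_to_utf8` are hypothetical.

```python
# Sketch only: illustrates the byte arithmetic from the thread, not R internals.

NAME_BUF_BYTES = 256  # assumption: the buffer size behind the "256/4 = 64" estimate


def utf8_len(name: str) -> int:
    """Number of bytes the file name occupies once encoded as UTF-8."""
    return len(name.encode("utf-8"))


def fits_in_buffer(name: str, buf_bytes: int = NAME_BUF_BYTES) -> bool:
    """True if the UTF-8 form plus a terminating NUL fits in buf_bytes."""
    return utf8_len(name) + 1 <= buf_bytes


def utf16_to_utf8(units: bytes) -> bytes:
    """Convert UTF-16LE bytes to UTF-8, raising on unpaired surrogates."""
    return units.decode("utf-16-le").encode("utf-8")


# UTF-8 needs at most 4 bytes per code point, hence the conservative bound.
safe_chars = NAME_BUF_BYTES // 4

ascii_name = "a" * 100  # 100 characters, 100 bytes in UTF-8
cjk_name = "中" * 100    # 100 characters, 300 bytes in UTF-8

print(safe_chars, utf8_len(ascii_name), utf8_len(cjk_name))
print(fits_in_buffer(ascii_name), fits_in_buffer(cjk_name))

# Code unit 0xD800 (a high surrogate) followed by "A" is not a valid string,
# so the conversion is rejected instead of producing UTF-8.
try:
    utf16_to_utf8(b"\x00\xd8\x41\x00")
except UnicodeDecodeError as err:
    print("cannot convert:", err.reason)
```

Under these assumptions, a name of 100 Chinese characters is far below 260 wide characters yet would not fit a 256-byte UTF-8 buffer, while names shorter than 64 characters always fit. That is consistent with the suspicion above that the forum examples failed for some other reason.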
Tomas Kalibera
2023-Nov-22 18:29 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
On 11/22/23 19:14, Mikael Jagan wrote:
> FWIW, a user on Stack Overflow just reported the same issue with list.files
> running R 4.3.z on Windows. They do not observe the issue running R-devel,
> with Tomas' patch (r84960). It is still the case that their file names did
> not exceed 260 wide characters.
>
>     https://stackoverflow.com/q/77527167/12685768

Great, thanks!

Tomas