εΆζε
2023-Aug-11 03:41 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
???? ????R-4.3???????????R??????list.files??????????????????????BUG???????????????????????????? r4.3????dir????????????????? - COS??? | ?????? | ?????????????? (cosx.org)<https://d.cosx.org/d/424356-r43ban-ben-zhong-dirhan-shu-huo-qu-bu-liao-quan-bu-wen-jian/11> ??????????????????????? [[alternative HTML version deleted]]
Ivan Krylov
2023-Aug-11 11:24 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
Dear ???,

Thank you for your message, but please follow the posting guide in your future messages:

https://www.r-project.org/posting-guide.html
https://www.r-project.org/bugs.html

I understand from your link that list.files() ends up skipping some Chinese filenames in R-4.3.1 (but not R-4.2.2) on Windows, but would you (or perhaps Yihui Xie who I see is also participating in the discussion) mind translating the rest of your findings into English? Have you been able to narrow down the problem to certain character ranges, for example?

-- 
Best regards,
Ivan
yu gong
2023-Aug-13 09:36 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
Could you test it in RGui and Rterm first to see whether it works there, and then try RStudio?
Ivan Krylov
2023-Aug-13 11:16 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
Found it! Looks like a buffer length problem. This isn't limited to Chinese, just more likely to happen when a character takes three bytes to represent in UTF-8. (Any filename containing characters which take more than one byte to represent in UTF-8 may fail.)

If a directory contains a file with a sufficiently long name, FindNextFile() fails with ERROR_MORE_DATA (0xEA, 234), making R_readdir() return NULL, stopping list_files() prematurely:

# everything seems to work fine...
list.files("????")
# [1] "????-non-utf8-????? ????????????????????????????????????????????????????.txt"
# [2] "????-non-utf8-?????.txt"
# [3] "????-utf-8.txt"

# now create a file with an even longer name
list.files("????")
# [1] "????-non-utf8-????? ????????????????????????????????????????????????????.txt"

# the files are still there, but not visible to list.files():
system("cmd /c dir /s *.txt")
# Volume in drive C has no label.
# Volume Serial Number is A85A-AA74
#
# Directory of C:\R\R-4.3.1\bin\x64\????
#
# 08/12/2023 07:57 AM 22 ????-non-utf8-????? ????????????????????????????????????????????????????.txt
# 08/12/2023 07:57 AM 22 ????-non-utf8-????? ????????????????????????????????????????????????????????????????????????????????????????????????????????.txt
# 08/12/2023 07:57 AM 22 ????-non-utf8-?????.txt
# 08/12/2023 07:56 AM 18 ????-utf-8.txt
# 4 File(s) 84 bytes
#
# Total Files Listed:
# 4 File(s) 84 bytes
# 0 Dir(s) 29,281,538,048 bytes free
# [1] 0

Increasing the path length limits [*] doesn't help, since it's the filename length limit that we're bumping against. While both WIN32_FIND_DATAA and WIN32_FIND_DATAW contain fixed-size buffers, a valid filename may take more than MAX_PATH bytes to represent in UTF-8 while still being under the limit of MAX_PATH wide characters.

This may mean having to rewrite list_files in terms of R_wopendir()/R_wreaddir() for Windows. As a workaround, we may use the short filename (which sometimes may not exist, alas) when FindNextFile() fails with ERROR_MORE_DATA.

-- 
Best regards,
Ivan

[*] https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation