εΆζε
2023-Aug-11 03:41 UTC
[Rd] R-4.3 version list.files function could not work correctly in chinese
[The body of this message was written in Chinese; its characters did not survive and appear only as "?".]
"The dir function in R 4.3 cannot retrieve all files" - COS forum
(cosx.org)<https://d.cosx.org/d/424356-r43ban-ben-zhong-dirhan-shu-huo-qu-bu-liao-quan-bu-wen-jian/11>
Ivan Krylov
2023-Aug-11 11:24 UTC
Dear ???,

Thank you for your message, but please follow the posting guide in
your future messages:

https://www.r-project.org/posting-guide.html
https://www.r-project.org/bugs.html

I understand from your link that list.files() ends up skipping some
Chinese filenames in R-4.3.1 (but not R-4.2.2) on Windows, but would
you (or perhaps Yihui Xie, who I see is also participating in the
discussion) mind translating the rest of your findings into English?
Have you been able to narrow down the problem to certain character
ranges, for example?

--
Best regards,
Ivan
yu gong
2023-Aug-13 09:36 UTC
Could you test it in RGui and Rterm first to see whether it works there, and then try RStudio?
Ivan Krylov
2023-Aug-13 11:16 UTC
Found it! Looks like a buffer length problem. This isn't limited to
Chinese, just more likely to happen when a character takes three bytes
to represent in UTF-8. (Any filename containing characters which take
more than one byte to represent in UTF-8 may fail.)
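
The byte counts behind this can be checked directly in R. The following is a minimal sketch: the three sample characters are arbitrary choices and are not taken from the report, and the 260-byte figure assumes the cFileName buffer of WIN32_FIND_DATAA (MAX_PATH bytes, one of which holds the terminating NUL).

```r
# Sample characters chosen for illustration only.
samples <- c(ascii = "a", cyrillic = "\u0436", cjk = "\u4e2d")

# Number of bytes each character occupies in UTF-8.
bytes_per_char <- nchar(samples, type = "bytes")
bytes_per_char

# Smallest number of such characters that no longer fits into
# 259 bytes (a 260-byte buffer minus the terminating NUL).
chars_to_overflow <- floor(259 / bytes_per_char) + 1
chars_to_overflow
```

Since NTFS limits a file name to 255 UTF-16 code units, a pure-ASCII name can never reach the byte limit, while a name made of three-byte characters reaches it well before the character limit.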
If a directory contains a file with a sufficiently long name,
FindNextFile() fails with ERROR_MORE_DATA (0xEA, 234), making
R_readdir() return NULL, stopping list_files() prematurely:
# everything seems to work fine...
list.files("????")
# [1] "????-non-utf8-????? ????????????????????????????????????????????????????.txt"
# [2] "????-non-utf8-?????.txt"
# [3] "????-utf-8.txt"
# now create a file with an even longer name
list.files("????")
# [1] "????-non-utf8-????? ????????????????????????????????????????????????????.txt"
# the files are still there, but not visible to list.files():
system("cmd /c dir /s *.txt")
# Volume in drive C has no label.
# Volume Serial Number is A85A-AA74
#
# Directory of C:\R\R-4.3.1\bin\x64\????
#
# 08/12/2023 07:57 AM 22 ????-non-utf8-????? ????????????????????????????????????????????????????.txt
# 08/12/2023 07:57 AM 22 ????-non-utf8-????? ????????????????????????????????????????????????????????????????????????????????????????????????????????.txt
# 08/12/2023 07:57 AM 22 ????-non-utf8-?????.txt
# 08/12/2023 07:56 AM 18 ????-utf-8.txt
# 4 File(s) 84 bytes
#
# Total Files Listed:
# 4 File(s) 84 bytes
# 0 Dir(s) 29,281,538,048 bytes free
# [1] 0
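
The transcript does not show the commands that created the files. A reproduction could look roughly like the sketch below. The directory name, the file names and the repeat counts are all invented for illustration. Whether the listing is actually cut short depends on the R build and on the order in which Windows enumerates the files.

```r
# Hypothetical reproduction sketch: directory and file names are
# invented and are not the ones used in the report.
dir_path <- file.path(tempdir(), "list-files-demo")
dir.create(dir_path, showWarnings = FALSE)

ch <- "\u4e2d"                                 # 3 bytes in UTF-8
short_name <- paste0(strrep(ch, 10), ".txt")   # 34 bytes in UTF-8
long_name  <- paste0(strrep(ch, 90), ".txt")   # 94 characters, 274 bytes

# file.create() returns FALSE for a name the file system rejects,
# e.g. where names are limited to 255 bytes instead of 255 UTF-16 units.
created <- suppressWarnings(
  file.create(file.path(dir_path, c(short_name, long_name)))
)

listed <- list.files(dir_path)

# If the analysis in this message holds, an affected Windows build
# reports fewer listed names than files created.
c(created = sum(created), listed = length(listed))
```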
Increasing the path length limits [*] doesn't help, since it's the
filename length limit that we're bumping against. While both
WIN32_FIND_DATAA and WIN32_FIND_DATAW contain fixed-size buffers, a
valid filename may take more than MAX_PATH bytes to represent in UTF-8
while still being under the limit of MAX_PATH wide characters. This may
mean having to rewrite list_files in terms of R_wopendir()/R_wreaddir()
for Windows. As a workaround, we may use the short filename (which
sometimes may not exist, alas) when FindNextFile() fails with
ERROR_MORE_DATA.
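
To illustrate the gap between the two limits, here is a small sketch that flags names whose UTF-8 form would not fit into a MAX_PATH-sized byte buffer. The helper name would_overflow() and the sample name are made up for this example, and the check only mirrors the reasoning above; it is not what R does internally.

```r
max_path <- 260L  # MAX_PATH on Windows

# Hypothetical helper: TRUE when the UTF-8 encoding of a name plus
# its terminating NUL needs more than max_path bytes.
would_overflow <- function(names) {
  nchar(enc2utf8(names), type = "bytes") + 1L > max_path
}

# Invented example: 100 three-byte characters plus an ASCII extension.
name <- paste0(strrep("\u4e2d", 100), ".txt")

nchar(name, type = "chars")            # well under 260 characters
nchar(enc2utf8(name), type = "bytes")  # but more than 260 bytes
would_overflow(c(name, "short.txt"))
```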
--
Best regards,
Ivan
[*]
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation