Gabor Grothendieck pointed out a bug to me in list.files(...,
full.name=TRUE), that essentially comes down to the fact that in
Windows it's not always valid to add a path separator (slash or
backslash) between a path specifier and a filename.  For example,

    c:foo

is different from

    c:\foo

and there are other examples.

I'm going to fix this, but I'm wondering whether the fix is needed
just for Windows, or for Unix too.  Specifically:

In Unix-like systems, is it *always* safe to add a slash between a
pathname and a filename?

The only examples I can think of that might go wrong are things like

    //foo

    /tmp//foo

Are these the same as

    /foo

and

    /tmp/foo?

Are there any examples where an extra slash causes trouble?

Duncan Murdoch
# Duncan Murdoch wrote:

> In Unix-like systems, is it *always* safe to add a slash between a
> pathname and a filename?

Not if the path is empty:

    '' + '/' + 'file'  ->  /file

is not the same as:

    '' + 'file'  ->  file

Doubling of a slash has no effect (at least on Linux and HP-UX)
unless you are using kpsetools.

--
Peter Kleiweg
Duncan Murdoch <dmurdoch@pair.com> writes:

> Gabor Grothendieck pointed out a bug to me in list.files(...,
> full.name=TRUE), that essentially comes down to the fact that in
> Windows it's not always valid to add a path separator (slash or
> backslash) between a path specifier and a filename.  For example,
>
>     c:foo
>
> is different from
>
>     c:\foo
>
> and there are other examples.
>
> I'm going to fix this, but I'm wondering whether the fix is needed
> just for Windows, or for Unix too.  Specifically:
>
> In Unix-like systems, is it *always* safe to add a slash between a
> pathname and a filename?
>
> The only examples I can think of that might go wrong are things like
>
>     //foo
>
>     /tmp//foo
>
> Are these the same as
>
>     /foo
>
> and
>
>     /tmp/foo?
>
> Are there any examples where an extra slash causes trouble?

Also, and of course, if the first part is empty, foo and /foo are very
different.  There could be a problem with leading //, or is that
Windows only?

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
Perhaps the dir= and pattern= arguments could be combined so that
it's not necessary for list.files to paste them together:

    list.files("C:/a*.txt", glob=T)

Date: Tue, 25 Nov 2003 07:14:49 -0500
From: Duncan Murdoch <dmurdoch@pair.com>
To: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Cc: <r-devel@stat.math.ethz.ch>
Subject: Re: [Rd] Question about Unix file paths

On Tue, 25 Nov 2003 07:35:57 +0000 (GMT), you wrote:

>I think there are some potential issues with doubling separators and final
>separators on dirs.  On Unix file systems /part1//part2 and /path/to/dir/
>are valid.  However, file systems on Unix may not be Unix file systems:
>examples are earlier MacOS systems on MacOS X and mounted Windows and
>Novell systems on Linux.  I would not want to assume that all of these
>combinations worked.

This is something that R could not do reliably by itself.  The code I
committed checks the final character in the path, and if it's "/", "\"
or ":" doesn't add a path separator.  However, both "C:" and "C:\" are
valid directory names in standard Unix file systems, so the test would
do the wrong thing there.  I think people who mount strange file
systems will just have to expect occasional glitches.

The only way I can see around this is to add another argument to
list.files() to say whether to add a path separator, but it would be
so rarely used that it doesn't seem to be worth the effort.

Duncan Murdoch

______________________________________________
R-devel@stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
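The final-character check described in the message above (no separator after a trailing "/", "\" or ":") can be sketched in R roughly as follows. This is only an illustration, not the code that was committed to R: join_path is a hypothetical name, and skipping the separator for an empty path is an added assumption based on the earlier replies about '' + 'file'.

```r
# Hypothetical helper (not the code committed to R): join a directory and
# a file name, adding "/" only when it is needed.
join_path <- function(dir, file) {
  # Last character of each directory string ("" if the string is empty).
  last <- substring(dir, nchar(dir), nchar(dir))
  # No separator after a trailing "/", "\" or ":"; also none for an empty
  # path (assumption: "" + "file" should stay "file", not "/file").
  sep <- ifelse(dir == "" | last %in% c("/", "\\", ":"), "", "/")
  paste0(dir, sep, file)
}

join_path("c:", "foo")     # "c:foo"
join_path("c:/", "foo")    # "c:/foo"
join_path("/tmp", "foo")   # "/tmp/foo"
join_path("", "foo")       # "foo"
```

As the message points out, a check like this cannot tell a Windows drive specifier from a Unix directory that happens to be named "C:", so on such a file system it would return "C:foo" where "C:/foo" was wanted.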
On Tue, 25 Nov 2003 07:27:46 -0500 (EST), you wrote:

>
>Perhaps the dir= and pattern= arguments could be combined so that
>its not necessary to for list.files to paste them together:
>
>   list.files("C:/a*.txt", glob=T)

Why not use system() or shell() instead?  Those explicitly do what ls
or dir would do.

Duncan Murdoch
Actually that's what I currently do, using something like:

    readLines(pipe("cmd /c dir/b \\myfolder\\a*.txt"))

but it's a pain since:

1. One has to explicitly paste together the filename from the output
   of dir with the directory path to create a complete path/filename
   for use in other commands.  That's because the Windows dir command
   does not preface the filename with the path (unless you use
   something like dir/b/s, but that gives you everything in
   subdirectories too, which one may not want).

2. One must convert / to \ since dir will not take /.  If one uses \
   then one has to write \\ in R strings, which is annoying, so I
   prefer to use /.  For example, to convert all consecutive series
   of / and \ to \ one can use:

       gsub("[/\\\\]+","\\\\",path)

   or

       gsub("[/\\]+","\\\\",path)

3. There are reported bugs in R's pipe() on Windows, and although I
   have not noticed them affecting the pipe above, I really don't know
   what their cause is and I am just crossing my fingers on this one.
   I guess I could redirect dir's output to a file, but that would be
   even more work.

4. It's all somewhat complex for a simple operation.

---
On Tue, 25 Nov 2003 07:27:46 -0500 (EST), you wrote:

>
>Perhaps the dir= and pattern= arguments could be combined so that
>its not necessary to for list.files to paste them together:
>
>   list.files("C:/a*.txt", glob=T)

Why not use system() or shell() instead?  Those explicitly do what ls
or dir would do.

Duncan Murdoch
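The separator conversion from point 2 of the list above can be wrapped up as a small function. This is a minimal sketch; to_backslashes is a hypothetical name, and the pattern is the first of the two gsub() forms given in the message.

```r
# Hypothetical helper: collapse every run of "/" and "\" into a single "\".
# Caveat: a leading "\\server" (UNC) prefix is collapsed to "\server" too.
to_backslashes <- function(path) gsub("[/\\\\]+", "\\\\", path)

# Prints c:\myfolder\a*.txt
cat(to_backslashes("c:/myfolder//a*.txt"), "\n")

# Point 1: bare names returned by "dir /b" still need the directory
# put back in front, e.g. with file.path().
file.path("c:/myfolder", c("a1.txt", "a2.txt"))
```

Inside an R string literal each backslash is written twice, which is why the pattern needs four backslashes to match one.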
> Date: Wed, 26 Nov 2003 10:05:42 +0100
> From: Kurt Hornik <Kurt.Hornik@wu-wien.ac.at>
> To: Prof Brian Ripley <ripley@stats.ox.ac.uk>
> Cc: <r-devel@stat.math.ethz.ch>, Duncan Murdoch <dmurdoch@pair.com>
> Subject: Re: [Rd] Question about Unix file paths
>
> >>>>> Prof Brian Ripley writes:
>
> > On Mon, 24 Nov 2003, Duncan Murdoch wrote:
> >> >Duncan Murdoch <dmurdoch@pair.com> writes:
> >> >
> >> >> Gabor Grothendieck pointed out a bug to me in list.files(...,
> >> >> full.name=TRUE), that essentially comes down to the fact that in
> >> >> Windows it's not always valid to add a path separator (slash or
> >> >> backslash) between a path specifier and a filename.  For example,
> >> >>
> >> >> c:foo
> >> >>
> >> >> is different from
> >> >>
> >> >> c:\foo
> >> >>
> >> >> and there are other examples.
> >>
> >> I've committed a change to r-patched to fix this in Windows only.
> >> Sounds like it's not an issue elsewhere.
>
> > I think there are some potential issues with doubling separators and
> > final separators on dirs.  On Unix file systems /part1//part2 and
> > /path/to/dir/ are valid.  However, file systems on Unix may not be
> > Unix file systems: examples are earlier MacOS systems on MacOS X and
> > mounted Windows and Novell systems on Linux.  I would not want to
> > assume that all of these combinations worked.
>
> >> Gabor also suggested an option to use shell globbing instead of
> >> regular expressions to select the files in the list, e.g.
> >>
> >> list.files(dir="/", pattern="a*.dat", glob=T)
> >>
> >> This would be easy to do in Windows, but from the little I know about
> >> Unix programming, would not be so easy there, so I haven't done
> >> anything about it.
>
> > It would be shell-dependent and OS-dependent as well as a retrograde
> > step, as those who wanted to use regular expressions no longer would
> > be able to.
>
> Right.  In any case, an explicit glob() function seems preferable to
> me ...
>
> -k

If it were done this way, it would be desirable to combine the dir=
and pattern= args in list.files so that you don't have to specify the
dir= arg twice.  That is:

    list.files( glob("c:/a*.txt") )

rather than

    list.files( pattern=glob("a*.txt", dir="c:/"), dir="c:/" )
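One way to get the single-argument form without a separate glob() function is to split the specification into its directory and wildcard parts and translate the wildcard into a regular expression. The sketch below assumes a version of R that provides utils::glob2rx(); list_glob is a hypothetical name, and wildcards are only handled in the file-name part, not in the directory.

```r
# Hypothetical helper: list files matching one glob specification such as
# "c:/a*.txt". Wildcards are only supported in the file-name part.
list_glob <- function(spec) {
  list.files(path = dirname(spec),
             # glob2rx() turns e.g. "a*.txt" into an anchored regular
             # expression that list.files() can use as its pattern.
             pattern = utils::glob2rx(basename(spec)),
             full.names = TRUE)
}

# Example call (the path is illustrative only):
# list_glob("c:/myfolder/a*.txt")
```

This keeps regular expressions as the only pattern syntax inside list.files() itself, which was one of the objections to a glob= argument.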
> ---
> Date: Wed, 26 Nov 2003 16:52:09 +0000 (GMT)
> From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
> To: John W. Eaton <jwe@bevo.che.wisc.edu>
> Cc: <Kurt.Hornik@wu-wien.ac.at>, Martin Maechler <maechler@stat.math.ethz.ch>, <r-devel@stat.math.ethz.ch>
> Subject: Re: [Rd] Question about Unix file paths
>
> On Wed, 26 Nov 2003, John W. Eaton wrote:
>
> > On 26-Nov-2003, Martin Maechler <maechler@stat.math.ethz.ch> wrote:
> >
> > | >>>>> "Kurt" == Kurt Hornik <Kurt.Hornik@wu-wien.ac.at>
> > | >>>>>     on Wed, 26 Nov 2003 10:05:42 +0100 writes:
> > |
> > |     Kurt> Right.  In any case, an explicit glob() function
> > |     Kurt> seems preferable to me ...
> > |
> > | Good idea!
> > |
> > | More than 12 years ago, I had a similar one, and wrote a
> > | "pat2grep()" {pattern to grep regular expression} function
> > | --- for S-plus on Unix --- which I have now renamed to glob2regexp():
> > | -- still not really usable outside unix (or windows with the
> > | 'sed' tool in the path), nor perfect, but maybe a good start:
> > |
> > | sys <- function(...) system(paste(..., sep = ""))
> > |
> > | glob2regexp <- function(pattern)
> > | {
> > |     ## Purpose: Change "ls pattern" to "grep regular expression" pattern.
> > |     ## -------------------------------------------------------------------------
> > |     ## Author: Martin Maechler ETH Zurich, ~ 1991
> > |     sys("echo '", pattern, "'| sed ",
> > |         "'s/\\./\\\\./g;s/*/.*/g;s/?/./g; s/^/^/;s/$/$/; s/\\.\\*\\$$//'")
> > | }
> >
> > It seems to me that using this approach to implement a proper glob()
> > function would be more work than using the glob code that is available
> > as part of bash, which I think will allow you to handle much more
> > complex patterns, including [xyz] {a,b,c} etc.
>
> Or even the glob code from Perl, which is cross-platform.  It is not clear
> to me what we would want glob() to do on Windows, BTW.
>
> --
> Brian D. Ripley, ripley@stats.ox.ac.uk

It would work similarly to:

    readLines(pipe("cmd /c dir/b a*.dat"))

If the question is what would it be used for, then I have a number of
data files with nearly the same name and want the most recent.  I
started out using list.files but found the pattern matching less
natural when it comes to files than file globbing, so I changed this to
use the above.  After that I use file.info to find out which is the
most recent and then read in that.
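The use case described above (glob for a set of similarly named files, then pick the most recent via file.info) can be sketched without a pipe to cmd. This assumes a version of R in which Sys.glob() is available; newest_file is a hypothetical name, and "most recent" is taken to mean the latest modification time.

```r
# Hypothetical helper: return the most recently modified file matching a
# glob specification, or character(0) if nothing matches.
newest_file <- function(spec) {
  files <- Sys.glob(spec)              # wildcard expansion, full paths kept
  if (length(files) == 0) return(character(0))
  info <- file.info(files)             # one row per file, including mtime
  files[which.max(as.numeric(info$mtime))]
}

# Example call (the path is illustrative only):
# dat <- read.table(newest_file("c:/myfolder/my*.dat"))
```

Because the specification is passed to Sys.glob() as a whole, the directory part is returned with each match and does not have to be pasted back on afterwards.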
> On Wed, 26 Nov 2003 16:09:34 +0100 (CET), you wrote:
>
> ># John W. Eaton wrote:
> >
> >> It seems to me that using this approach to implement a proper glob()
> >> function would be more work than using the glob code that is available
> >> as part of bash, which I think will allow you to handle much more
> >> complex patterns, including [xyz] {a,b,c} etc.
> >
> >Unix people don't need a glob function in R.  But a simple glob,
> >with just '*' and '?', may be all that an average Windows user
> >can handle, and useful to them.
>
> We already have that, in choose.files().  It's interactive; maybe it
> should have a non-interactive option.
>
> I don't think we should add another pattern matching syntax to R.
> Filename pattern matching is a job for the shell or the OS.
>
> Duncan Murdoch

It's not done by the shell in Windows, VAX/VMS and probably a number
of other systems.

Also, I found it surprisingly tough to do in any short, obvious way, as
the Windows dir command will not handle this directly if you need
full pathnames to be returned.  In Windows:

- the dir command will not return the complete pathname, only the
  filename (unless you use the /s flag, but then it recursively
  descends the directory tree, which is not what I want)

- the dir command will not accept /'s, which means I have to do the
  conversion to backslashes myself.  (I prefer to specify /'s so that
  I don't have to use double backslashes, which R requires since \ is
  also the string escape character.)

I spent some time on this and think that I now have a solution that
works pretty well and is short, but involves a trick that was not
immediately obvious.  This trick was to use the Windows attrib command
instead of the dir command.  attrib does return complete pathnames.
It even handles forward slash specifiers.

The first line in the body of the function executes the attrib
command, the second line closes the pipe, and the third line checks
whether anything was found and, if so, strips off the stuff before the
pathname.
# tested on Windows 2000
list.files.glob <- function( spec ) {
    z <- readLines( con <- pipe( paste( "cmd /c attrib", spec ) ) )
    close( con )
    if ( !pmatch("File not found - ", z[[1]], nomatch = 0) ) substring(z,12)
}

# a couple of examples:
list.files.glob( "c:/myfolder/my*.dat" )
list.files.glob( "c:\\myfolder\\my*.dat" )