Dear R-Users,

I have two questions:

a)
In a directory there are 3 files:
[1] "Data.~csv" "Kopie von Data.~csv" "VorlageTradefile.csv"

The command dir( fold, pattern = "\.csv" ) gives back *all* 3 files.
With dir( fold, pattern = "\\.csv" ) I get back only VorlageTradefile.csv.
I don't understand this behaviour. IMHO the regex expression "\.csv" becomes the string ".csv" and "\\.csv" becomes "\.csv", so the first string should catch it. This is also consistent with the result when I tried with the TRegExpr Tool. Could somebody explain what's going on here?

b)
I need to handle a copied Windows file path. This is certainly often asked, but I didn't find a solution.
How can I convert, e.g.

    myfile <- "D:\UebungenNDK\DataMining\DataMiningSeries.r"

into either:

    myfile
    [1] "D:\\UebungenNDK\\DataMining\\DataMiningSeries.r"

or:

    myfile
    [1] "D:/UebungenNDK/DataMining/DataMiningSeries.r"

Would be great to hear about a possibility!

A nice evening to everybody,
Hans-Peter
On Thu, 9 Jun 2005, Hans-Peter wrote:

> Dear R-Users,
>
> I have two questions:
>
> a)
> in a directory there are 3 files:
> [1] "Data.~csv" "Kopie von Data.~csv" "VorlageTradefile.csv"
>
> The command "dir( fold, pattern = "\.csv" )" gives back *all* the 3 files
> With dir( fold, pattern = "\\.csv" ) I get back only VorlageTradefile.csv.
> I don't understand this behaviour, IMHO the regex expression "\.csv"
> becomes the string ".csv" and "\\.csv" becomes "\.csv". So the first
> string should catch it.

Catch what? What do you actually want (you have not told us).

> This is also consistent with the result when I tried with the TRegExpr
> Tool. Could somebody explain what's going on here?

See the FAQ Q7.8: you need to double the backslashes. This is _also_ mentioned in ?regexp.

I think you probably really intended

    dir( fold, pattern = "\\.csv$" )

    > cat("\\.csv$", "\n")
    \.csv$

may help illuminate the misconception.

> b)
> I need to handle a copied windows file path. This is certainly often
> asked but I didn't find a solution.

It is so often asked it really is a FAQ.

> How can I convert, e.g.
>
> myfile <- "D:\UebungenNDK\DataMining\DataMiningSeries.r"

I am sure that's not what you intended. It has to be written as

    myfile <- "D:\\UebungenNDK\\DataMining\\DataMiningSeries.r"

> in either:
>
> myfile
> [1] "D:\\UebungenNDK\\DataMining\\DataMiningSeries.r"
>
> or:
> myfile
> [1] "D:/UebungenNDK/DataMining/DataMiningSeries.r"
>
> Would be great to hear about a possibility!

It's all over the R code. E.g.

    gsub("\\", "/", myfile, fixed = TRUE)

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
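Editorial illustration of the point above: a minimal sketch, assuming that `dir(pattern = ...)` filters file names the same way `grepl()` does with a regular expression. The vector `files` simply repeats the three names from the question, so no real directory is needed; the variable names are made up for this example.

```r
# The three file names from the original question
files <- c("Data.~csv", "Kopie von Data.~csv", "VorlageTradefile.csv")

# Typed with two backslashes, stored with one: the regex engine sees \.csv$
pattern <- "\\.csv$"
cat(pattern, "\n")

# An unescaped dot is a wildcard, so ".csv" also matches "~csv"
wildcard_hits <- grepl(".csv", files)

# An escaped dot matches only a literal period; $ anchors at the end
literal_hits <- grepl(pattern, files)

print(files[wildcard_hits])
print(files[literal_hits])
```

With the escaped and anchored pattern, only VorlageTradefile.csv should remain, which matches what the poster saw with "\\.csv".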
Hi,

> The command "dir( fold, pattern = "\.csv" )" gives back *all* the 3 files
> With dir( fold, pattern = "\\.csv" ) I get back only VorlageTradefile.csv.
> I don't understand this behaviour, IMHO the regex expression "\.csv"
> becomes the string ".csv" and "\\.csv" becomes "\.csv". So the first
> string should catch it. This is also consistent with the result when I
> tried with the TRegExpr Tool. Could somebody explain what's going on
> here?

Under R, for reasons I've never quite understood, "\\." evaluates to . (the double backslash is needed to match a single period), while "\." is the single-character wildcard (which seems to be the same as "."). This is explained in ?strsplit but not in the help for other commands that use regex.

> b)
> I need to handle a copied windows file path. This is certainly often
> asked but I didn't find a solution.

If you mean you want to change the "\" to either "\\" or "/", I'm really not sure. On my Linux system, it doesn't seem possible (which I'm sure is wrong). I'd try sub() or at worst a combination of strsplit() and paste(), except that:

    > test <- "D:\UebungenNDK\DataMining\DataMiningSeries.r"
    > test
    [1] "D:UebungenNDKDataMiningDataMiningSeries.r"

Regular expressions under R have always perplexed me just a bit. When I've run into problems of this sort, I've always just processed the strings in vim or similar, rather than fight with R.

I'm sure someone here understands them. Hopefully we will both be enlightened.

Sarah

--
Sarah Goslee
http://www.stringpage.com
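Editorial illustration: the puzzle in the message above comes from two separate layers, the R string parser and the regex engine. The sketch below assumes the path was typed with doubled backslashes, so that the stored string really contains them, and then tries the strsplit() plus paste() route mentioned above. Variable names are illustrative only. As far as I know, recent R versions reject unknown escapes such as "\." with a parse error rather than silently dropping the backslash as shown in the 2005 session.

```r
# Layer 1: the string parser. Two typed backslashes store one character.
escaped_dot <- "\\."
n_stored <- nchar(escaped_dot)   # counts stored characters, not typed ones

# The path must be typed with doubled backslashes to keep them at all
path <- "D:\\UebungenNDK\\DataMining\\DataMiningSeries.r"
cat(path, "\n")                  # cat() shows the string as stored

# Layer 2: splitting on the stored single backslash.
# fixed = TRUE avoids a second round of regex escaping.
parts <- strsplit(path, "\\", fixed = TRUE)[[1]]
converted <- paste(parts, collapse = "/")
print(converted)
```

print() re-escapes backslashes for display while cat() does not, which is one reason the stored content is easy to misjudge.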
On 6/9/05, Hans-Peter <gchappi at gmail.com> wrote:

> Dear R-Users,
>
> I have two questions:
>
> a)
> in a directory there are 3 files:
> [1] "Data.~csv" "Kopie von Data.~csv" "VorlageTradefile.csv"
>
> The command "dir( fold, pattern = "\.csv" )" gives back *all* the 3 files
> With dir( fold, pattern = "\\.csv" ) I get back only VorlageTradefile.csv.
> I don't understand this behaviour, IMHO the regex expression "\.csv"
> becomes the string ".csv" and "\\.csv" becomes "\.csv". So the first
> string should catch it. This is also consistent with the result when I
> tried with the TRegExpr Tool. Could somebody explain what's going on
> here?

The dot (.) is a wildcard that matches any character, so .csv will match ~csv, since the . matches the ~.

By the way, note that:

1. "[.]csv" is one way to specify a literal dot without using backslashes.
2. You probably want "[.]csv$" so that a.csv.txt is not matched.
3. Some regular expression functions have a fixed= argument that causes them to regard all special characters like . and * as regular characters, but unfortunately dir lacks that argument.

> b)
> I need to handle a copied windows file path. This is certainly often
> asked but I didn't find a solution.
> How can I convert, e.g.
>
> myfile <- "D:\UebungenNDK\DataMining\DataMiningSeries.r"

Variable myfile, as you have written it above, has no backslashes in it, so there is no way to know where they are supposed to be. Maybe what you mean is that you have a variable that is _stored_ as:

    D:\UebungenNDK\...etc..

In that case it's already the same as

    myfile <- "D:\\UebungenNDK\\...etc.."

Use nchar to check how many characters are stored, e.g.

    nchar("D:\\abc") # there are 6, not 7, characters in this string

> in either:
>
> myfile
> [1] "D:\\UebungenNDK\\DataMining\\DataMiningSeries.r"
>
> or:
> myfile
> [1] "D:/UebungenNDK/DataMining/DataMiningSeries.r"
>
> Would be great to hear about a possibility!

You can convert backslashes to forward slashes using gsub:

    gsub("\\", "/", "D:\\abc", fixed = TRUE)

Note that internally Windows understands forward slashes, although many of the Windows commands do not.

In case I did not understand your question, have a look at ?file.path and also ?glob2rx in package sfsmisc. The first one will construct paths, and the second one allows you to specify wildcards using globbing instead of regular expressions.
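Editorial illustration pulling together the suggestions in this reply: a minimal sketch, not a definitive recipe. Two things go beyond the thread and are assumptions: `chartr()` is shown as an alternative to `gsub()`, and `glob2rx()` is called from the utils package, where current R versions provide it (the message above refers to sfsmisc). The file names repeat those from the question.

```r
# Typed with doubled backslashes, stored with single ones
win_path <- "D:\\abc"
n_chars <- nchar(win_path)

# Backslash to forward slash, treating the pattern as a fixed string
fwd_gsub <- gsub("\\", "/", win_path, fixed = TRUE)

# Alternative (not from the thread): character-for-character translation
fwd_chartr <- chartr("\\", "/", win_path)

# file.path() builds a path from components using "/" as separator
built <- file.path("D:", "UebungenNDK", "DataMining", "DataMiningSeries.r")

# Globbing instead of hand-written regex (assumes utils::glob2rx exists)
files <- c("Data.~csv", "Kopie von Data.~csv", "VorlageTradefile.csv")
csv_regex <- utils::glob2rx("*.csv")
csv_files <- files[grepl(csv_regex, files)]

# "[.]csv$" from point 2 above: literal dot without backslashes, anchored
bracket_files <- files[grepl("[.]csv$", files)]

print(fwd_gsub)
print(built)
print(csv_files)
```

For a path pasted straight from Windows Explorer, R 4.0.0 and later also accept raw string literals of the form r"(...)", which keep single backslashes without doubling; on older versions the backslashes have to be doubled or read from outside the source code.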