Dear all--

I am still forging my first arms with R and I am fighting with regexpr() as
well as portability between unix and windoz. I need to extract barcodes from
filenames (which are located between a double and single underscore) as well
as the directory where the filename is residing. Here is the solution I came
to:

    aFileName <- "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
    t <- regexpr("__\\d*_", aFileName, perl=T)
    t.dir <- regexpr("^.*/", aFileName, perl=T)
    base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
    base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))

My questions are:
1) Is there a more elegant way to deal with regular expressions (read here:
   more easier, more like perl style).
2) I have a portability problem when I extract the base.dir: Windoz is using
   '\' instead of '/' to separate directories.

Any suggestions/comments

Many Tx

Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
Try this. The regular expression says to match
- anything
- followed by a double underscore
- followed by one or more digits
- followed by an underscore
- followed by anything.

The digits have been parenthesized so that they can be referred to in the
backreference "\\1". Also use the R function dirname rather than regular
expressions.

    base.name <- sub(".*__([[:digit:]]+)_.*", "\\1", aFileName)
    base.dir <- dirname(aFileName)

On 8/3/05, Marco Blanchette <mblanche at uclink.berkeley.edu> wrote:
> Dear all--
>
> I am still forging my first arms with R and I am fighting with regexpr() as
> well as portability between unix and windoz. I need to extract barcodes from
> filenames (which are located between a double and single underscore) as well
> as the directory where the filename is residing. Here is the solution I came
> to:
>
> aFileName <-
> "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
> t <- regexpr("__\\d*_",aFileName, perl=T)
> t.dir <- regexpr("^.*/", aFileName, perl=T)
> base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
> base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))
>
> My questions are:
> 1) Is there a more elegant way to deal with regular expressions (read here:
> more easier, more like perl style).
> 2) I have a portability problem when I extract the base.dir Windoz is using
> '\' instead of '/' to separate directories.
>
> Any suggestions/comments
>
> Many Tx
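A minimal sketch of the suggestion above, for anyone who wants to try it on
the example path. The helper name extract_barcode() and the NA fallback are
illustrative assumptions, not part of the reply; sub() on its own returns the
input unchanged when the pattern does not match, which is easy to miss.

```r
# Hypothetical helper (name and NA fallback are assumptions, not from the
# thread): return the digits between "__" and the next "_" in a file name.
extract_barcode <- function(path) {
  name <- basename(path)                 # drop the directory part first
  pattern <- ".*__([[:digit:]]+)_.*"     # same pattern as in the reply
  # sub() leaves non-matching input untouched, so test for a match explicitly
  ifelse(grepl(pattern, name), sub(pattern, "\\1", name), NA_character_)
}

aFileName <- "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"

barcode  <- extract_barcode(aFileName)   # the captured run of digits
base.dir <- dirname(aFileName)           # everything before the last "/"

print(barcode)
print(base.dir)
```

Because basename() and dirname() do the path splitting, the regular
expression only has to describe the barcode itself.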
On Tue, 2 Aug 2005, Marco Blanchette wrote:

> I am still forging my first arms with R and I am fighting with regexpr() as
> well as portability between unix and windoz. I need to extract barcodes from
> filenames (which are located between a double and single underscore) as well
> as the directory where the filename is residing. Here is the solution I came
> to:
>
> aFileName <-
> "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
> t <- regexpr("__\\d*_",aFileName, perl=T)
> t.dir <- regexpr("^.*/", aFileName, perl=T)
> base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
> base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))
>
> My questions are:
> 1) Is there a more elegant way to deal with regular expressions (read here:
> more easier, more like perl style).

Yes, use sub and backreferences. An example from the R sources doing
something similar:

    wfile <- sub("/chm/([^/]*)$", "", file)
    thispkg <- sub(".*/([^/]*)/chm/([^/]*)$", "\\1", file)

However, R does have functions basename() and dirname() to do this!

> 2) I have a portability problem when I extract the base.dir Windoz is using
> '\' instead of '/' to separate directories.

That is misinformation: Windows (sic) accepts either / or \ (see the rw-FAQ
and the R FAQ). Use chartr("\\", "/", path) to map \ to /.

The `portability problem' appears to be of your own making -- take heart
that R itself manages to manipulate filepaths portably.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
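The points in this reply can be put together in a short sketch. The Windows
path below is made up for illustration, and the regexec()/regmatches() step
is an addition that was not mentioned in the thread (those functions arrived
in later versions of R); it is shown only as one way to get a Perl-like
capture group without computing substr() offsets by hand.

```r
# A made-up Windows-style path; in R source each "\\" is a single backslash.
winPath <- "C:\\data\\test\\MA__251329410021_S01_A01.txt"

# Map every backslash to a forward slash, as suggested in the reply.
path <- chartr("\\", "/", winPath)

# Let R split the path instead of using a regular expression.
base.dir  <- dirname(path)
file.name <- basename(path)

# Assumption: regexec()/regmatches() are not from the thread. regexec()
# records the whole match plus each parenthesized group; element 2 of the
# result is the first capture group, i.e. the digits.
m <- regmatches(file.name, regexec("__([0-9]+)_", file.name))
barcode <- m[[1]][2]

print(path)
print(base.dir)
print(barcode)
```

Normalizing the separators once, up front, means the rest of the code only
ever has to deal with "/".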