thr3ads.net - R help - [R] regexpr and portability issue [Aug 2005]


Marco Blanchette

2005-Aug-03 05:26 UTC

[R] regexpr and portability issue

Dear all--

I am still cutting my teeth on R and am struggling with regexpr() as well as
with portability between Unix and Windows. I need to extract barcodes from
filenames (the barcode sits between a double and a single underscore) as well
as the directory in which the file resides. Here is the solution I came up
with:

aFileName <- 
"/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
t <- regexpr("__\\d*_",aFileName, perl=T)
t.dir <- regexpr("^.*/", aFileName, perl=T)
base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))

My questions are:
1) Is there a more elegant way to deal with regular expressions (that is,
easier and more Perl-like)?
2) I have a portability problem when I extract base.dir: Windows uses
'\' instead of '/' to separate directories.

Any suggestions or comments are welcome.

Many thanks,

Marco Blanchette, Ph.D.

mblanche at uclink.berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062

Gabor Grothendieck

2005-Aug-03 05:47 UTC


Try this.  The regular expression says to match 
- anything 
- followed by a double underscore 
- followed by one or more digits
- followed by an underscore 
- followed by anything.  
The digits have been parenthesized so that they can be referred to in
the backreference "\\1".    Also use the R function dirname
rather than regular expressions.

base.name <- sub(".*__([[:digit:]]+)_.*", "\\1", aFileName)
base.dir <- dirname(aFileName)
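
As a rough sketch of the approach above applied to the file name from the original post: the variable names file.name, barcode.pattern and has.barcode are introduced here for illustration only, and taking basename() first is an extra precaution so that underscores in directory names cannot affect the match.

```r
aFileName <- "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"

# Work on the file name only, so directory names cannot interfere.
file.name <- basename(aFileName)

# Capture the run of digits between "__" and the next "_".
barcode.pattern <- ".*__([[:digit:]]+)_.*"
base.name <- sub(barcode.pattern, "\\1", file.name)

# sub() returns its input unchanged when nothing matches,
# so check for a match explicitly before trusting base.name.
has.barcode <- grepl(barcode.pattern, file.name)

# Directory part, without any regular expression.
base.dir <- dirname(aFileName)
```

Here base.name holds the digits as a character string, and base.dir holds the directory without a trailing slash.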



Prof Brian Ripley

2005-Aug-03 07:30 UTC


On Tue, 2 Aug 2005, Marco Blanchette wrote:
> I am still forging my first arms with R and I am fighting with regexpr() as
> well as portability between unix and windoz. I need to extract barcodes from
> filenames (which are located between a double and single underscore) as well
> as the directory where the filename is residing. Here is the solution I came
> to:
>
> aFileName <-
> "/Users/marco/Desktop/diagnosticAnalysis/test/MA__251329410021_S01_A01.txt"
> t <- regexpr("__\\d*_",aFileName, perl=T)
> t.dir <- regexpr("^.*/", aFileName, perl=T)
> base.name <- substr(aFileName, t+2, t-2 + attr(t,"match.length"))
> base.dir <- substr(aFileName, t.dir, attr(t.dir,"match.length"))
>
> My questions are:
> 1) Is there a more elegant way to deal with regular expressions (read here:
> more easier, more like perl style).

Yes, use sub and backreferences.  An example from the R sources doing
something similar:

    wfile <- sub("/chm/([^/]*)$", "", file)
    thispkg <- sub(".*/([^/]*)/chm/([^/]*)$", "\\1", file)

However, R does have functions basename() and dirname() to do this!
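
To make the two sub() calls concrete, here is a small sketch that runs them on a made-up path and then gets the same values from dirname() and basename(). The path and the names wfile2 and thispkg2 are hypothetical and are not taken from the R sources.

```r
# Hypothetical path, chosen only to exercise the two patterns.
file <- "/opt/R/library/stats/chm/lm.html"

# Drop the trailing "/chm/<file>" component.
wfile <- sub("/chm/([^/]*)$", "", file)

# Keep only the directory name just above "chm" (first backreference).
thispkg <- sub(".*/([^/]*)/chm/([^/]*)$", "\\1", file)

# The same two values from the path helpers, with no regular expressions.
wfile2 <- dirname(dirname(file))
thispkg2 <- basename(wfile2)
```

For this input both routes agree, which is the point of preferring the path helpers when the task is only to split a path.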

> 2) I have a portability problem when I extract the base.dir Windoz is using
> '\' instead of '/' to separate directories.

That is misinformation: Windows (sic) accepts either / or \ (see the
rw-FAQ and the R FAQ).  Use chartr("\\", "/", path) to map \ to /.

The `portability problem' appears to be of your own making -- take heart 
that R itself manages to manipulate filepaths portably.
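
A minimal sketch of the chartr() suggestion, assuming a path that arrives with backslashes. The path and the names win.path and file.name are invented for illustration.

```r
# Hypothetical Windows-style path; "\\" in R source is one literal backslash.
win.path <- "C:\\Users\\marco\\test\\MA__251329410021_S01_A01.txt"

# Map every backslash to a forward slash.
path <- chartr("\\", "/", win.path)

# After normalising, the path helpers split it the same way on any platform.
base.dir <- dirname(path)
file.name <- basename(path)
```

Normalising once, right after the path is read, means the rest of the script only ever has to deal with forward slashes.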

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
