thr3ads.net - R help - [R] Regular expressions on filenames [Jan 2014]

Fisher Dennis

2014-Jan-16 00:37 UTC

[R] Regular expressions on filenames

R 3.0.2
OS X

Colleagues

I am writing code to read a large number of files in a particular folder.  In
some situations, there may be two versions of the file with different
extensions, e.g.:
	FILE.csv
	FILE.xls
I extracted the portion before the extension with:
	sub("\\..*$", "", basename(FILELIST))
then used 
	duplicated
to find duplicates.  All was well until I encountered files named:
	FILE.XXX.csv
	FILE.YYY.xls

My regular expression extracted only the “FILE” portion of the text and claimed
that the filenames (without the extensions) matched.  Can someone provide me
with the appropriate regular expression to deal with this?  Thanks.

Dennis


Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

Wojtek Poppe

2014-Jan-16 00:40 UTC

Try  sub("\\.[^.]+$", "",  basename(FILELIST))

Thanks,
Wojtek
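
A minimal sketch of why the original pattern over-matches and how the suggested one behaves, assuming a made-up file list (the "data/" paths below are hypothetical stand-ins for FILELIST). The original pattern starts matching at the first period in the name, while "\\.[^.]+$" can only match the last period and the characters after it.

```r
# Hypothetical file list standing in for FILELIST from the question.
FILELIST <- c("data/FILE.csv", "data/FILE.xls",
              "data/FILE.XXX.csv", "data/FILE.YYY.xls")

# Original pattern: removes everything from the FIRST period onward.
stem_first <- sub("\\..*$", "", basename(FILELIST))
# "FILE" "FILE" "FILE" "FILE"

# Suggested pattern: removes only the LAST period and what follows it.
stem_last <- sub("\\.[^.]+$", "", basename(FILELIST))
# "FILE" "FILE" "FILE.XXX" "FILE.YYY"

# duplicated() flags every repeat of an earlier element.
dup_first <- duplicated(stem_first)  # FALSE TRUE TRUE  TRUE
dup_last  <- duplicated(stem_last)   # FALSE TRUE FALSE FALSE
```

With the second pattern only the genuine FILE.csv / FILE.xls pair is reported as a duplicate.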


jim holtman

2014-Jan-16 00:46 UTC

try this:

> x <- c(  "FILE.XXX.csv"
+         , "FILE.YYY.xls")
> sub("\\.[^.]*$", "", x)
[1] "FILE.XXX" "FILE.YYY"

the '[^.]*' says to match anything BUT a period.

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


arun

2014-Jan-16 00:48 UTC

Hi,
Try:
FILELIST <- list.files()
FILELIST
#[1] "FILE.csv"     "FILE.XXX.csv" "FILE.YYY.xls"

sub("(.*)\\..*$", "\\1", basename(FILELIST))
#[1] "FILE"     "FILE.XXX" "FILE.YYY"


A.K.
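
The capture-group pattern above works because "(.*)" is greedy: it takes as many characters as it can, which pushes the following "\\." onto the last period. A small sketch of that, with a lazy variant shown only for contrast (the lazy variant and the sample vector are illustrative and not part of the reply):

```r
x <- c("FILE.csv", "FILE.XXX.csv", "FILE.YYY.xls")

# Greedy capture (the pattern from the reply): "(.*)" grabs as much as possible,
# so the "\\." after it lands on the last period in the name.
greedy <- sub("(.*)\\..*$", "\\1", x)
# "FILE" "FILE.XXX" "FILE.YYY"

# Lazy capture, for contrast only: "(.*?)" grabs as little as possible,
# so "\\." lands on the first period. perl = TRUE selects the PCRE engine.
lazy <- sub("(.*?)\\..*$", "\\1", x, perl = TRUE)
# "FILE" "FILE" "FILE"
```

The lazy form reproduces the behaviour the original question was trying to get away from.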


Jeff Newmiller

2014-Jan-16 01:01 UTC

You want to match a period and anything that follows to the end of the string,
as long as what follows has no period in it.
"\\.[^.]*$"
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.
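
Two patterns in this thread differ only in the quantifier: "\\.[^.]*$" allows zero characters after the last period, while "\\.[^.]+$" requires at least one. A brief sketch of where that matters, using made-up names that do not appear in the original question:

```r
# Illustrative names: a normal file, a name ending in a period, and a name
# with no period at all.
x <- c("FILE.XXX.csv", "FILE.", "README")

# "*" allows an empty extension, so a trailing period is stripped.
star <- sub("\\.[^.]*$", "", x)
# "FILE.XXX" "FILE" "README"

# "+" requires at least one character after the period, so "FILE." is kept.
plus <- sub("\\.[^.]+$", "", x)
# "FILE.XXX" "FILE." "README"
```

For names like FILE.XXX.csv the two give the same result; they diverge only on names that end in a period.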

David Winsemius

2014-Jan-16 01:56 UTC

Why not:

sub("\\..{3}$", "", basename(FILELIST))

See ?regex

-- 

David Winsemius
Alameda, CA, USA
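
The "\\..{3}$" pattern assumes every extension is exactly three characters long. A hedged comparison on a few made-up names (the ".xlsx" and ".R" entries are illustrative, not from the question), alongside tools::file_path_sans_ext(), which ships with R and is mentioned here only as an alternative that nobody in the thread proposed:

```r
x <- c("FILE.XXX.csv", "FILE.YYY.xls", "FILE.xlsx", "FILE.R")

# Exactly three characters after the last period: other lengths are left alone.
fixed3 <- sub("\\..{3}$", "", x)
# "FILE.XXX" "FILE.YYY" "FILE.xlsx" "FILE.R"

# Any number (>= 1) of non-period characters after the last period.
any_len <- sub("\\.[^.]+$", "", x)
# "FILE.XXX" "FILE.YYY" "FILE" "FILE"

# Helper from the tools package; strips a trailing alphanumeric extension.
builtin <- tools::file_path_sans_ext(x)
# "FILE.XXX" "FILE.YYY" "FILE" "FILE"
```

If all files in the folder are known to be .csv or .xls, the three-character pattern is sufficient; the other two are safer when extension lengths vary.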
