thr3ads.net - R help - [R] Regular Expression [Jul 2012]

Fred G

2012-Jul-24 17:36 UTC

[R] Regular Expression

Hi--

I have three columns in an input file:
MONTH    QUARTER  YEAR
2012-07  2012-3   2012
2001-07  2001-3   2001
2002-01  2002-1   2002

I want to make output like so:
MONTH  QUARTER  YEAR
07     3        2012
07     3        2001
01     1        2002

I was having some trouble getting the regular expression to work. I think
it should be something like the following:

tmp <- uncurated$MONTH
tmp <- gsub("[^-\\d\\d]","",tmp,perl=TRUE)
tmp[tmp=="-"] <- ""
curated$MONTH <- tmp

tmp <- uncurated$QUARTER
tmp <- gsub("[^-\\d]","",tmp,perl=TRUE)
tmp[tmp=="-"] <- ""
curated$QUARTER <- tmp

But it's not quite working. I want to isolate any digits that occur after
the hyphen and delete everything up to and including the hyphen. I would
greatly appreciate any clarification anyone can provide.
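A quick check of why the attempt above leaves the strings unchanged: `[^-\\d\\d]` is a character class that matches one character that is *not* a hyphen or a digit (listing `\\d` twice inside a class adds nothing). Values like "2012-07" contain only digits and hyphens, so `gsub()` finds nothing to delete. A minimal sketch using the sample values from the table; the "Q3-2012" string is a made-up value used only to show what the class does remove:

```r
x <- c("2012-07", "2001-07", "2002-01")

# [^-\d\d] matches a single character that is neither '-' nor a digit.
# These strings contain only digits and hyphens, so nothing is removed.
attempt <- gsub("[^-\\d\\d]", "", x, perl = TRUE)
print(attempt)  # "2012-07" "2001-07" "2002-01"

# The class does strip other characters, e.g. the letter in this made-up value.
print(gsub("[^-\\d\\d]", "", "Q3-2012", perl = TRUE))  # "3-2012"
```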


R. Michael Weylandt

2012-Jul-24 17:41 UTC


Hi Fred,

I'm no regex ninja (and I imagine one will be along shortly to solve
your problem) but in your case does it simply suffice to drop the
first 5 characters? That might be an easier sub() to write.
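Michael's suggestion can be sketched as follows, assuming every value starts with a four-digit year and a hyphen (a five-character prefix); the vectors are just the sample values from the question:

```r
month   <- c("2012-07", "2001-07", "2002-01")
quarter <- c("2012-3", "2001-3", "2002-1")

# "^.{5}" matches exactly the first five characters (e.g. "2012-"), which sub() deletes.
month_out   <- sub("^.{5}", "", month)
quarter_out <- sub("^.{5}", "", quarter)

print(month_out)    # "07" "07" "01"
print(quarter_out)  # "3" "3" "1"
```

This depends entirely on the fixed-width prefix: a malformed value such as "212-07" would lose the first digit of the month.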

Best,
Michael


Sarah Goslee

2012-Jul-24 17:42 UTC


To delete everything from the beginning of the string to and including
the hyphen, use
sub("^.*-", "", tmp)
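A short sketch of Sarah's pattern applied to both columns. Because `.*` is greedy, `^.*-` runs up to the *last* hyphen in each string, so the prefix can be any length; the "2012-07-15" value is a hypothetical full date added only to show that behaviour:

```r
month   <- c("2012-07", "2001-07", "2002-01")
quarter <- c("2012-3", "2001-3", "2002-1")

# Delete everything from the start of the string up to and including the hyphen.
month_out   <- sub("^.*-", "", month)
quarter_out <- sub("^.*-", "", quarter)
print(month_out)    # "07" "07" "01"
print(quarter_out)  # "3" "3" "1"

# Greedy matching: .* extends to the last hyphen, so only the final field survives.
print(sub("^.*-", "", "2012-07-15"))  # "15"
```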

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

jose Bartolomei

2012-Jul-24 17:50 UTC


If that is the output you want, substr() can help with your task too.

I can't help with the regular expression; I'm still learning too.
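One way to use `substring()` without hard-coding the start position is to locate the hyphen first with `regexpr(..., fixed = TRUE)`, which is a literal string search rather than a regular expression. A minimal sketch on the sample values:

```r
month   <- c("2012-07", "2001-07", "2002-01")
quarter <- c("2012-3", "2001-3", "2002-1")

# Position of the first hyphen in each string (fixed = TRUE: literal match, no regex).
month_pos   <- regexpr("-", month, fixed = TRUE)
quarter_pos <- regexpr("-", quarter, fixed = TRUE)

# Keep everything after the hyphen; substring() runs to the end of the string by default.
month_out   <- substring(month, month_pos + 1)
quarter_out <- substring(quarter, quarter_pos + 1)

print(month_out)    # "07" "07" "01"
print(quarter_out)  # "3" "3" "1"
```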


Henrik Singmann

2012-Jul-24 17:52 UTC


Hi,

one problem, many solutions; only one of them uses a regular expression, but they all work
equally well.

dat1<-read.table(text="
MONTH   QUARTER  YEAR
2012-07   2012-3        2012
2001-07   2001-3        2001
2002-01   2002-1        2002
",sep="",as.is = TRUE, header=TRUE)

# using substr:
substr(dat1$MONTH, 6,7)
substr(dat1$QUARTER, 6,7)

# using strsplit:
vapply(strsplit(dat1$MONTH, "-"), "[", i = 2, "")
vapply(strsplit(dat1$QUARTER, "-"), "[", i = 2, "")

# using sub:
sub("[[:digit:]]*-", "", dat1$MONTH)
sub("[[:digit:]]*-", "", dat1$QUARTER)

All three produce the desired outcome:
[1] "07" "07" "01"
and
[1] "3" "3" "1"

If the data are always formatted like this, I personally would prefer substr().
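The per-column calls above can also be applied to both columns in one step with `lapply()`, writing the results back into the data frame. A sketch that rebuilds Henrik's `dat1`; `cols` is a name introduced here for illustration:

```r
dat1 <- read.table(text = "
MONTH   QUARTER  YEAR
2012-07   2012-3        2012
2001-07   2001-3        2001
2002-01   2002-1        2002
", header = TRUE, as.is = TRUE)

cols <- c("MONTH", "QUARTER")  # hypothetical helper: the columns to trim

# Same pattern as the sub() version above, applied to each selected column;
# YEAR is left untouched.
dat1[cols] <- lapply(dat1[cols], function(x) sub("[[:digit:]]*-", "", x))
print(dat1)
```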

Cheers,
Henrik

-- 
Dipl. Psych. Henrik Singmann
PhD Student
Albert-Ludwigs-Universität Freiburg, Germany
http://www.psychologie.uni-freiburg.de/Members/singmann

jim holtman

2012-Jul-24 17:54 UTC


Is this what you want:
> x <- read.table(text = "MONTH   QUARTER  YEAR
+ 2012-07   2012-3        2012
+ 2001-07   2001-3        2001
+ 2002-01   2002-1        2002", header = TRUE, as.is = TRUE)
> x
    MONTH QUARTER YEAR
1 2012-07  2012-3 2012
2 2001-07  2001-3 2001
3 2002-01  2002-1 2002
> x$MONTH <- sub(".*-(.*)", "\\1", x$MONTH)
> x$QUARTER <- sub(".*-(.*)", "\\1", x$QUARTER)
> x
  MONTH QUARTER YEAR
1    07       3 2012
2    07       3 2001
3    01       1 2002


-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

arun

2012-Jul-24 17:55 UTC


Hi,

Try this:

dat1$MONTH <- gsub("^[0-9]+\\-", "", dat1$MONTH)
dat1$MONTH
[1] "07" "07" "01"
dat1$QUARTER <- gsub("^[0-9]+\\-", "", dat1$QUARTER)
dat1$QUARTER
[1] "3" "3" "1"
dat1
  MONTH QUARTER YEAR
1    07       3 2012
2    07       3 2001
3    01       1 2002

A.K.

Rui Barradas

2012-Jul-24 17:57 UTC


Hello,

I believe the following will do it.


d <- read.table(text="
MONTH   QUARTER  YEAR
2012-07   2012-3        2012
2001-07   2001-3        2001
2002-01   2002-1        2002
", header=TRUE)

search <- "^.*-([[:digit:]]+)$"
sapply(d, function(x) as.integer(sub(search, "\\1", x)))
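Two things worth knowing about the `sapply()` call above: it returns a matrix rather than a data frame, and `as.integer()` turns "07" into 7. A sketch (reading the columns as character with `stringsAsFactors = FALSE`) showing both behaviours, plus one way to restore the two-digit month with `sprintf()`; `m` and `out` are names introduced here:

```r
d <- read.table(text = "
MONTH   QUARTER  YEAR
2012-07   2012-3        2012
2001-07   2001-3        2001
2002-01   2002-1        2002
", header = TRUE, stringsAsFactors = FALSE)

search <- "^.*-([[:digit:]]+)$"

# sapply() simplifies the three integer vectors into an integer matrix.
# YEAR has no hyphen, so sub() leaves it unchanged before the integer conversion.
m <- sapply(d, function(x) as.integer(sub(search, "\\1", x)))
print(is.matrix(m))  # TRUE
print(m[, "MONTH"])  # 7 7 1

# Restore the leading zero for display, and turn the matrix back into a data frame.
print(sprintf("%02d", m[, "MONTH"]))  # "07" "07" "01"
out <- as.data.frame(m)
```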


Hope this helps,

Rui Barradas


David L Carlson

2012-Jul-24 18:03 UTC


If they are all formatted as your example, substr() would be simpler:

MONTH <- c("2012-07", "2001-07", "2002-01")
QUARTER <- c("2012-3", "2001-3", "2002-1")
YEAR <- c(2012, 2001, 2002)
Inp <- data.frame(MONTH, QUARTER, YEAR)
Out <- data.frame(MONTH=substr(MONTH, 6, 7),
     QUARTER=substr(QUARTER, 6, 7), YEAR)

This assumes MONTH and QUARTER are character strings and not dates.
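Because `substr()` silently assumes the fixed layout, a cheap safeguard is to check that every value really has that form before slicing. A sketch; the pattern (four-digit year, hyphen, one or two digits) is an assumption about what "formatted as your example" means:

```r
MONTH   <- c("2012-07", "2001-07", "2002-01")
QUARTER <- c("2012-3", "2001-3", "2002-1")

# Four-digit year, hyphen, then one or two digits, and nothing else.
layout <- "^[0-9]{4}-[0-9]{1,2}$"
stopifnot(all(grepl(layout, MONTH)), all(grepl(layout, QUARTER)))

# Only then rely on fixed positions; nchar() avoids hard-coding the end position.
month_out   <- substr(MONTH, 6, nchar(MONTH))
quarter_out <- substr(QUARTER, 6, nchar(QUARTER))
```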

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352



Gabor Grothendieck

2012-Jul-24 18:28 UTC


Normally there is no need to store the components of a date. It's
usually easier to extract what you need on the fly. Since you only
seem to need the year, quarter and month, if DF is your data frame you
can store the date as a yearmon class object. That is rich enough to
contain everything else, so you don't really need separate MONTH,
QUARTER and YEAR columns, which makes everything simpler.

library(zoo)
ym <- as.yearmon(DF$MONTH)

Now the year, quarter and month are:

floor(ym)
format(as.yearqtr(ym), "%q")
format(ym, "%m")

The last two return character strings, which is likely OK, but if you
need them as numbers use as.numeric(format(ym, "%m")) and similarly
for the quarter.

This does not involve regular expressions or intricate character manipulation.
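If zoo is not available, the same idea (keep one date-like column and derive the parts when needed) can be sketched in base R with the `Date` class. This swaps zoo's `yearmon` for a `Date` pinned to the first day of each month, an assumption made only for this sketch:

```r
MONTH <- c("2012-07", "2001-07", "2002-01")

# Pin each year-month to the first day of the month so it parses as a Date.
d <- as.Date(paste0(MONTH, "-01"))

year    <- as.integer(format(d, "%Y"))  # 2012 2001 2002
month   <- format(d, "%m")              # "07" "07" "01"
quarter <- sub("Q", "", quarters(d))    # quarters() gives "Q3" etc.; drop the "Q"
```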

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
