Hi--

I have three columns in an input file:

    MONTH    QUARTER  YEAR
    2012-07  2012-3   2012
    2001-07  2001-3   2001
    2002-01  2002-1   2002

I want to make output like so:

    MONTH  QUARTER  YEAR
    07     3        2012
    07     3        2001
    01     1        2002

I was having some trouble getting the regular expression to work. I think it should be something like the following:

    tmp <- uncurated$MONTH
    tmp <- gsub("[^-\\d\\d]","",tmp,perl=TRUE)
    tmp[tmp=="-"] <- ""
    curated$MONTH <- tmp

    tmp <- uncurated$QUARTER
    tmp <- gsub("[^-\\d]","",tmp,perl=TRUE)
    tmp[tmp=="-"] <- ""
    curated$QUARTER <- tmp

but it's not quite working. I want to be able to isolate any digits that occur after the hyphen and to delete everything before and including the hyphen. Would greatly appreciate any clarification anyone can provide.
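A note on why the attempt above leaves the strings unchanged: inside square brackets a leading ^ negates the class, so [^-\\d\\d] matches any single character that is neither a hyphen nor a digit, and repeating \\d adds nothing. A minimal sketch, assuming a made-up vector x in place of uncurated$MONTH:

```r
# Hypothetical stand-in for uncurated$MONTH
x <- c("2012-07", "2001-07", "2002-01")

# [^-\\d\\d] matches only characters that are NOT a hyphen or a digit,
# so gsub() finds nothing to delete in strings made of digits and hyphens.
unchanged <- gsub("[^-\\d\\d]", "", x, perl = TRUE)
identical(unchanged, x)  # TRUE

# The pattern only strips other characters, for example letters:
stripped <- gsub("[^-\\d\\d]", "", "Q3-2012", perl = TRUE)
stripped  # "3-2012"
```

Because nothing is removed, no element ever equals "-", and the tmp[tmp=="-"] <- "" step has no effect either.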
Hi Fred,

I'm no regex ninja (and I imagine one will be along shortly to solve your problem), but in your case does it suffice to drop the first 5 characters? That might be an easier sub() to write.

Best,
Michael

On Tue, Jul 24, 2012 at 12:36 PM, Fred G <bayespokerguy at gmail.com> wrote:
> [...]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
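Dropping the first five characters could be written in either of the following ways. This is only a sketch under the assumption that every value starts with a four-digit year and a hyphen; x is a made-up example vector.

```r
# Hypothetical example values: one MONTH entry and one QUARTER entry
x <- c("2012-07", "2012-3")

# "^.{5}" matches exactly the first five characters of each string
dropped_sub <- sub("^.{5}", "", x)
dropped_sub  # "07" "3"

# The same result without a regular expression: keep character 6 onwards
dropped_substring <- substring(x, 6)
dropped_substring  # "07" "3"
```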
To delete everything from the beginning of the string to and including the hyphen, use

    sub("^.*-", "", tmp)

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org
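A short sketch of how that pattern behaves, using made-up values. One thing worth knowing is that .* is greedy, so with more than one hyphen everything up to the last hyphen is removed; the [^-]* variant shown at the end is an alternative for stopping at the first hyphen, not part of the suggestion above.

```r
# Hypothetical example values, including one without a hyphen
x <- c("2012-07", "2012-3", "2012")

# Strings without a hyphen are returned unchanged
after_hyphen <- sub("^.*-", "", x)
after_hyphen  # "07" "3" "2012"

# Greedy matching: everything up to the LAST hyphen is deleted
last_part <- sub("^.*-", "", "2012-07-15")
last_part  # "15"

# To stop at the FIRST hyphen, exclude hyphens from the leading run
rest <- sub("^[^-]*-", "", "2012-07-15")
rest  # "07-15"
```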
If that is the output you want, substr() can help with your task too. I can't help with the regular expression; I am still learning them myself.

> Date: Tue, 24 Jul 2012 13:36:25 -0400
> From: bayespokerguy@gmail.com
> To: r-help@r-project.org
> Subject: [R] Regular Expression
Hi,

One problem, many solutions; only one of them uses a regular-expression substitution, but they all work equally well.

    dat1 <- read.table(text="
    MONTH QUARTER YEAR
    2012-07 2012-3 2012
    2001-07 2001-3 2001
    2002-01 2002-1 2002
    ", sep="", as.is = TRUE, header=TRUE)

    # using substr:
    substr(dat1$MONTH, 6, 7)
    substr(dat1$QUARTER, 6, 7)

    # using strsplit:
    vapply(strsplit(dat1$MONTH, "-"), "[", i = 2, "")
    vapply(strsplit(dat1$QUARTER, "-"), "[", i = 2, "")

    # using sub:
    sub("[[:digit:]]*-", "", dat1$MONTH)
    sub("[[:digit:]]*-", "", dat1$QUARTER)

All produce the desired outcome:

    [1] "07" "07" "01"

and

    [1] "3" "3" "1"

If the data is always formatted like this, I personally would prefer substr.

Cheers,
Henrik

-- 
Dipl. Psych. Henrik Singmann
PhD Student
Albert-Ludwigs-Universität Freiburg, Germany
http://www.psychologie.uni-freiburg.de/Members/singmann
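The "if the data is always formatted like this" caveat can be made concrete. In the sketch below the second value is a made-up irregular entry with a two-digit year; the position-based substr() then misses the month, while the two hyphen-based approaches still agree.

```r
# Hypothetical values; "12-07" deliberately breaks the fixed-width layout
x <- c("2012-07", "12-07", "2012-3")

# Position-based: characters 6 to 7 do not exist in "12-07"
by_position <- substr(x, 6, 7)
by_position  # "07" "" "3"

# Hyphen-based: take the second piece after splitting on a literal "-"
by_split <- vapply(strsplit(x, "-", fixed = TRUE), `[`, "", 2)

# Hyphen-based: delete the leading digits and the hyphen
by_sub <- sub("[[:digit:]]*-", "", x)

identical(by_split, by_sub)  # TRUE
by_sub  # "07" "07" "3"
```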
Is this what you want:

    > x <- read.table(text = "MONTH QUARTER YEAR
    + 2012-07 2012-3 2012
    + 2001-07 2001-3 2001
    + 2002-01 2002-1 2002", header = TRUE, as.is = TRUE)
    > x
        MONTH QUARTER YEAR
    1 2012-07  2012-3 2012
    2 2001-07  2001-3 2001
    3 2002-01  2002-1 2002
    > x$MONTH <- sub(".*-(.*)", "\\1", x$MONTH)
    > x$QUARTER <- sub(".*-(.*)", "\\1", x$QUARTER)
    > x
      MONTH QUARTER YEAR
    1    07       3 2012
    2    07       3 2001
    3    01       1 2002

-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
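For readers new to backreferences: the parentheses in ".*-(.*)" capture whatever follows the hyphen, and "\\1" in the replacement puts that captured text back. A small sketch with made-up values; the two-group swap at the end is only an illustration of the mechanism, not something the thread asks for.

```r
# Hypothetical example values, including one without a hyphen
x <- c("2012-07", "2012-3", "2012")

# Keep only the captured part after the hyphen; no hyphen means no match,
# so "2012" comes back unchanged
kept <- sub(".*-(.*)", "\\1", x)
kept  # "07" "3" "2012"

# Two capture groups can be reordered in the replacement
swapped <- sub("(.*)-(.*)", "\\2/\\1", "2012-07")
swapped  # "07/2012"
```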
Hi,

Try this:

    dat1$MONTH <- gsub("^[0-9]+\\-", "", dat1$MONTH)
    dat1$MONTH
    # [1] "07" "07" "01"

    dat1$QUARTER <- gsub("^[0-9]+\\-", "", dat1$QUARTER)
    dat1$QUARTER
    # [1] "3" "3" "1"

    dat1
    #   MONTH QUARTER YEAR
    # 1    07       3 2012
    # 2    07       3 2001
    # 3    01       1 2002

A.K.
Hello,

I believe the following will do it.

    d <- read.table(text="
    MONTH QUARTER YEAR
    2012-07 2012-3 2012
    2001-07 2001-3 2001
    2002-01 2002-1 2002
    ", header=TRUE)

    search <- "^.*-([[:digit:]]+)$"
    sapply(d, function(x) as.integer(sub(search, "\\1", x)))

Hope this helps,

Rui Barradas
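One consequence of the as.integer() step is that the result is an integer matrix, so the month 07 becomes 7. If the leading zero in the requested output matters, it could be restored afterwards with sprintf(). A sketch, building the data frame directly instead of reading it from text (the name month_padded is made up):

```r
# Same data as above, constructed directly
d <- data.frame(
  MONTH   = c("2012-07", "2001-07", "2002-01"),
  QUARTER = c("2012-3", "2001-3", "2002-1"),
  YEAR    = c(2012L, 2001L, 2002L),
  stringsAsFactors = FALSE
)

search <- "^.*-([[:digit:]]+)$"
m <- sapply(d, function(x) as.integer(sub(search, "\\1", x)))

# Integer conversion drops the leading zero
m[, "MONTH"]  # 7 7 1

# Pad back to two digits if the "07" form is needed
month_padded <- sprintf("%02d", m[, "MONTH"])
month_padded  # "07" "07" "01"
```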
If they are all formatted as in your example, substr() would be simpler:

    MONTH <- c("2012-07", "2001-07", "2002-01")
    QUARTER <- c("2012-3", "2001-3", "2002-1")
    YEAR <- c(2012, 2001, 2002)
    Inp <- data.frame(MONTH, QUARTER, YEAR)
    Out <- data.frame(MONTH=substr(MONTH, 6, 7),
                      QUARTER=substr(QUARTER, 6, 7), YEAR)

This assumes MONTH and QUARTER are character strings and not dates.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
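If the values were stored as Date objects instead of character strings, format() and quarters() from base R would be the more natural tools. A sketch with made-up dates (the thread's data has no day component, so these are purely illustrative):

```r
# Hypothetical Date values
d <- as.Date(c("2012-07-15", "2001-07-01", "2002-01-31"))

# Two-digit month as a character string
month <- format(d, "%m")
month  # "07" "07" "01"

# quarters() returns labels of the form "Q3"; strip the "Q" for the bare number
qtr <- sub("Q", "", quarters(d))
qtr  # "3" "3" "1"
```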
Normally there is no need to store the components of a date separately; it's usually easier to extract what you need on the fly. Since you only seem to need the year, quarter and month, if DF is your data frame you can store the date as a yearmon class object. That is rich enough to contain everything else, so you don't really need the MONTH, QUARTER and YEAR columns, which makes everything simpler.

    library(zoo)
    ym <- as.yearmon(DF$MONTH)

Now the year, quarter and month are:

    floor(ym)
    format(as.yearqtr(ym), "%q")
    format(ym, "%m")

The last two return character strings, which is likely OK, but if you need them as numeric then use as.numeric(format(ym, "%m")), and similarly for the quarter.

This does not involve regular expressions or intricate character manipulation.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
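The point that QUARTER and YEAR are redundant once the month is known can also be illustrated without any package. The following is a base-R sketch, not the zoo approach itself, and it assumes MONTH is a character column in YYYY-MM form; DF here is a made-up one-column data frame.

```r
# Hypothetical data frame holding only the MONTH column
DF <- data.frame(MONTH = c("2012-07", "2001-07", "2002-01"),
                 stringsAsFactors = FALSE)

# Year and month come straight from fixed positions in the string
year  <- as.integer(substr(DF$MONTH, 1, 4))
month <- as.integer(substr(DF$MONTH, 6, 7))

# Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on
quarter <- (month - 1L) %/% 3L + 1L

year     # 2012 2001 2002
month    # 7 7 1
quarter  # 3 3 1
```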