Dear group,

Here is my data.frame:

    avprix <- structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
        "ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10",
        "SPCL HIGH GRADE ZINC USD Jul/10", "STANDARD LEAD USD Jul/10"),
        prix = c(-1.5, -1082, 11084, 1983.5, -2464, -118),
        quantity = c(0, -3, 8, 2, -1, 0)),
        .Names = c("DESCRIPTION", "prix", "quantity"),
        row.names = c(NA, -6L), class = "data.frame")

    > avprix
                          DESCRIPTION    prix quantity
    1                     CORN Jul/10    -1.5        0
    2                     CORN May/10 -1082.0       -3
    3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
    4                 SOYBEANS Jul/10  1983.5        2
    5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
    6        STANDARD LEAD USD Jul/10  -118.0        0

I need to remove the date (i.e. Jul/10 in this example) from each element of the DESCRIPTION column that contains the USD symbol. I am trying to do this using regular expressions, but must admit I am going nowhere. The elements in the DESCRIPTION column and the dates can change every day.

TY for any help.

On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:

> Dear group,
>
> Here is my data.frame :
>
> avprix <-
> structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
> "ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10", "SPCL HIGH GRADE
> ZINC USD
> Jul/10",
> "STANDARD LEAD USD Jul/10"), prix = c(-1.5, -1082, 11084, 1983.5,
> -2464, -118), quantity = c(0, -3, 8, 2, -1, 0)), .Names =
> c("DESCRIPTION",
> "prix", "quantity"), row.names = c(NA, -6L), class = "data.frame")
>
> > avprix
>                         DESCRIPTION    prix quantity
> 1                     CORN Jul/10    -1.5        0
> 2                     CORN May/10 -1082.0       -3
> 3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
> 4                 SOYBEANS Jul/10  1983.5        2
> 5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
> 6        STANDARD LEAD USD Jul/10  -118.0        0
>
> I need to remove the date (i.e. Jul/10 in this example) for each
> element of the DESCRIPTION column that contains the USD symbol. I am
> trying to do this using regular expressions, but must admit I am going
> nowhere.
> My elements in the DESCRIPTION column and the dates can change every
> day.

This searches for the pattern USD and then replaces everything from there through any three characters, a forward slash, and any two characters (note that the USD itself is part of the match, so it is removed as well):

    > sub("USD+.*(.../..)", "", avprix$DESCRIPTION)
    [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
    [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

This tightens up the matching by requiring that the characters after the slash be digits:

    > sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
    [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
    [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

-- David.

> TY for any help.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

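To see why the replacement above also drops "USD", it can help to look at the text the pattern actually matches before substituting anything. The following is only a minimal sketch: `desc` is a hypothetical four-element subset of the DESCRIPTION column (without any embedded line breaks), and `regexpr()`/`regmatches()` are used purely for inspection.

```r
# Hypothetical subset of the DESCRIPTION values from the thread
desc <- c("CORN Jul/10", "SOYBEANS Jul/10",
          "SPCL HIGH GRADE ZINC USD Jul/10", "STANDARD LEAD USD Jul/10")

# Same pattern as in the reply above
pattern <- "USD+.*(.../\\d{2})"

# Extract the matched text for the elements that match
hits <- regexpr(pattern, desc)
matched <- regmatches(desc, hits)
print(matched)

# grepl() flags which elements sub() would change
changed <- grepl(pattern, desc)
print(changed)
```

The matched text starts at "USD", not at the date, which is why substituting "" removes both. Elements without "USD" are not matched and are left alone.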
TY so much, David. We are getting close. But I need to keep "USD" in my object name (i.e. "STANDARD LEAD USD").

Arnaud Gaboury

> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Wednesday, April 28, 2010 2:25 PM
> To: arnaud Gaboury
> Cc: r-help at r-project.org
> Subject: Re: [R] data frame manipulation and regex
>
> This tightens up the matching by requiring that the characters
> after the slash be digits:
>
> > sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
> [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
> [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "
>
> -- David.

On Apr 28, 2010, at 8:30 AM, arnaud Gaboury wrote:

> TY so much david. We are getting close. But I need to keep "USD" in my
> object name (i.e "STANDARD LEAD USD")

    > sub("USD+.*.(.../\\d{2})", "USD", avprix$DESCRIPTION)
    [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
    [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD"   "STANDARD LEAD USD"

I had been attempting (unsuccessfully) to get the portion within the parens to be the replaced string. Putting the parens around the part to keep and referring to it as \\1 in the replacement also works, and has the side effect of keeping the \n that I had not intended to remove from the 5th item:

    > sub("(USD+.*).../\\d{2}", "\\1", avprix$DESCRIPTION)
    [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
    [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD\n" "STANDARD LEAD USD "

-- David

David Winsemius, MD
West Hartford, CT
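
The two results above differ only in the whitespace left after "USD" (a trailing space or "\n"). One possible way to avoid that, sketched here under assumptions rather than taken from the thread, is to capture only "USD" and let the pattern consume the whitespace and the date. The data frame below is a hypothetical cut-down version of `avprix`, with a "\n" placed in one element to mimic the wrapped 5th item; the name `date_after_usd` is made up for illustration.

```r
# Hypothetical cut-down data; one element carries a stray newline
avprix <- data.frame(
  DESCRIPTION = c("CORN Jul/10", "ROBUSTA COFFEE (10) Jul/10",
                  "SPCL HIGH GRADE ZINC USD\nJul/10", "STANDARD LEAD USD Jul/10"),
  prix = c(-1.5, 11084, -2464, -118),
  stringsAsFactors = FALSE
)

# (USD)                  capture group, restored via \\1
# [[:space:]]+           the space or newline between USD and the date
# [A-Za-z]{3}/[0-9]{2}$  three letters, a slash, two digits, at the end
date_after_usd <- "(USD)[[:space:]]+[A-Za-z]{3}/[0-9]{2}$"

# Rows without "USD" do not match and are returned unchanged
avprix$DESCRIPTION <- sub(date_after_usd, "\\1", avprix$DESCRIPTION)
print(avprix$DESCRIPTION)
```

This assumes the date is always the last thing in the string and always has the form of three letters, a slash, and two digits. If the real data can have other date layouts, the final part of the pattern would need adjusting.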