thr3ads.net - R help - [R] data frame manipulation and regex [Apr 2010]

arnaud Gaboury

2010-Apr-28 09:14 UTC

[R] data frame manipulation and regex

Dear group,

Here is my data.frame :


avprix <-
structure(list(DESCRIPTION = c("CORN Jul/10", "CORN May/10",
"ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10",
"SPCL HIGH GRADE ZINC USD Jul/10",
"STANDARD LEAD USD Jul/10"), prix = c(-1.5, -1082, 11084, 1983.5,
-2464, -118), quantity = c(0, -3, 8, 2, -1, 0)), .Names = c("DESCRIPTION",
"prix", "quantity"), row.names = c(NA, -6L), class = "data.frame")

> avprix
                      DESCRIPTION    prix quantity
1                     CORN Jul/10    -1.5        0
2                     CORN May/10 -1082.0       -3
3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
4                 SOYBEANS Jul/10  1983.5        2
5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
6        STANDARD LEAD USD Jul/10  -118.0        0

I need to remove the date (i.e. Jul/10 in this example) for each element of
the DESCRIPTION column that contains the USD symbol. I am trying to do this
using regular expressions, but must admit I am going nowhere.
My elements in the DESCRIPTION column and the dates can change every day.

TY for any help.

David Winsemius

2010-Apr-28 12:25 UTC

[R] data frame manipulation and regex

On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:
> I need to remove the date (i.e. Jul/10 in this example) for each element of
> the DESCRIPTION column that contains the USD symbol. I am trying to do this
> using regular expressions, but must admit I am going nowhere.
> My elements in the DESCRIPTION column and the dates can change every day.

This searches for the pattern USD followed by any three characters, a
forward slash, and any two characters, and replaces the whole match
(USD included) with an empty string:
 > sub("USD+.*(.../..)", "", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

This tightens up the matching by requiring that the characters
after the slash be digits:

 > sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "

-- David.


> TY for any help.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
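
To see why "USD" disappears along with the date in the two calls above, it can help to look at the span the pattern actually matches. The following is a minimal sketch, not part of the original reply: it rebuilds the data frame with a plain data.frame() call (an assumption made for readability, with the same values as the dput() output and a space before each date) and uses regexpr() with regmatches() to extract the matched text.

```r
# Rebuild the data frame from the original post (same values, plain constructor).
avprix <- data.frame(
  DESCRIPTION = c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
                  "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD Jul/10",
                  "STANDARD LEAD USD Jul/10"),
  prix = c(-1.5, -1082, 11084, 1983.5, -2464, -118),
  quantity = c(0, -3, 8, 2, -1, 0),
  stringsAsFactors = FALSE
)

pattern <- "USD+.*(.../\\d{2})"

# regexpr() locates the first match in each string; regmatches() extracts it.
# Only the elements that contain a match are returned.
matched <- regmatches(avprix$DESCRIPTION, regexpr(pattern, avprix$DESCRIPTION))
print(matched)

# sub() replaces the whole matched span, so "USD" is removed with the date.
stripped <- sub(pattern, "", avprix$DESCRIPTION)
print(stripped)
```

Because the match starts at "USD" and runs through the two digits, replacing it with "" removes both. The parentheses only define a capture group; they do not limit what sub() replaces.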

arnaud Gaboury

2010-Apr-28 12:30 UTC

[R] data frame manipulation and regex

TY so much, David. We are getting close, but I need to keep "USD" in my
object name (i.e. "STANDARD LEAD USD").



***************************
Arnaud Gaboury
Mobile: +41 79 392 79 56
BBM: 255B488F
***************************

David Winsemius

2010-Apr-28 12:40 UTC

[R] data frame manipulation and regex

On Apr 28, 2010, at 8:30 AM, arnaud Gaboury wrote:
> TY so much david. We are getting close. But I need to keep "USD" in my
> object name (i.e "STANDARD LEAD USD")

 > sub("USD+.*.(.../\\d{2})", "USD", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD"   "STANDARD LEAD USD"

I had been attempting (unsuccessfully) to get only the portion within the
parens to be the replaced string. This also works, and has the side
effect of keeping the \n that I had not intended to remove from the
5th item:

 > sub("(USD+.*).../\\d{2}", "\\1", avprix$DESCRIPTION)
[1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
[4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC USD\n" "STANDARD LEAD USD "

-- 
David
David Winsemius, MD
West Hartford, CT
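
The last call in the thread keeps "USD" but leaves a trailing space or newline behind it. One possible variant is sketched below. It is not from the thread and rests on stated assumptions: the date always has the form of three letters, a slash, and two digits, and it always sits at the very end of the string. The helper name strip_usd_date is hypothetical.

```r
# Hypothetical helper (not from the thread): drop a trailing "Mon/YY" date
# only from descriptions that contain "USD", keeping "USD" itself.
strip_usd_date <- function(x) {
  # (USD) is captured and restored via \\1; \\s+ consumes the separator,
  # whether it is a space or a newline; $ anchors the date at the end.
  sub("(USD)\\s+[A-Za-z]{3}/\\d{2}$", "\\1", x)
}

descriptions <- c("CORN Jul/10",
                  "SPCL HIGH GRADE ZINC USD\nJul/10",
                  "STANDARD LEAD USD Jul/10")

cleaned <- strip_usd_date(descriptions)
print(cleaned)

# To update the data frame from the thread in place, one would write:
# avprix$DESCRIPTION <- strip_usd_date(avprix$DESCRIPTION)
```

Rows without "USD" are returned unchanged because the pattern cannot match them, so no separate subsetting step is needed. Note that sub() returns a new vector, so the result has to be assigned back to the column for the data frame to change.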
