Dear Everyone,
I am trying to automatically transform the values of a variable (of class factor) such as
x
220
220a
221
221b
B221
into two numeric variables, like this:
x y
220 0
220 1
221 0
221 1
221 1
y should record the type of the original x value: 0 if it was a plain
number, 1 if it was a string (i.e. contained letters).
I could do it by hand, like this:
y[x == "220a"] <- 1    # set y first: after the next line x no longer contains "220a"
x[x == "220a"] <- 220
but x has far too many distinct values for that.
So I wondered whether I could use a regular expression (or any other approach), something like
x[x == [0-9]{3}a] <- regular expression
y[x == [0-9]{3}] <- 1
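A minimal base-R sketch of the index-assignment idea above, assuming every value consists of one 3-digit number plus optional letters: `grepl()` returns one TRUE/FALSE per element, so a single regular expression replaces the per-value comparisons. The names `xc` and `has_letters` are illustrative only.

```r
# Example values from the question; x is a factor
x <- factor(c("220", "220a", "221", "221b", "B221"))

# Work on the labels; as.numeric() on a factor would return level codes
xc <- as.character(x)

# Flag values that contain at least one non-digit character
has_letters <- grepl("[^0-9]", xc)

# y: 1 for values with letters, 0 otherwise (computed before x is changed)
y <- numeric(length(xc))
y[has_letters] <- 1

# x: strip the non-digits from the flagged values, then convert to numeric
xc[has_letters] <- gsub("[^0-9]", "", xc[has_letters])
x <- as.numeric(xc)
```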
Thanks a lot
Check out sedit() in the Hmisc package.

Cheers!

--- On Tue, 7/8/08, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:
> From: Kunzler, Andreas <a.kunzler at bzaek.de>
> Subject: [R] Manipulate Data (with regular expressions)
> To: r-help at r-project.org
> Date: Tuesday, July 8, 2008, 7:11 AM

Try this:
x <- factor(c("220", "220a", "221",
              "221b", "B221"))
pat <- "[^0-9]+"                             # match runs of non-digits
nums <- as.numeric(gsub(pat, "", x))         # drop the non-digits, convert to numbers
has.lets <- as.numeric(regexpr(pat, x) > 0)  # 1 if a non-digit occurs, else 0
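The as.numeric() calls above work because gsub() and regexpr() return character and integer vectors rather than factors. Calling as.numeric() directly on a factor returns its internal level codes instead of the labels, a common trap with data like this. A short illustration (the object `f` is just an example):

```r
f <- factor(c("220", "221", "220"))

codes  <- as.numeric(f)                # level codes: 1 2 1, not the values
values <- as.numeric(as.character(f))  # the labels as numbers: 220 221 220

# Same result, converting each distinct level only once
values2 <- as.numeric(levels(f))[f]
```

Indexing by a factor uses its integer codes, which is why `levels(f)[f]` maps each element back to its label.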
On Tue, Jul 8, 2008 at 7:11 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:
Thank you very much,
I am almost done, but unfortunately I also have to transform values like
x
220a1
220ab1
220a12
to
y
220
220
220
Even though it is easy to match a 3-digit number with
[0-9]{3}
I have no idea how to match everything except a 3-digit number, in order to
replace everything but the 3-digit number with "":
y <- gsub(<regex matching everything but a 3-digit number>, "", x)
Maybe it is possible to use the match itself as the replacement:
y <- gsub("[0-9]{3}", <the match>, x)
Thank you
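Base R can use the match as the replacement through a capture group: the parenthesised part of the pattern is remembered, and `\\1` in the replacement refers to it. A sketch, assuming the wanted 3-digit number is the first run of digits in each value (sub() returns a value unchanged if the pattern does not match it):

```r
x <- c("220a1", "220ab1", "220a12", "B221", "221")

# ^[^0-9]*    skip any leading non-digits
# ([0-9]{3})  capture the first three digits
# .*$         consume the rest of the string
# The whole match is then replaced by the captured digits (\\1)
y <- sub("^[^0-9]*([0-9]{3}).*$", "\\1", x)

y_num <- as.numeric(y)
```

This avoids having to write a pattern for "everything but the number": the pattern covers the whole string and the replacement keeps only the captured part.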
-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Tuesday, 8 July 2008 17:20
To: Kunzler, Andreas
Cc: r-help at r-project.org
Subject: Re: [R] Manipulate Data (with regular expressions)
strapply() in gsubfn is convenient for that since it matches by contents
rather than delimiters:
x <- factor(c("220", "220a", "221b",
"B221", "220a1", "220ab1", "220a12"))
library(gsubfn)
strapply(as.character(x), "[0-9]{3}", simplify = c)
See
http://gsubfn.googlecode.com
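For readers without gsubfn, a rough base-R equivalent uses regexpr() to locate the first 3-digit run and regmatches() to extract it. Note that regmatches() requires R 2.14.0 or later (newer than this thread), and it silently drops elements with no match, so the result can be shorter than x.

```r
x <- factor(c("220", "220a", "221b",
              "B221", "220a1", "220ab1", "220a12"))
xc <- as.character(x)

# Position of the first 3-digit run in each element (-1 if there is none)
m <- regexpr("[0-9]{3}", xc)

# Extract the matched substrings and convert them to numbers
y <- as.numeric(regmatches(xc, m))
```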
On Fri, Jul 11, 2008 at 5:04 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote: