Dear Everyone,

I am trying to automatically split a variable x (class factor) such as

    x
    220
    220a
    221
    221b
    B221

into two variables (class numeric) like

    x    y
    220  0
    220  1
    221  0
    221  1
    221  1

y has to record whether the original x value was purely numeric (0) or also contained letters (1).

I could do it by hand, e.g.

    x[x == "220a"] <- 220
    y[x == "220a"] <- 1

but x has far too many distinct values. So I wondered whether I could use a regular expression (or any other way), something like

    x[x == [0-9]{3}a] <- regular expression
    y[x == [0-9]{3}] <- 1

Thanks a lot
Check out sedit() in the Hmisc package.

Cheers!

--- On Tue, 7/8/08, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:

> From: Kunzler, Andreas <a.kunzler at bzaek.de>
> Subject: [R] Manipulate Data (with regular expressions)
> To: r-help at r-project.org
> Date: Tuesday, July 8, 2008, 7:11 AM
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
Try this:

    x <- factor(c("220", "220a", "221", "221b", "B221"))
    pat <- "[^0-9]+"                             # match runs of non-digits
    nums <- as.numeric(gsub(pat, "", x))         # strip the non-digits, keep the number
    has.lets <- as.numeric(regexpr(pat, x) > 0)  # 1 if any non-digit was found, else 0
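For readers who want the two results side by side, here is a minimal sketch that builds the x/y table from the original question. The names chr and result are illustrative and not from the thread, and grepl() is swapped in for the regexpr(...) > 0 test as an equivalent way to get the flag. The factor is converted with as.character() first, because calling as.numeric() directly on a factor returns its internal level codes rather than the labels.

```r
# Sketch only: 'chr' and 'result' are illustrative names, not from the thread.
x <- factor(c("220", "220a", "221", "221b", "B221"))

# Work on the labels, not on the factor's integer level codes.
chr <- as.character(x)

result <- data.frame(
  x = as.numeric(gsub("[^0-9]+", "", chr)),  # digits only, as numeric
  y = as.integer(grepl("[^0-9]", chr))       # 1 if any non-digit is present
)

print(result)
```

Note that stripping all non-digits only gives the wanted number when each value contains a single run of digits, as in this sample.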
Thank you a lot,

I am almost done, but unfortunately I also have to manipulate values like

    x
    220a1
    220ab1
    220a12

to

    y
    220
    220
    220

Even though it is easy to match a 3-digit number

    [0-9]{3}

I have no idea how to match everything except a 3-digit number, in order to replace everything but the 3-digit number by "":

    y <- gsub(RE for everything but a 3-digit number, "", x)

Maybe it is possible to use the MATCH as the replacement:

    y <- gsub([0-9]{3}, MATCH, x)

Thank you

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
Sent: Tuesday, 8 July 2008 17:20
To: Kunzler, Andreas
Cc: r-help at r-project.org
Subject: Re: [R] Manipulate Data (with regular expressions)

Try this:

    x <- factor(c("220", "220a", "221", "221b", "B221"))
    pat <- "[^0-9]+" # match non-digits
    nums <- as.numeric(gsub(pat, "", x))
    has.lets <- as.numeric(regexpr(pat, x) > 0)
strapply() in gsubfn is convenient for that, since it matches by contents rather than by delimiters:

    x <- factor(c("220", "220a", "221b", "B221", "220a1", "220ab1", "220a12"))
    library(gsubfn)
    # extract the first 3-digit run from each element (result is character)
    strapply(as.character(x), "[0-9]{3}", simplify = c)

See http://gsubfn.googlecode.com

On Fri, Jul 11, 2008 at 5:04 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:
> Thank you a lot,
>
> I am almost done, but unfortunately I also have to manipulate values like
>
> x
> 220a1
> 220ab1
> 220a12
>
> to
>
> y
> 220
> 220
> 220
>
> Even though it is easy to match a 3-digit number
> [0-9]{3}
> I have no idea how to match everything except a 3-digit number, in order
> to replace everything but the 3-digit number by ""
>
> y <- gsub(RE for everything but a 3-digit number, "", x)
>
> Maybe it is possible to use the MATCH as the replacement
>
> y <- gsub([0-9]{3}, MATCH, x)
>
> Thank you
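If installing gsubfn is not an option, the same extraction can be sketched in base R. This is an alternative to the strapply() call, not the method proposed in the thread. It shows two routes: regmatches() with regexpr(), which pulls out the first 3-digit match, and sub() with a backreference ("\\1"), which is the base-R way to "use the match as the replacement". The names chr, m, y and y2 are illustrative. The sub() pattern assumes the 3-digit number is the first run of digits in each value, which holds for the sample data.

```r
# Sketch only: a base-R alternative; all names here are illustrative.
x <- factor(c("220", "220a", "221b", "B221", "220a1", "220ab1", "220a12"))
chr <- as.character(x)

# Route 1: locate the first 3-digit run and extract it.
# Caveat: regmatches() silently drops elements that have no match,
# so the result can be shorter than the input on messier data.
m <- regexpr("[0-9]{3}", chr)
y <- as.numeric(regmatches(chr, m))

# Route 2: capture the 3-digit run and replace the whole string with it.
# Assumes the first digit run in each value is the 3-digit number;
# values that do not fit the pattern are returned unchanged by sub().
y2 <- as.numeric(sub("^[^0-9]*([0-9]{3}).*$", "\\1", chr))

print(y)
print(y2)
```

Route 2 keeps the result the same length as the input, which is usually what is wanted when the values are going back into a data frame column.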