Liviu Andronic
2012-Aug-06 16:25 UTC
[R] test if elements of a character vector contain letters
Dear all,

I'm pretty sure that I'm approaching the problem in a wrong way. Suppose the following character vector:

> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"

How do you test whether the elements of the vector contain at least one letter (or at least one digit) and obtain a logical vector of the same dimension? I came up with the following awkward function:

is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
    })
}

> is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> is_letter(x, 0:9)  ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Is there a nicer way to do this?

Regards
Liviu

--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
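The post does not show how x was created, so reproducing it takes a guess. The sketch below assumes the starting value was c(letters, 1:26), which matches the printed output. The set.seed() call is an addition made here so that the sampled digits repeat between runs; the digits it produces will differ from the ones printed in the question.

```r
# Assumed starting point: 26 letters followed by the numbers 1..26 as strings.
x <- c(letters, 1:26)

# set.seed() is not in the original post; it only makes the sample repeatable.
set.seed(1)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep = "")

length(x)      # number of elements in the vector
head(x, 12)    # ten letter+digit strings, then the plain letters "k" and "l"
```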
Bert Gunter
2012-Aug-06 16:42 UTC
[R] test if elements of a character vector contain letters
nzchar(x) & !is.na(x)

No?

-- Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
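It may help to check what the expression nzchar(x) & !is.na(x) actually tests. It appears to flag elements that are non-empty and non-missing, which is a different property from containing a letter. The small vector below is made up for illustration and is not from the thread.

```r
# Hypothetical sample: letter+digit, letter only, digit only, empty, missing.
x <- c("a10", "k", "1", "", NA)

# nzchar() is TRUE for any non-empty string; !is.na() excludes missing values.
non_empty <- nzchar(x) & !is.na(x)
non_empty

# The digit-only element "1" is flagged exactly like "a10" and "k",
# so this test does not separate letters from digits.
```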
Rui Barradas
2012-Aug-06 16:51 UTC
[R] test if elements of a character vector contain letters
Hello,

Fun as an exercise in vectorization. 30 times faster. Don't look, guess.

Given up? OK, here it is.

# Original version: one grepl() call per element and per pattern character.
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
    })
}

# Test ASCII codes, just one loop.
has_letter <- function(x){
    sapply(x, function(y){
        y <- as.integer(charToRaw(y))
        # 65-90 are 'A'-'Z', 97-122 are 'a'-'z'
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}

x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)

t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA
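The byte-range test above could also be written as a single regular-expression call, since grepl() is already vectorized over its input. The sketch below assumes ASCII-only data. The name has_letter_re is invented here, and perl = TRUE is used so that the A-Z and a-z ranges are matched by character code rather than by locale collation. No timing is claimed for it.

```r
# The byte-range check from the reply, repeated so the block runs on its own.
has_letter <- function(x) {
  sapply(x, function(y) {
    y <- as.integer(charToRaw(y))
    any((65 <= y & y <= 90) | (97 <= y & y <= 122))
  })
}

# Hypothetical regex version: no explicit loop over the elements of x.
has_letter_re <- function(x) grepl("[A-Za-z]", x, perl = TRUE)

x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], 1:10, sep = "")   # fixed digits instead of sample()

# Compare the two approaches on this ASCII-only data
# (sapply() returns a named vector, so the names are dropped first).
identical(unname(has_letter(x)), has_letter_re(x))
```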
Hi,

Not sure whether this is what you wanted.

x <- letters
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
x1 <- c(x, 1:26)
x1
 [1] "a4"  "b3"  "c5"  "d2"  "e9"  "f6"  "g1"  "h8"  "i10" "j7"  "k"   "l"
[13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"
[25] "y"   "z"   "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"
[37] "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"
[49] "23"  "24"  "25"  "26"
grepl("^[[:alpha:]][[:digit:]]", x1)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE

A.K.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
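Note that the pattern "^[[:alpha:]][[:digit:]]" is anchored: it asks for a letter immediately followed by a digit at the start of the string, which is why the plain letters come out FALSE. If the goal is "contains at least one letter" or "contains at least one digit", unanchored character classes might be closer to the question. A small sketch on a made-up vector (x1 here is a hypothetical sample, not the one from the thread):

```r
# Hypothetical sample covering the four interesting cases.
x1 <- c("a4", "k", "10", "7b")

starts_letter_digit <- grepl("^[[:alpha:]][[:digit:]]", x1)  # anchored pattern
any_letter <- grepl("[[:alpha:]]", x1)   # at least one letter anywhere
any_digit  <- grepl("[[:digit:]]", x1)   # at least one digit anywhere

# Side-by-side view of the three tests.
data.frame(x1, starts_letter_digit, any_letter, any_digit)
```

Each call returns a logical vector of the same length as its input, which is what the original question asked for.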