Hi John,

As I mentioned in our private exchange, this is well known in R: vectorized
versions are not always faster or more efficient than straight loops. It is
a misconception that loops should be avoided at any cost. See John Fox's
illuminating article in R News (p. 46) on this subject:

http://cran.r-project.org/doc/Rnews/Rnews_2008-1.pdf

For my part, unless the application is very demanding, I take the vectorized
version over the loop any day, as it is much simpler to write (usually a
one-liner) and therefore much easier to maintain in the long run.

-Christos

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of john
> Sent: Thursday, October 09, 2008 12:38 AM
> To: r-help
> Subject: Re: [R] vectorized sub, gsub, grep, etc.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Hello Christos,
To my surprise, vectorization actually hurt processing speed!
#Example
X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

sub2 <- function(pattern, replacement, x) {
    len <- length(x)
    if (length(pattern) == 1)
        pattern <- rep(pattern, len)
    if (length(replacement) == 1)
        replacement <- rep(replacement, len)
    FUN <- function(i, ...) {
        sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    idx <- 1:length(x)
    sapply(idx, FUN)
}

system.time( for(i in 1:10000) sub2(patt, repl, X) )
   user  system elapsed
   1.18    0.07    1.26

system.time( for(i in 1:10000) mapply(function(p, r, x)
    sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) )
   user  system elapsed
   1.42    0.05    1.47

So much for avoiding loops.
John Thaden
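
For anyone who wants to rerun this comparison, here is a minimal sketch that adds a plain for loop next to the mapply call. The helper names sub_loop and sub_mapply are made up for illustration and do not appear in the thread. Timings depend on the machine and R version, so none are shown.

```r
# Example data from the thread
X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

# Hypothetical helper: explicit loop that fills a preallocated result
sub_loop <- function(pattern, replacement, x) {
  out <- character(length(x))
  for (i in seq_along(x)) {
    out[i] <- sub(pattern[i], replacement[i], x[i], fixed = TRUE)
  }
  out
}

# Hypothetical helper: the mapply call from the thread, without names
sub_mapply <- function(pattern, replacement, x) {
  mapply(function(p, r, s) sub(p, r, s, fixed = TRUE),
         pattern, replacement, x, USE.NAMES = FALSE)
}

res_loop   <- sub_loop(patt, repl, X)
res_mapply <- sub_mapply(patt, repl, X)

# Both approaches must agree before their timings are worth comparing
stopifnot(identical(res_loop, res_mapply))

# Same repetition count as in the message above
t_loop   <- system.time(for (k in 1:10000) sub_loop(patt, repl, X))
t_mapply <- system.time(for (k in 1:10000) sub_mapply(patt, repl, X))
print(t_loop)
print(t_mapply)
```

With only three elements per call, the cost is mostly function-call overhead, so the ranking may change for longer vectors.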
======= At 2008-10-07, 14:58:10 Christos wrote: ======
>John,
>Try the following:
>
>  mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
>     b   cd    a
>  "aB" "CD" "ef"
>
>-Christos
>> -----My Original Message-----
>> R pattern-matching and replacement functions are
>> vectorized: they can operate on vectors of targets.
>> However, they can only use one pattern and replacement.
>> Here is code to apply a different pattern and replacement for
>> every target. My question: can it be done better?
>>
>> sub2 <- function(pattern, replacement, x) {
>>     len <- length(x)
>>     if (length(pattern) == 1)
>>         pattern <- rep(pattern, len)
>>     if (length(replacement) == 1)
>>         replacement <- rep(replacement, len)
>>     FUN <- function(i, ...) {
>>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>>     }
>>     idx <- 1:length(x)
>>     sapply(idx, FUN)
>> }
>>
>> #Example
>> X <- c("ab", "cd", "ef")
>> patt <- c("b", "cd", "a")
>> repl <- c("B", "CD", "A")
>> sub2(patt, repl, X)
>>
>> -John
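
One possible answer to "can it be done better?" is sketched below. It is not code from the thread: sub_each is a hypothetical name, and the length check is an added assumption about what the caller wants. It keeps the recycling of length-1 arguments from sub2, but uses vapply() and seq_len() so that an empty x returns character(0) instead of failing.

```r
# Hypothetical variant of sub2(): one pattern and replacement per target
sub_each <- function(pattern, replacement, x, fixed = TRUE) {
  n <- length(x)
  # Recycle length-1 arguments, as the original sub2() does
  if (length(pattern) == 1L) pattern <- rep(pattern, n)
  if (length(replacement) == 1L) replacement <- rep(replacement, n)
  # Assumption: any other length mismatch is treated as an error
  stopifnot(length(pattern) == n, length(replacement) == n)
  # vapply() always returns a character vector, even for n == 0,
  # and seq_len(n) avoids the 1:0 trap of 1:length(x)
  vapply(seq_len(n),
         function(i) sub(pattern[i], replacement[i], x[i], fixed = fixed),
         character(1))
}

X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

res_each  <- sub_each(patt, repl, X)         # "aB" "CD" "ef"
res_empty <- sub_each("a", "A", character(0))  # character(0)
```

This is still a loop underneath, so it is unlikely to be faster than sub2; the gain is a predictable return type.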
I know this is an old thread, but I wrote a simple for loop with vectorized
pattern replacement that is much faster than either of those. Note that it
replaces whole elements of x that exactly equal a pattern, not substrings,
and that x may differ in length from the patterns:
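
sub2 <- function(pattern, replacement, x) {
    len <- length(x)
    # Elements of x that match no pattern stay as "" in the result
    y <- character(length = len)
    patlen <- length(pattern)
    replen <- length(replacement)
    if (patlen != replen)
        stop('Pattern and replacement length do not match')
    # Loop over the patterns, not over x: each pass replaces every
    # element of x that exactly equals pattern[i]
    for (i in seq_len(replen)) {
        y[which(x == pattern[i])] <- replacement[i]
    }
    return(y)
}

# XX is the author's test vector; it is not defined in this thread
system.time(test <- sub2(patt, repl, XX))
   user  system elapsed
      0       0       0
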
Cheers,
Adam
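
The exact-match replacement in the loop above can also be written without a loop by using match(). This is a sketch, not Adam's code: recode_exact is a hypothetical name, and it differs on purpose in two ways. Elements that match no pattern are kept as they are rather than becoming "", and if a pattern is duplicated the first occurrence wins rather than the last.

```r
# Hypothetical helper: exact, whole-element replacement via a lookup
recode_exact <- function(pattern, replacement, x) {
  if (length(pattern) != length(replacement))
    stop("pattern and replacement must have the same length")
  idx <- match(x, pattern)   # position of each x in pattern, NA if absent
  hit <- !is.na(idx)
  y <- x                     # assumption: unmatched elements are left unchanged
  y[hit] <- replacement[idx[hit]]
  y
}

targets <- c("ab", "cd", "ef", "cd")
res_recode <- recode_exact(c("cd", "ef"), c("CD", "EF"), targets)
# res_recode is "ab" "CD" "EF" "CD"
```

Like the loop, this does no substring matching, so it solves a different problem from sub() with fixed = TRUE.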
On Wednesday, October 8, 2008 at 9:38:01 PM UTC-7, john wrote:
>
> Hello Christos,
> To my surprise, vectorization actually hurt processing speed!