I know this is an old thread, but I wrote a simple FOR loop with vectorized pattern replacement that is much faster than either of those (it can also accept outputs differing in length from the patterns):

sub2 <- function(pattern, replacement, x) {
  len <- length(x)
  y <- character(length=len)
  patlen <- length(pattern)
  replen <- length(replacement)
  if(patlen != replen) stop('Error: Pattern and replacement length do not match')
  for(i in 1:replen) {
    y[which(x==pattern[i])] <- replacement[i]
  }
  return(y)
}

system.time(test <- sub2(patt, repl, XX))
   user  system elapsed
      0       0       0

Cheers,
Adam

On Wednesday, October 8, 2008 at 9:38:01 PM UTC-7, john wrote:
> Hello Christos,
> To my surprise, vectorization actually hurt processing speed!
>
> #Example
> X <- c("ab", "cd", "ef")
> patt <- c("b", "cd", "a")
> repl <- c("B", "CD", "A")
>
> sub2 <- function(pattern, replacement, x) {
>     len <- length(x)
>     if (length(pattern) == 1)
>         pattern <- rep(pattern, len)
>     if (length(replacement) == 1)
>         replacement <- rep(replacement, len)
>     FUN <- function(i, ...) {
>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>     }
>     idx <- 1:length(x)
>     sapply(idx, FUN)
> }
>
> system.time( for(i in 1:10000) sub2(patt, repl, X) )
>    user  system elapsed
>    1.18    0.07    1.26
>
> system.time( for(i in 1:10000) mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) )
>    user  system elapsed
>    1.42    0.05    1.47
>
> So much for avoiding loops.
> John Thaden
>
> ======= At 2008-10-07, 14:58:10 Christos wrote: =======
>
> >John,
> >Try the following:
> >
> > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
> >    b   cd    a
> >"aB" "CD" "ef"
> >
> >-Christos
>
> >> -----My Original Message-----
> >> R pattern-matching and replacement functions are
> >> vectorized: they can operate on vectors of targets.
> >> However, they can only use one pattern and replacement.
> >> Here is code to apply a different pattern and replacement for
> >> every target. My question: can it be done better?
> >>
> >> sub2 <- function(pattern, replacement, x) {
> >>     len <- length(x)
> >>     if (length(pattern) == 1)
> >>         pattern <- rep(pattern, len)
> >>     if (length(replacement) == 1)
> >>         replacement <- rep(replacement, len)
> >>     FUN <- function(i, ...) {
> >>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
> >>     }
> >>     idx <- 1:length(x)
> >>     sapply(idx, FUN)
> >> }
> >>
> >> #Example
> >> X <- c("ab", "cd", "ef")
> >> patt <- c("b", "cd", "a")
> >> repl <- c("B", "CD", "A")
> >> sub2(patt, repl, X)
> >>
> >> -John
>
> ______________________________________________
> R-h... at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
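For the pairwise problem in the original question (pattern[i] and replacement[i] act only on x[i]), here are two illustrative sketches. Neither was posted in the thread. The first passes sub() to mapply() directly and supplies the shared fixed = TRUE through MoreArgs, so no anonymous function is needed. The second, a hypothetical sub2_vapply(), is a vapply() variant of sub2: the character(1) template fixes the result type, and seq_along() makes a zero-length x return character(0). One design choice differs from sub2. rep_len() recycles inputs of any length, whereas sub2 recycles only length-1 inputs.

```r
X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

# Pairwise replacement: pattern[i] and replacement[i] act only on x[i].
# sub() is passed directly; MoreArgs supplies arguments shared by every call.
res_mapply <- mapply(sub, pattern = patt, replacement = repl, x = X,
                     MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Hypothetical vapply() variant of sub2: the FUN.VALUE template guarantees
# a character result, and seq_along() copes with zero-length input.
sub2_vapply <- function(pattern, replacement, x) {
  pattern     <- rep_len(pattern, length(x))      # recycle to length(x)
  replacement <- rep_len(replacement, length(x))
  vapply(seq_along(x),
         function(i) sub(pattern[i], replacement[i], x[i], fixed = TRUE),
         character(1))
}

res_vapply <- sub2_vapply(patt, repl, X)
print(res_mapply)   # [1] "aB" "CD" "ef"
print(res_vapply)   # [1] "aB" "CD" "ef"
```

Both return the same values as the thread's sub2 and Christos's mapply() call. They differ only in that USE.NAMES = FALSE drops the pattern names from the result.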
Adam,
The method you propose gives a different result than the prior methods for these example vectors:

X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

Old method 1

mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)

gives

   b   cd    a
"aB" "CD" "ef"

Old method 2

sub2 <- function(pattern, replacement, x) {
    len <- length(x)
    if (length(pattern) == 1)
        pattern <- rep(pattern, len)
    if (length(replacement) == 1)
        replacement <- rep(replacement, len)
    FUN <- function(i, ...) {
        sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    idx <- 1:length(x)
    sapply(idx, FUN)
}
sub2(patt, repl, X)

gives

[1] "aB" "CD" "ef"

Your method (I gave it the unique name "sub3")

sub3 <- function(pattern, replacement, x) {
  len <- length(x)
  y <- character(length=len)
  patlen <- length(pattern)
  replen <- length(replacement)
  if(patlen != replen) stop('Error: Pattern and replacement length do not match')
  for(i in 1:replen) {
    y[which(x==pattern[i])] <- replacement[i]
  }
  return(y)
}
sub3(patt, repl, X)

gives

[1] ""   "CD" ""

Granted, whatever it does, it does it faster:

#Old method 1
system.time(for(i in 1:50000) mapply(function(p,r,x) sub(p,r,x, fixed = TRUE),p=patt,r=repl,x=X))
   user  system elapsed
   2.53    0.00    2.52

#Old method 2
system.time(for(i in 1:50000) sub2(patt, repl, X))
   user  system elapsed
   2.32    0.00    2.32

#Your proposed method
system.time(for(i in 1:50000) sub3(patt, repl, X))
   user  system elapsed
   1.02    0.00    1.01

but would it still be faster if it actually solved the same problem?

-John Thaden

On Monday, July 27, 2015 11:40 PM, Adam Erickson <adam.michael.erickson at gmail.com> wrote:
> I know this is an old thread, but I wrote a simple FOR loop with vectorized pattern replacement that is much faster than either of those (it can also accept outputs differing in length from the patterns):
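The difference John shows comes from sub3 comparing whole strings with ==. An element is replaced only if it equals a pattern exactly, and every other element becomes "". The sketch below performs the same lookup with match() to make that behavior explicit. lookup_replace and its keep_unmatched argument are hypothetical names. The sketch assumes pattern has no duplicates: with duplicates, match() takes the first hit, whereas sub3's loop lets the last one win.

```r
X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

# Whole-string lookup: x[j] is replaced only if it equals some pattern exactly.
lookup_replace <- function(pattern, replacement, x, keep_unmatched = FALSE) {
  stopifnot(length(pattern) == length(replacement))
  idx <- match(x, pattern)        # NA where x[j] equals no pattern
  out <- replacement[idx]         # NA_character_ for unmatched elements
  # Unmatched elements become "" (as in sub3) or keep their original value
  out[is.na(idx)] <- if (keep_unmatched) x[is.na(idx)] else ""
  out
}

lookup_replace(patt, repl, X)                          # [1] ""   "CD" ""
lookup_replace(patt, repl, X, keep_unmatched = TRUE)   # [1] "ab" "CD" "ef"
```

Because "ab" and "ef" are not equal to any pattern, a lookup cannot reproduce the substring results ("aB") of sub() with fixed = TRUE. The two approaches solve different problems.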
Hi John,

The version I wrote performs vectorized full-string matching and replacement with some error checking and flexible inputs. I think there are a lot of good reasons for using this method where possible (e.g., speed and reduced complexity). Duly noted that it is different from the original question, which I only skimmed.

The previous versions you listed are both actually faster than the function for this in stringr:

str_replace(X,patt,repl)
[1] "aB" "CD" "ef"

system.time(for(i in 1:50000) str_replace(X,patt,repl))
   user  system elapsed
   5.51    0.00    5.79

However, it seems unrealistic that the vectors would be perfectly ordered in this way for most applications. The previously listed code is faster than other approaches because there are far fewer permutations and only the first character is checked. Perhaps that was the intention? I find this case to be rare. For data.tables, I prefer := together with the like() function, which uses grepl().

Cheers,
Adam

On Tue, Jul 28, 2015 at 3:00 PM, John Thaden <jjthaden at flash.net> wrote:
> Adam,
> The method you propose gives a different result than the prior methods
> for these example vectors
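Adam considers the more realistic case to be one in which patterns are not aligned one-to-one with targets. One way to handle that case is to apply each pattern to the whole vector in turn. The loop then runs over the (usually few) patterns, while sub() stays vectorized over x. replace_each is a hypothetical name for this sketch. Keep two points in mind. First, sub() replaces only the first occurrence of a pattern in each string; gsub() replaces all occurrences. Second, later patterns see the output of earlier ones, so order matters.

```r
X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

# Apply every pattern/replacement pair to every element, in order.
replace_each <- function(pattern, replacement, x, fixed = TRUE) {
  stopifnot(length(pattern) == length(replacement))
  for (i in seq_along(pattern)) {
    # sub() is vectorized over x: one call handles all targets for pattern i
    x <- sub(pattern[i], replacement[i], x, fixed = fixed)
  }
  x
}

replace_each(patt, repl, X)   # [1] "AB" "CD" "ef"
# "ab" -> "aB" (pattern "b"), then "aB" -> "AB" (pattern "a")
```

The result differs from the pairwise methods ("aB"), which shows that "every pattern on every target" is a third, distinct problem.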
There is confusion here. apply() family functions are **NOT** vectorization -- they ARE loops (at the interpreter level), just done in "functionalized" form. Please read background material (John Chambers's books, MASS, or numerous others) to improve your understanding and avoid posting erroneous comments.

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll

On Tue, Jul 28, 2015 at 3:00 PM, John Thaden <jjthaden at flash.net> wrote:
> Adam,
> The method you propose gives a different result than the prior methods for these example vectors
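The following sketch illustrates Bert's point. mapply() still calls sub() once per element from R, so it does the same work as an explicit loop over indices. The preallocated loop below (sub_loop, a hypothetical name) returns identical values. By contrast, a single call such as sub("b", "B", X, fixed = TRUE) is vectorized in the sense Bert means: one call handles the whole vector in compiled code. No timings are claimed here.

```r
X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")

# Explicit interpreter-level loop with a preallocated result vector.
sub_loop <- function(pattern, replacement, x) {
  out <- character(length(x))
  for (i in seq_along(x)) {
    out[i] <- sub(pattern[i], replacement[i], x[i], fixed = TRUE)
  }
  out
}

via_loop   <- sub_loop(patt, repl, X)
via_mapply <- mapply(function(p, r, x) sub(p, r, x, fixed = TRUE),
                     p = patt, r = repl, x = X, USE.NAMES = FALSE)
identical(via_loop, via_mapply)   # TRUE: same per-element work, different syntax

# Vectorized: one pattern, one call, all targets processed at once.
sub("b", "B", X, fixed = TRUE)    # [1] "aB" "cd" "ef"
```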