I'm not sure I understand your question. Both functions return "" "CD" ""
because they perform exact string matching. The first demonstrates how string
or character replacements can be vectorized, while the second merely
demonstrates how Rcpp can accelerate this type of operation.

Cheers,
Adam

> On Jul 30, 2015, at 21:09, John Thaden <jjthaden at flash.net> wrote:
>
> Can you show its solution for the original sample data? Why the
> discrepancy for your original sub2() function?
>
> From: "Adam Erickson" <adam.michael.erickson at gmail.com>
> Date: Thu, Jul 30, 2015 at 6:11 pm
> Subject: Re: [R] vectorized sub, gsub, grep, etc.
>
> Here is an Rcpp version for exact character matching (for example), written
> in C++, that is substantially faster. Hence, I think this is the way to go
> where loops may be unavoidable. However, the input vector length has to
> match the length of the pattern and replacement vectors, as your original
> code did. That can be changed, though.
>
> #include <Rcpp.h>
> using namespace Rcpp;
>
> // [[Rcpp::export]]
> CharacterVector subCPP(CharacterVector pattern, CharacterVector replacement,
>                        CharacterVector x) {
>   int len = x.size();
>   CharacterVector y(len);  // every element starts as ""
>   int patlen = pattern.size();
>   int replen = replacement.size();
>   if (patlen != replen || len != patlen)
>     stop("Pattern, replacement and input lengths do not match");
>   // Compare whole strings position by position; unmatched positions stay ""
>   for (int i = 0; i < len; ++i) {
>     if (x[i] == pattern[i])
>       y[i] = replacement[i];
>   }
>   return y;
> }
>
> "" "CD" ""
>
> system.time(for(i in 1:50000) subCPP(patt, repl, X))
>    user  system elapsed
>    0.16    0.00    0.16
>
> Cheers,
>
> Adam
>
>> On Wednesday, July 29, 2015 at 2:42:23 PM UTC-7, Adam Erickson wrote:
>> Further refining the vectorized (within a loop) exact string match
>> function, I get times below 0.9 seconds while maintaining error checking.
>> This is accomplished by removing which() and replacing 1:length() with
>> seq_along().
>>
>> sub2 <- function(pattern, replacement, x) {
>>   len    <- length(x)
>>   y      <- character(length=len)
>>   patlen <- length(pattern)
>>   replen <- length(replacement)
>>   if(patlen != replen) stop('Error: Pattern and replacement length do not match')
>>   for(i in seq_along(pattern)) {
>>     y[x==pattern[i]] <- replacement[i]
>>   }
>>   return(y)
>> }
>>
>> system.time(for(i in 1:50000) sub2(patt, repl, X))
>>    user  system elapsed
>>    0.86    0.00    0.86
>>
>> Since the ordered vectors are perfectly aligned, might as well do an exact
>> string match. Hence, I think this is not off-topic.
>>
>> Cheers,
>>
>> Adam
>>
>>> On Wednesday, July 29, 2015 at 8:15:52 AM UTC-7, Bert Gunter wrote:
>>> There is confusion here. apply() family functions are **NOT**
>>> vectorization -- they ARE loops (at the interpreter level), just done
>>> in "functionalized" form. Please read background material (John
>>> Chambers's books, MASS, or numerous others) to improve your
>>> understanding and avoid posting erroneous comments.
>>>
>>> Cheers,
>>> Bert
>>>
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge
>>> is certainly not wisdom."
>>>    -- Clifford Stoll
>>>
>>>
>>> On Tue, Jul 28, 2015 at 3:00 PM, John Thaden <jjth... at flash.net> wrote:
>>> > Adam, The method you propose gives a different result than the prior
>>> > methods for these example vectors
>>> > X <- c("ab", "cd", "ef")
>>> > patt <- c("b", "cd", "a")
>>> > repl <- c("B", "CD", "A")
>>> >
>>> > Old method 1
>>> >
>>> > mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
>>> > gives
>>> >    b   cd    a
>>> > "aB" "CD" "ef"
>>> >
>>> > Old method 2
>>> >
>>> > sub2 <- function(pattern, replacement, x) {
>>> >     len <- length(x)
>>> >     if (length(pattern) == 1)
>>> >         pattern <- rep(pattern, len)
>>> >     if (length(replacement) == 1)
>>> >         replacement <- rep(replacement, len)
>>> >     FUN <- function(i, ...) {
>>> >         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>>> >     }
>>> >     idx <- 1:length(x)
>>> >     sapply(idx, FUN)
>>> > }
>>> > sub2(patt, repl, X)
>>> > gives
>>> > [1] "aB" "CD" "ef"
>>> >
>>> > Your method (I gave it the unique name "sub3")
>>> > sub3 <- function(pattern, replacement, x) {
>>> >   len    <- length(x)
>>> >   y      <- character(length=len)
>>> >   patlen <- length(pattern)
>>> >   replen <- length(replacement)
>>> >   if(patlen != replen) stop('Error: Pattern and replacement length do not match')
>>> >   for(i in 1:replen) {
>>> >     y[which(x==pattern[i])] <- replacement[i]
>>> >   }
>>> >   return(y)
>>> > }
>>> > sub3(patt, repl, X)
>>> > gives
>>> > [1] "" "CD" ""
>>> >
>>> > Granted, whatever it does, it does it faster
>>> > #Old method 1
>>> > system.time(for(i in 1:50000)
>>> > mapply(function(p,r,x) sub(p,r,x, fixed = TRUE),p=patt,r=repl,x=X))
>>> >    user  system elapsed
>>> >    2.53    0.00    2.52
>>> >
>>> > #Old method 2
>>> > system.time(for(i in 1:50000)sub2(patt, repl, X))
>>> >    user  system elapsed
>>> >    2.32    0.00    2.32
>>> >
>>> > #Your proposed method
>>> > system.time(for(i in 1:50000) sub3(patt, repl, X))
>>> >    user  system elapsed
>>> >    1.02    0.00    1.01
>>> > but would it still be faster if it actually solved the same problem?
>>> >
>>> > -John Thaden
>>> >
>>> >
>>> > On Monday, July 27, 2015 11:40 PM, Adam Erickson <adam.micha... at gmail.com> wrote:
>>> >
>>> > I know this is an old thread, but I wrote a simple FOR loop with
>>> > vectorized pattern replacement that is much faster than either of those
>>> > (it can also accept outputs differing in length from the patterns):
>>> > sub2 <- function(pattern, replacement, x) {
>>> >   len    <- length(x)
>>> >   y      <- character(length=len)
>>> >   patlen <- length(pattern)
>>> >   replen <- length(replacement)
>>> >   if(patlen != replen) stop('Error: Pattern and replacement length do not match')
>>> >   for(i in 1:replen) {
>>> >     y[which(x==pattern[i])] <- replacement[i]
>>> >   }
>>> >   return(y)
>>> > }
>>> > system.time(test <- sub2(patt, repl, XX))
>>> >    user  system elapsed
>>> >       0       0       0
>>> > Cheers,
>>> > Adam
>>> > On Wednesday, October 8, 2008 at 9:38:01 PM UTC-7, john wrote:
>>> > Hello Christos,
>>> > To my surprise, vectorization actually hurt processing speed!
>>> > #Example
>>> > X <- c("ab", "cd", "ef")
>>> > patt <- c("b", "cd", "a")
>>> > repl <- c("B", "CD", "A")
>>> > sub2 <- function(pattern, replacement, x) {
>>> >     len <- length(x)
>>> >     if (length(pattern) == 1)
>>> >         pattern <- rep(pattern, len)
>>> >     if (length(replacement) == 1)
>>> >         replacement <- rep(replacement, len)
>>> >     FUN <- function(i, ...) {
>>> >         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>>> >     }
>>> >     idx <- 1:length(x)
>>> >     sapply(idx, FUN)
>>> > }
>>> >
>>> > system.time( for(i in 1:10000) sub2(patt, repl, X) )
>>> >    user  system elapsed
>>> >    1.18    0.07    1.26
>>> > system.time( for(i in 1:10000) mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X) )
>>> >    user  system elapsed
>>> >    1.42    0.05    1.47
>>> >
>>> > So much for avoiding loops.
>>> > John Thaden
>>> >
>>> > ======= At 2008-10-07, 14:58:10 Christos wrote: =======
>>> >>John,
>>> >>Try the following:
>>> >>
>>> >> mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
>>> >>    b   cd    a
>>> >>"aB" "CD" "ef"
>>> >>
>>> >>-Christos
>>> >>> -----My Original Message-----
>>> >>> R pattern-matching and replacement functions are
>>> >>> vectorized: they can operate on vectors of targets.
>>> >>> However, they can only use one pattern and replacement.
>>> >>> Here is code to apply a different pattern and replacement for
>>> >>> every target. My question: can it be done better?
>>> >>>
>>> >>> sub2 <- function(pattern, replacement, x) {
>>> >>>     len <- length(x)
>>> >>>     if (length(pattern) == 1)
>>> >>>         pattern <- rep(pattern, len)
>>> >>>     if (length(replacement) == 1)
>>> >>>         replacement <- rep(replacement, len)
>>> >>>     FUN <- function(i, ...) {
>>> >>>         sub(pattern[i], replacement[i], x[i], fixed = TRUE)
>>> >>>     }
>>> >>>     idx <- 1:length(x)
>>> >>>     sapply(idx, FUN)
>>> >>> }
>>> >>>
>>> >>> #Example
>>> >>> X <- c("ab", "cd", "ef")
>>> >>> patt <- c("b", "cd", "a")
>>> >>> repl <- c("B", "CD", "A")
>>> >>> sub2(patt, repl, X)
>>> >>>
>>> >>> -John
>>> > ______________________________________________
>>> > R-h... at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
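For the whole-string (exact) matching that Adam's sub2() and subCPP() perform, an R-level loop is not strictly needed: base R's match() looks up every element of x in pattern in a single call. The sketch below assumes those exact-match semantics (it does not reproduce the sub() results John asked about), and sub_exact() is a hypothetical name. One behavioural difference from the loop: when pattern contains duplicates, match() uses the first occurrence, whereas sub2() lets the last one win.

```r
# Hypothetical helper (not from the thread): whole-string replacement
# without an R-level loop. Each element of x that equals some pattern[j]
# exactly becomes replacement[j]; all other elements become "", mirroring
# the output of sub2()/subCPP() above.
sub_exact <- function(pattern, replacement, x) {
  if (length(pattern) != length(replacement))
    stop("pattern and replacement must have the same length")
  idx <- match(x, pattern)   # position of first exact match, NA if none
  out <- replacement[idx]    # indexing with NA yields NA
  out[is.na(idx)] <- ""      # unmatched elements become ""
  out
}

X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
sub_exact(patt, repl, X)
# returns c("", "CD", "")
```

Note that this only answers the exact-match question; it will never turn "ab" into "aB".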
Adam,

You reopened an old thread noting its age, but did you begin at its
beginning?

> Subject: vectorized sub, gsub, grep, etc.
> Date: Oct 7, 2008
>
> R pattern-matching and replacement functions are
> vectorized: they can operate on vectors of targets.
> However, they can only use one pattern and replacement.
> Here is code to apply a different pattern and replacement
> for every target. My question: can it be done better?

sub2 <- function(pattern, replacement, x) {
    len <- length(x)
    if (length(pattern) == 1)
        pattern <- rep(pattern, len)
    if (length(replacement) == 1)
        replacement <- rep(replacement, len)
    FUN <- function(i, ...) {
        sub(pattern[i], replacement[i], x[i], fixed = TRUE)
    }
    idx <- 1:length(x)
    sapply(idx, FUN)
}

#Example
X <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
sub2(patt, repl, X)

If you run that code, you'll see the correct answer is not "" "CD" "", it is

[1] "aB" "CD" "ef"

And the same answer is given by the shorter (but slower) code suggested
later that day by Christos:

mapply(function(p, r, x) sub(p, r, x, fixed = TRUE), p=patt, r=repl, x=X)
   b   cd    a
"aB" "CD" "ef"

By talking instead about simple string matching, I'm afraid you've rather
hijacked the thread.

-John
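A minimal base-R sketch of the element-wise semantics John describes, where pattern[i] and replacement[i] are applied only to x[i]. sub_pairwise() is a hypothetical name. As Bert noted, vapply() is still a loop at the interpreter level, so this is not expected to beat sapply() on speed; it only tightens the return type and the recycling. If the stringi package is available, its stri_replace_first_fixed() is vectorized over string, pattern and replacement and may be worth benchmarking against this.

```r
# Hypothetical helper (not from the thread): element-wise fixed-string
# substitution, pairing pattern[i] and replacement[i] with x[i] only.
sub_pairwise <- function(pattern, replacement, x) {
  n <- length(x)
  # Recycle (or truncate) pattern and replacement to length(x);
  # the original sub2() only recycled length-1 inputs
  pattern <- rep_len(pattern, n)
  replacement <- rep_len(replacement, n)
  # vapply() is still a loop, but it guarantees one character string
  # per element and an unnamed character vector as the result
  vapply(seq_len(n), function(i) {
    sub(pattern[i], replacement[i], x[i], fixed = TRUE)
  }, character(1))
}

X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
sub_pairwise(patt, repl, X)
# returns c("aB", "CD", "ef")
```

A length-1 pattern is recycled across all elements, for example sub_pairwise("a", "A", c("aa", "ba")) gives c("Aa", "bA").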
Hi John,

So, you think looping over the sub() function with regular expressions
disabled is somehow more nuanced? Perhaps you should have specified that you
were only interested in sub() function results. Regardless, the original
function failed to match the 'a' in 'ab', which should have returned 'AB' at
that vector position. Hence, the 'correct' results are not really correct. It
only returned the first match. The qdap package provides a function for this:

library(qdap)
mgsub(patt, repl, X)
[1] "AB" "CD" "ef"

Based on my previous pattern-matching and replacement code, there is clearly
an opportunity to accelerate these functions with Rcpp. I think that is the
main takeaway from all of this. Unfortunately, I cannot dedicate my own time
to this at the moment. Until then, exact string matching provides an elegant
and efficient solution. A little data preparation, which is also quite fast,
is all that is needed.

Cheers,
Adam

On Fri, Jul 31, 2015 at 11:15 PM, John Thaden <jjthaden at flash.net> wrote:
> Adam,
>
> You reopened an old thread noting its age, but did you begin at its
> beginning?
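qdap's mgsub() answers a different question from the 2008 post: every pattern is tried against every element, so "a" also reaches "ab". Below is a base-R sketch of that behaviour, assuming the pairs are applied one after another in the order given; mgsub_base() is a hypothetical name and this is not qdap's implementation. Because the replacements are sequential, a later pattern can match text produced by an earlier replacement, so results may differ from qdap for overlapping patterns. It uses gsub() rather than sub(), so every occurrence in each string is replaced.

```r
# Hypothetical helper (not from the thread): apply every
# (pattern, replacement) pair to every element of x, one pair after
# another, with fixed (non-regex) matching.
mgsub_base <- function(pattern, replacement, x) {
  if (length(pattern) != length(replacement))
    stop("pattern and replacement must have the same length")
  # Left fold over the pair indices: the output of step i feeds step i + 1
  Reduce(function(s, i) gsub(pattern[i], replacement[i], s, fixed = TRUE),
         seq_along(pattern), init = x)
}

X    <- c("ab", "cd", "ef")
patt <- c("b", "cd", "a")
repl <- c("B", "CD", "A")
mgsub_base(patt, repl, X)
# returns c("AB", "CD", "ef")
```

The chaining caveat is easy to see: mgsub_base(c("a", "b"), c("b", "c"), "a") returns "c", because the "b" produced by the first pair is then rewritten by the second.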