Hi R-experts,

I am trying to speed up my calculation of the A results below and replace the for loop with some functionals like lapply. After much reading, trial and error, I still have no success. Would anyone please give me some hints on that?

Thank you in advance.

Anne

The program is this: I have a complicated function and it needs to operate on some subsets of a dataset many times, depending on the values of group. I have simplified the function and dataset for this example run.

getResult <- function(d) {
  # example function
  weighted.mean(x=d[,1], w=d[,2])
}

# example data setup
n=20;
set.seed(1)
g=rep(1:5,each=4)
df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g )); df
getResult(df)
i0=c(1,2,4,5,5)
ng= length(unique(g))

# initiation of result matrix
A=matrix(Inf, ng, ng); A
for(i in 1:ng)
{ cat("i:",i,"")
  for(j in i0[i]:ng) {
    ok= !is.na(match(g,i:j)); cat("j:",j,"\n");
    A[i,j]=getResult(d=df[ok,])
  } #end for (j)
} #end for (i)

Is there an elegant way to remove the for loop here? I tried to make it flat so that it runs faster, but I cannot figure out how to subset the observations faster, without error, to apply the function getResult. Any hint is appreciated.

On another note, is there a more elegant way to initiate the list as follows?

mylist=list(); w=rep(4,5)
for (i in 1:5) mylist[[i]]=w[i:5]
The answer to "another note" is:

mapply(rep, w, 5:1)

I'll try to look at the first part in more detail later today.

-- Mike

On Mon, Oct 12, 2015 at 5:55 PM, Annie Hawk via R-help <r-help at r-project.org> wrote:
> on another note, is there a more elegant way to initiate the list as follows?
>
> mylist=list(); w=rep(4,5)
> for (i in 1:5) mylist[[i]]=w[i:5]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
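One caveat about the mapply answer: it reproduces the loop only because every element of w is the same. The loop takes tails of w (w[i:5]), whereas mapply(rep, w, 5:1) repeats the i-th element. A minimal sketch that indexes w directly, and so also works when the elements differ, could look like this. The names tails, w2 and tails2, and the values in w2, are made up for illustration.

```r
w <- rep(4, 5)

# Original loop, kept for comparison.
mylist <- list()
for (i in 1:5) mylist[[i]] <- w[i:5]

# lapply over the start positions; element i is the tail of w from position i.
tails <- lapply(seq_along(w), function(i) w[i:length(w)])

# The same idea for a vector whose elements differ (hypothetical values).
w2 <- c(10, 20, 30)
tails2 <- lapply(seq_along(w2), function(i) w2[i:length(w2)])
```

For w2, mapply(rep, w2, 3:1) would instead return c(10, 10, 10), c(20, 20) and 30, which is not what the loop computes.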
I've done a simple-minded transliteration of your code into code using nested lapply's. I doubt that it buys you much in terms of performance (or even clarity, which is really one of the main advantages of the `apply` family).

> A
        [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 3.06097  6.507521 10.99610 12.05556 15.10388
[2,]     Inf 11.818495 15.85044 16.69465 19.70425
[3,]     Inf       Inf      Inf 19.14779 22.30343
[4,]     Inf       Inf      Inf      Inf 26.11170
[5,]     Inf       Inf      Inf      Inf 28.29882

> B
        [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 3.06097  6.507521 10.99610 12.05556 15.10388
[2,]     Inf 11.818495 15.85044 16.69465 19.70425
[3,]     Inf       Inf      Inf 19.14779 22.30343
[4,]     Inf       Inf      Inf      Inf 26.11170
[5,]     Inf       Inf      Inf      Inf 28.29882

> all.equal(A, B)
[1] TRUE

If I happen to think of a more-elegant approach, I'll let you know.

-- Mike

Appendix: code
==============

###### Anne's code

getResult <- function(d) {
  #example function
  weighted.mean(x=d[,1], w=d[,2])
}

#example data setup
n=20;
set.seed(1)
g=rep(1:5,each=4)
df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g )); df
getResult(df)
i0=c(1,2,4,5,5)
ng= length(unique(g))

#initiation of result matrix
A=matrix(Inf, ng, ng); A
for(i in 1:ng)
{ cat("i:",i,"")
  for(j in i0[i]:ng) {
    ok= !is.na(match(g,i:j)); cat("j:",j,"\n");
    A[i,j]=getResult(d=df[ok,])
  } #end for (j)
} #end for (i)
A

###### Mike's code

n <- 20;
set.seed(1)
g <- rep(1:5,each=4)
df <- as.data.frame(cbind(sort(rnorm(mean=15,sd=10, n)),
                          runif(n),
                          rbinom(n, 1, 0.4),
                          g )); df
getResult(df)
i0 <- c(1,2,4,5,5)
ng <- length(unique(g))

B <- matrix(Inf, ng, ng);

invisible(lapply(1:ng, function(i) {
    lapply(i0[i]:ng, function(j) {
        ok <- !is.na(match(g, i:j))
        B[i, j] <<- getResult(df[ok, ])
    })
}))

B
all.equal(A, B)

On Mon, Oct 12, 2015 at 5:55 PM, Annie Hawk via R-help <r-help at r-project.org> wrote:
> I am trying to speed up my calculation of the A results below and replace
> the for loop with some functionals like lapply.
>
> Is there an elegant way to remove the for loop here?
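For readers who would rather avoid the `<<-` assignment inside the nested lapply, one possible "flat" variant is sketched below. It is not from the thread: it first lists every (i, j) pair the double loop visits, computes one value per pair with vapply, and then fills the matrix in a single step using a two-column index matrix. The names idx, vals and C are made up for this sketch, and it assumes i0[i] <= ng for every i, as in the example data. Nothing here is claimed to be faster; the work is still one call to getResult per pair.

```r
getResult <- function(d) weighted.mean(x = d[, 1], w = d[, 2])

# Same example data as in the thread.
n <- 20
set.seed(1)
g <- rep(1:5, each = 4)
df <- as.data.frame(cbind(sort(rnorm(mean = 15, sd = 10, n)), runif(n),
                          rbinom(n, 1, 0.4), g))
i0 <- c(1, 2, 4, 5, 5)
ng <- length(unique(g))

# Every (i, j) pair the double loop visits, one pair per row.
# Assumes i0[i] <= ng, otherwise i0[i]:ng would count downwards.
idx <- do.call(rbind, lapply(seq_len(ng), function(i) cbind(i, i0[i]:ng)))

# One result per pair; vapply() insists that each result is a single number.
vals <- vapply(seq_len(nrow(idx)), function(k) {
  getResult(df[g %in% (idx[k, 1]:idx[k, 2]), ])
}, numeric(1))

# Fill the matrix in one step by indexing with the two-column matrix of pairs.
C <- matrix(Inf, ng, ng)
C[idx] <- vals
C
```

Cells that the loop never visits, such as C[3, 3], keep their initial value of Inf, just as in A and B.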
> df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n,1, 0.4) , g ))

This is a lousy way to make a data.frame - the cbind forces all columns to be the same type and forces them into one vector, then as.data.frame splits them up into separate columns again. You also get weird names for your columns.

If you want to make a data.frame, use

df <- data.frame(ColA = sort(rnorm(mean=15,sd=10, n)),
                 ColB = runif(n),
                 ColC = rbinom(n, 1, 0.4),
                 g = g)

However, since the columns you are passing to getResult are both numeric, a matrix (made with cbind) would work just as well, and selecting rows from it will probably be faster. You will have to have a large number of groups before you notice the difference.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Oct 14, 2015 at 2:02 AM, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:
> I've done a simple-minded transliteration of your code into code using
> nested lapply's. I doubt that it buys you much in terms of performance
> (or even clarity, which is really one of the main advantages of the
> `apply` family).
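The three ways of holding the data that Bill describes can be compared side by side. The following is only a sketch on the thread's example data; the object names df_cbind, df_direct, m, x, w and b are made up here, and no timing is attempted.

```r
getResult <- function(d) weighted.mean(x = d[, 1], w = d[, 2])

n <- 20
set.seed(1)
g <- rep(1:5, each = 4)
x <- sort(rnorm(n, mean = 15, sd = 10))
w <- runif(n)
b <- rbinom(n, 1, 0.4)

# Route 1: cbind() first. Everything passes through one numeric matrix,
# so the integer columns (b and g) come back as doubles.
df_cbind <- as.data.frame(cbind(x, w, b, g))
sapply(df_cbind, typeof)

# Route 2: data.frame() directly. Each column keeps its own type and name.
df_direct <- data.frame(ColA = x, ColB = w, ColC = b, g = g)
sapply(df_direct, typeof)

# Route 3: a plain matrix holding only the two columns getResult() reads.
m <- cbind(ColA = x, ColB = w)

ok <- g %in% 1:2
getResult(df_direct[ok, ])
getResult(m[ok, , drop = FALSE])   # drop = FALSE keeps a one-row subset a matrix
```

Because getResult only uses d[, 1] and d[, 2], it accepts the matrix unchanged. The drop = FALSE argument matters only if a subset could contain a single row, which does not happen with four observations per group.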