Hello All,

My algorithm is as follows:

y <- c(1,1,1,0,0,1,0,1,0,0)
x <- c(1,0,0,1,1,0,0,1,1,0)

n <- length(x)

t <- matrix(cbind(y,x), ncol=2)

z = x+y

for(j in 1:length(x)) {
    out <- vector("list", )

    for(i in 1:10) {

        t.s <- t[sample(n,n,replace=T),]

        y.s <- t.s[,1]
        x.s <- t.s[,2]

        z.s <- y.s+x.s

        out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))
        kk <- sapply(out, function(x) {x$finding})
        ff <- out[! kk]
    }

I tried to find the total of the two vectors as a statistic by using the bootstrap. Finally, I want to get the values which do not contain each element of y. In the algorithm, this is referred to as "ff". But I always get the same result:

> ff
list()
> kk
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

This is because my "y" vector contains only 2 distinct values, and probably all of the bootstrap resamples include "1", or all of them include "0". So I cannot find the true matches. Can anyone help me with how to do this?

Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/indexing-tp4428210p4428210.html
Sent from the R help mailing list archive at Nabble.com.
Hi

> My algorithm as follows;
> y <- c(1,1,1,0,0,1,0,1,0,0)
> x <- c(1,0,0,1,1,0,0,1,1,0)
>
> n <- length(x)
>
> t <- matrix(cbind(y,x), ncol=2)

Do not use t. It is the function for transposing a matrix, and after you redefine it you can get a nasty surprise in the future.

tt <- cbind(y,x)

is enough.

>
> z = x+y
>
> for(j in 1:length(x)) {
> out <- vector("list", )
>
> for(i in 1:10) {
>
> t.s <- t[sample(n,n,replace=T),]

t.s <- tt[sample(n,n,replace=T),]

>
> y.s <- t.s[,1]
> x.s <- t.s[,2]
>
> z.s <- y.s+x.s
>
> out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))

Here you compare the vector y.s with one element of y. As y.s is a set of 0/1 values, y[j] is either 0 or 1, and any() tests whether there is at least one match, you get FALSE only in the rare case where no value in y.s equals y[j], for example when all values in y.s are 0 and y[j] is 1.

> kk <- sapply(out, function(x) {x$finding})

finding is (almost) always TRUE, therefore kk is TRUE.

> ff <- out[! kk]
> }
>
> I tried to find the total of the two vectors as statistic by using
> bootstrap. Finally, I want to get the values which do not contain the y's
> each elemet. In the algorithm ti is referred to "ff". But i get always the
> same result ;

I do not understand your intention, so it is difficult to help. What is the "total of two vectors"? Their sum? What does "to get values which do not contain y's each element" mean?

Maybe you should rethink your code, and first evaluate each line separately to see what it does and whether the result is what you intended.

Regards
Petr
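To make Petr Pikal's point about any() concrete, here is a small sketch; the number of resamples (1000) and the seed are arbitrary choices, not taken from the thread. With 0/1 data, any(y.s == y[j]) is FALSE only when the resample contains no copy of the value y[j], i.e. when it consists entirely of the other value.

```r
# any() returns TRUE as soon as a single element matches
any(c(0, 1, 1, 0) == 1)   # TRUE
any(c(0, 0, 0, 0) == 1)   # FALSE: no element equals 1

# The 0/1 vector from the original question
y <- c(1,1,1,0,0,1,0,1,0,0)
n <- length(y)

# Repeat the "finding" test on many bootstrap resamples of y,
# always searching for the value y[1] (which is 1)
set.seed(2012)  # arbitrary seed, for reproducibility only
finding <- replicate(1000, any(sample(y, n, replace = TRUE) == y[1]))

# Fraction of resamples where the test is TRUE; with 0/1 data it is
# essentially 1, which is why kk came out all TRUE
mean(finding)
```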
On Tue, Feb 28, 2012 at 05:59:24AM -0800, helin_susam wrote:
> Hello All,
>
> My algorithm as follows;
> [...]
> Because, my "y" vector contains only 2 elements, and probably all of the
> bootstrap resamples include "1", or all of resamples include "0". So I can
> not find the true matches. Can anyone help me about how to be?

Hi.

First of all, there are some unclear points in your code. In particular, I would expect a "}" between the line

out[[i]] <- list(...

and

kk <- sapply(...

Moreover, I do not see why the loop over j contains the loop over i. I would expect these loops to be disjoint, since the loop over i collects all the samples into a list. The following code is a modification, which I suggest as an alternative.

y <- c(1:5, 1:5)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(x)
t <- matrix(cbind(y,x), ncol=2)
z = x+y

# generate 10 bootstrap samples and keep z.s, y.s
out <- vector("list", 10)
for(i in 1:10) {
    t.s <- t[sample(n,n,replace=T),]
    y.s <- t.s[,1]
    x.s <- t.s[,2]
    z.s <- y.s+x.s
    out[[i]] <- list(zz = z.s, yy =y.s)
}

# check which replications do not contain y[j] in their y.s,
# and take the OR of these conditions over j
ff <- rep(FALSE, times=length(out))
for(j in 1:length(y)) {
    kk <- sapply(out, function(x) {any(x$yy == y[j])})
    ff <- ff | (! kk)
}
out[ff]

With the original y <- c(1,1,1,0,0,1,0,1,0,0), the probability that a bootstrap sample contains only 1's or only 0's is 2 * (1/2)^10, so I replaced the vector y with another one, for which a resample with a missing value is more frequent. I obtained, for example

[[1]]
[[1]]$zz
[1] 2 2 5 2 3 2 3 2 2 6

[[1]]$yy
[1] 1 1 5 1 3 2 3 2 1 5 # 4 is missing


[[2]]
[[2]]$zz
[1] 5 5 5 5 3 5 2 5 6 4

[[2]]$yy
[1] 4 4 5 4 3 5 2 5 5 3 # 1 is missing


[[3]]
[[3]]$zz
[1] 5 2 5 1 5 1 2 5 5 5

[[3]]$yy
[1] 4 2 5 1 5 1 1 4 5 4 # 3 is missing

Hope this helps.

Petr Savicky.
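Petr Savicky's probability can be checked numerically. The sketch below assumes that the k distinct values of y occur equally often (true for both c(1,1,1,0,0,1,0,1,0,0) and c(1:5, 1:5)) and uses inclusion-exclusion; the helper name p_some_missing is hypothetical, not from the thread. It also shows why c(1:5, 1:5) makes a missing value far more likely.

```r
# Probability that a bootstrap resample of size n misses at least one of
# k distinct values, when each value makes up a fraction 1/k of the data.
# Inclusion-exclusion: P(all present) = sum_i (-1)^i choose(k, i) (1 - i/k)^n
p_some_missing <- function(k, n) {
  i <- 0:k
  1 - sum((-1)^i * choose(k, i) * (1 - i / k)^n)
}

p_some_missing(2, 10)  # 0/1 data: equals 2 * (1/2)^10, about 0.002
p_some_missing(5, 10)  # y <- c(1:5, 1:5): about 0.48

# Monte Carlo check for y <- c(1:5, 1:5)
set.seed(1)  # arbitrary seed
y <- c(1:5, 1:5)
miss <- replicate(20000, length(unique(sample(y, length(y), replace = TRUE))) < 5)
mean(miss)   # should be close to p_some_missing(5, 10)
```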
Dear Petr Pikal and Petr Savicky, thank you for your replies.

If the y vector contains different elements, my algorithm gives this result:

y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)

n <- length(x)

t <- matrix(cbind(y,x), ncol=2)

z = x+y

for(j in 1:length(x)) {
    out <- vector("list", )

    for(i in 1:10) {

        t.s <- t[sample(n,n,replace=T),]

        y.s <- t.s[,1]
        x.s <- t.s[,2]

        z.s <- y.s+x.s

        out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))
        kk <- sapply(out, function(x) {x$finding})
        ff <- out[! kk]
    }
}

> kk
[1] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE FALSE
> ff
[[1]]
[[1]][[1]]
[1] 5 7 3 2 2 6 7 2 6 6

[[1]]$finding
[1] FALSE


[[2]]
[[2]][[1]]
[1] 7 10 6 2 2 2 6 6 9 3

[[2]]$finding
[1] FALSE

Here, two of the results are FALSE; that is, the 5th and 10th bootstrap re-samples do not contain one (or more) element(s) of the original vector ("y").

How can I get a similar result when the y vector contains only the binary response variable (1 or 0)? That is,

y <- c(1,1,1,0,0,1,0,1,0,0)

Many thanks.
On Tue, Feb 28, 2012 at 08:50:45AM -0800, helin_susam wrote:
> Dear Petr Pikal and Petr Savicky thank you for your replies..
>
> If the y vector contains different elements my algorithm gives this result;
> [...]

Hi.

It is hard to debug code which we do not understand. Both Petr Pikal and I expressed objections to your code. It would help us answer your question if you took our objections and suggestions into account, or explained what we do not understand correctly. Can you comment on the suggestions from the previous emails?

I would like to add one more. Why do you use

ff <- (z.s)

inside

out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))

? This expression includes the value in the list, but not under the name ff, and it overwrites the global variable ff instead. If you want to include (z.s) as a component named ff, then use

list(ff = (z.s), finding...

Petr Savicky.
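The difference Petr Savicky describes between ff <- (z.s) and ff = (z.s) inside list() can be seen in a few lines; the values of ff and z.s below are made up for illustration.

```r
ff <- "old value"
z.s <- c(2, 3, 5)

# "<-" inside list() is an ordinary assignment: it overwrites ff in the
# calling environment and passes the value on as an unnamed component
a <- list(ff <- (z.s), finding = TRUE)
names(a)   # "" "finding"
ff         # 2 3 5 (the old value is gone)

# "=" inside list() names the component and leaves ff untouched
ff <- "old value"
b <- list(ff = (z.s), finding = TRUE)
names(b)   # "ff" "finding"
ff         # "old value"
b$ff       # 2 3 5
```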
On Tue, Feb 28, 2012 at 08:50:45AM -0800, helin_susam wrote:
> Dear Petr Pikal and Petr Savicky thank you for your replies..
> [...]
> Here, the two situations are FALSE, that is 5th and 10th bootstrap
> re-samples do not contain one (or more) element(s) of original vector ("y").

Hi.

Your code generates a new list "out" for each j. This means that you generate a list "out", test the presence of y[1] in its components, then delete "out", replace it with a new list and test the presence of y[2] in this new list, then "out" is deleted and replaced by another "out", etc. This is probably not what you want. Is this correct?

Petr Savicky.
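A stripped-down sketch of the loop structure Petr Savicky describes; the loop bounds and the stored values are arbitrary. Because out is re-created at the start of every pass over j, only the entries from the last j are left when the loops finish.

```r
# Same loop structure as the original code: out is re-created for every j
for (j in 1:3) {
  out <- vector("list", 2)
  for (i in 1:2) {
    out[[i]] <- c(j = j, i = i)   # record which pass produced this entry
  }
}

# Only the entries from the last pass (j = 3) are left
sapply(out, function(r) r[["j"]])   # 3 3
```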
Dear Petr Savicky,

Actually, this is based on the jackknife-after-bootstrap algorithm. In summary, I have a data set, and I want to compute some values by using this algorithm.

First, using the bootstrap, I create some bootstrap re-samples. This step is OK. Then, for each data point within these re-samples, I want to get a subset of re-samples which do not contain that data point (this point could be any point of the original data set). In general, if B is the number of bootstrap re-samples, about B/e re-samples are obtained for each data point. Finally, I want to calculate some values for each of these re-samples.

Explanation of my algorithm:

#My data set: (x and y)
y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)

n <- length(x)

t <- matrix(cbind(y,x), ncol=2)

z = x+y

for(j in 1:length(x)) {
    out <- vector("list", )

    for(i in 1:10) {

        t.s <- t[sample(n,n,replace=T),] # Here is the bootstrap step

        y.s <- t.s[,1]
        x.s <- t.s[,2]

        z.s <- y.s+x.s
        nn <- sum (z.s) # For example, I want to calculate this value

        out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j])) # I get the mentioned subset in here
        kk <- sapply(out, function(x) {x$finding})
        ff <- out[! kk]
    }
}

I obtained the following results of an experiment:

> kk
[1] FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE TRUE
> ff
[[1]]
[[1]][[1]]
[1] 47

[[1]]$finding
[1] FALSE


[[2]]
[[2]][[1]]
[1] 46

[[2]]$finding
[1] FALSE


[[3]]
[[3]][[1]]
[1] 52

[[3]]$finding
[1] FALSE

It is easy to do when "y" contains different elements:

out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j]))

But when y contains repeated elements, doing this process can be confusing, because with y <- c(1,1,1,0,0,1,0,1,0,0), for y[j] with j = 1 there are some other 1's in y. Is there something special about the y for a given j?

Thanks
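One common way to handle repeated values, sketched here as an assumption rather than something proposed in the thread, is to identify observations by their row index instead of their value: draw the bootstrap indices once, then, for each observation j, keep the resamples whose indices do not include j. The names idx, stat and left_out, the choice B = 200 and the seed are illustrative; the statistic sum(z.s) is taken from the code above. A given resample misses row j with probability (1 - 1/n)^n, which tends to 1/e as n grows (about 0.349 for n = 10), hence the B/e figure.

```r
y <- c(1,1,1,0,0,1,0,1,0,0)   # repeated values are no problem here
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(y)
B <- 200                       # number of bootstrap resamples (illustrative)

set.seed(123)                  # arbitrary seed
# n x B matrix: column b holds the row indices drawn for resample b
idx <- replicate(B, sample(n, n, replace = TRUE))

# statistic for each resample: nn = sum(z.s) = sum(y.s + x.s)
stat <- apply(idx, 2, function(i) sum(y[i] + x[i]))

# for each observation j, the resamples that do not contain row j
# (matched by position, so equal values of y do not matter)
left_out <- lapply(seq_len(n), function(j) which(colSums(idx == j) == 0))

sapply(left_out, length)       # each roughly B * (1 - 1/n)^n, i.e. about 70

# statistic values from the resamples that leave out observation j
ff <- lapply(left_out, function(b) stat[b])
```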
On Tue, Feb 28, 2012 at 11:42:32AM -0800, helin_susam wrote:
> Dear Petr Savicky,
>
> Actually, this is based on jackknife after bootstrap algorithm. In summary,
>
> I have a data set, and I want to compute some values by using this
> algorithm.
>
> Firstly, using bootstrap, I create some bootstrap re-samples. This step O.K.
> Then, for each data point within these re-samples, I want to get a subset

The point y[j] which you are searching for in the generated samples is not from "these re-samples", but from the original data set.

> which do not contain that data point ( this point would be any point of the
> original data set), in general, if B is the number of bootstrap-resamples,
> there are B/e resamples obtained for each data point.

Your previous explanations were more accurate on this point and implied that you want to take all resamples which miss at least one of the values y[j].

> And finally, I want
> to calculate some values for each of this re samples.
> Explanation of my algorithm;
> [...]

You did not reply to the questions concerning regenerating "out" for each "j" and using "<-" inside a list. This makes the discussion complicated.

The following code is equivalent to your code.
y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(x)
tt <- unname(cbind(y,x)) # do not overwrite function t()
z <- x+y

# needed only to shift the sequence of random numbers
for (j in 1:(10*(n-1))) sample(n,n,replace=T)

j <- length(x)
out <- vector("list")
for(i in 1:10) {
    tt.s <- tt[sample(n,n,replace=T),] # Here is the bootstrap step
    y.s <- tt.s[,1]
    x.s <- tt.s[,2]
    z.s <- y.s+x.s
    nn <- sum(z.s) # For example, I want to calculate this value
    out[[i]] <- list((nn), finding=any(y.s==y[j])) # I get the mentioned subset in here
}
kk <- sapply(out, function(x) {x$finding})
ff <- out[! kk]

You can check the equivalence by running both versions of the code with the same command set.seed(seed) at the beginning. I tried this, and the resulting "ff" were identical for several different values of "seed".

What can be seen is that the output depends only on the pass of the loop over j with the value j = length(x). Searching for the values y[j] for j = 1, ..., length(x)-1 does not influence the result. In other words, the output of your code consists of those of the last 10 samples which do not contain y[10] (the last element of y). The tests for the presence of y[1:9] in the samples are performed in your code, but their results are later overwritten, so they do not influence the output. Is this what you want?

> I obtained the following results of an experiment;
> [...]
> It is easy to do when "y" contains different elements. "out[[i]] <- list(ff
> <- (nn), finding=any (y.s==y[j]))"
>
> But, when y contains the same element, doing this process can be confusing
> confusing..
> Because, (y <- c(1,1,1,0,0,1,0,1,0,0)) for y[j] when j= 1 there are some
> other 1 in the y. Is there something special about the y to an j ?

This question is unclear to me.
There are some problems in your code, which I tried to explain repeatedly in the previous emails. Without clarifying these things, I am not able to provide any help.

Petr Savicky.
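Petr Savicky's equivalence check can be scripted. The sketch below wraps the original loop and his simplified version in two hypothetical functions, orig_version and short_version, runs both after the same set.seed(seed), and compares the resulting ff; the seeds 1 to 20 are arbitrary. Both versions draw 100 resamples in the same order, so the last 10 draws, the only ones that reach ff, coincide.

```r
y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(x)
tt <- unname(cbind(y, x))

# The original code: "out" is re-created for every j
orig_version <- function(seed) {
  set.seed(seed)
  for (j in 1:length(x)) {
    out <- vector("list")
    for (i in 1:10) {
      t.s <- tt[sample(n, n, replace = TRUE), ]
      y.s <- t.s[, 1]
      x.s <- t.s[, 2]
      nn <- sum(y.s + x.s)
      out[[i]] <- list(ff <- (nn), finding = any(y.s == y[j]))
      kk <- sapply(out, function(x) {x$finding})
      ff <- out[!kk]
    }
  }
  ff
}

# The simplified version: burn the random numbers used for
# j = 1, ..., n-1, then run only the pass with j = length(x)
short_version <- function(seed) {
  set.seed(seed)
  for (j in 1:(10 * (n - 1))) sample(n, n, replace = TRUE)
  j <- length(x)
  out <- vector("list")
  for (i in 1:10) {
    tt.s <- tt[sample(n, n, replace = TRUE), ]
    y.s <- tt.s[, 1]
    x.s <- tt.s[, 2]
    nn <- sum(y.s + x.s)
    out[[i]] <- list((nn), finding = any(y.s == y[j]))
  }
  kk <- sapply(out, function(x) {x$finding})
  out[!kk]
}

# TRUE for a seed if both versions return an identical ff
same <- sapply(1:20, function(s) identical(orig_version(s), short_version(s)))
all(same)
```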