thr3ads.net - R help - [R] indexing?? [Feb 2012]


helin_susam

2012-Feb-28 13:59 UTC

[R] indexing??

Hello All,

My algorithm as follows;
y <- c(1,1,1,0,0,1,0,1,0,0)
x <- c(1,0,0,1,1,0,0,1,1,0)

n <- length(x)

t <- matrix(cbind(y,x), ncol=2)

z = x+y

for(j in 1:length(x)) {
out <- vector("list", )

for(i in 1:10) {

t.s <- t[sample(n,n,replace=T),]

y.s <- t.s[,1]
x.s <- t.s[,2]

z.s <- y.s+x.s

out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))
kk <- sapply(out, function(x) {x$finding})
ff <- out[! kk]
}

I tried to find the total of the two vectors as statistic by using
bootstrap. Finally, I want to get the values which do not contain the y's
each element. In the algorithm it is referred to as "ff". But I always get
the same result:
> ff
list()
> kk
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Because my "y" vector contains only 2 distinct values, probably all of the
bootstrap resamples include "1" and all of them include "0". So I cannot
find the true matches. Can anyone help me with how to do this?
Thanks.

--
View this message in context:
http://r.789695.n4.nabble.com/indexing-tp4428210p4428210.html
Sent from the R help mailing list archive at Nabble.com.

Petr PIKAL

2012-Feb-28 15:24 UTC


[R] indexing??

Hi
> 
> My algorithm as follows;
> y <- c(1,1,1,0,0,1,0,1,0,0)
> x <- c(1,0,0,1,1,0,0,1,1,0)
> 
> n <- length(x)
> 
> t <- matrix(cbind(y,x), ncol=2)
Do not use t as a variable name: it is the function for transposing a
matrix, and after you redefine it you can get a nasty surprise later.

tt <- cbind(y,x)

is enough
> 
> z = x+y
> 
> for(j in 1:length(x)) {
> out <- vector("list", )
> 
> for(i in 1:10) {
> 
> t.s <- t[sample(n,n,replace=T),]
t.s <- tt[sample(n,n,replace=T),]
> 
> y.s <- t.s[,1]
> x.s <- t.s[,2]
> 
> z.s <- y.s+x.s
> 
> out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))
Here you compare the vector y.s with one element of y. Since y.s is a set
of (0,1) values and y[j] is either 0 or 1, and any() tests whether there is
any match, you get FALSE only in the rare case where all values in y.s are
0 and y[j] is 1 (or the reverse).
> kk <- sapply(out, function(x) {x$finding})
finding is (almost) always TRUE therefore kk is TRUE
> ff <- out[! kk]
> }
> 
> I tried to find the total of the two vectors as statistic by using
> bootstrap. Finally, I want to get the values which do not contain the y's
> each element. In the algorithm it is referred to "ff". But i get always the
> same result ;
I do not understand your intention, so it is difficult to help. What is
the total of two vectors? The sum?

What does it mean "to get values which do not contain y's each
element"?

Maybe you should rethink your code and first try to evaluate each line
separately to see what it does and whether the result is what you intended.
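
To see how rarely the test comes out FALSE, one can simulate it. This is only a small sketch: the seed and the number of replications (1000) are my own choices for illustration and are not part of the original code.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
y <- c(1,1,1,0,0,1,0,1,0,0)
n <- length(y)

# any(y.s == y[1]) evaluated for 1000 bootstrap resamples of y
finding <- replicate(1000, any(sample(y, n, replace = TRUE) == y[1]))

mean(finding)                      # share of resamples containing at least one 1
p_exact <- 1 - (1/2)^n             # exact probability, since half of y equals 1
p_exact
```

With five 1's and five 0's in y, a resample misses the value 1 only when all ten draws are 0, so the share of TRUE should be close to 1 - (1/2)^10.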

Regards
Petr

> > ff
> list()
> > kk
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> Because, my "y" vector contains only 2 elements, and probably all of the
> bootstrap resamples  include "1", or all of resamples include "0". So I can
> not find the true matches. Can anyone help me about how to be?
> Thanks.

Petr Savicky

2012-Feb-28 15:33 UTC


[R] indexing??

On Tue, Feb 28, 2012 at 05:59:24AM -0800, helin_susam wrote:
> Hello All,
> 
> My algorithm as follows;
> y <- c(1,1,1,0,0,1,0,1,0,0)
> x <- c(1,0,0,1,1,0,0,1,1,0)
> 
> n <- length(x)
> 
> t <- matrix(cbind(y,x), ncol=2)
> 
> z = x+y
> 
> for(j in 1:length(x)) {
> out <- vector("list", )
> 
> for(i in 1:10) {
> 
> t.s <- t[sample(n,n,replace=T),]
> 
> y.s <- t.s[,1]
> x.s <- t.s[,2]
> 
> z.s <- y.s+x.s
> 
> out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))
> kk <- sapply(out, function(x) {x$finding})
> ff <- out[! kk]
> }
> 
> I tried to find the total of the two vectors as statistic by using
> bootstrap. Finally, I want to get the values which do not contain the y's
> each element. In the algorithm it is referred to "ff". But i get always the
> same result ;
> > ff
> list()
> > kk
>  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
> Because, my "y" vector contains only 2 elements, and probably all of the
> bootstrap resamples  include "1", or all of resamples include "0". So I can
> not find the true matches. Can anyone help me about how to be?
Hi.

First of all, there are some unclear points in your code.
In particular, I would expect a "}" between the line

  out[[i]] <- list(...

and

  kk <- sapply(...

Moreover, I do not see why the loop over j contains the
loop over i. I would expect these loops to be disjoint,
since the loop over i collects all the samples into a list.

The following code is a modification, which I suggest
as an alternative.

  y <- c(1:5, 1:5)
  x <- c(1,0,0,1,1,0,0,1,1,0)
 
  n <- length(x)
 
  t <- matrix(cbind(y,x), ncol=2)
 
  z = x+y
 
  # generate 10 bootstrap samples and keep z.s, y.s
  out <- vector("list", 10)
  for(i in 1:10) {
    t.s <- t[sample(n,n,replace=T),]
    y.s <- t.s[,1]
    x.s <- t.s[,2]
    z.s <- y.s+x.s
    out[[i]] <- list(zz = z.s, yy =y.s)
  }

  # check, which replications do not contain y[j] in their y.s,
  # and take the OR of these conditions over j
  ff <- rep(FALSE, times=length(out))
  for(j in 1:length(y)) {
     kk <- sapply(out, function(x) {any(x$yy == y[j])})
     ff <- ff | (! kk)
  }
  out[ff]

With the original y <- c(1,1,1,0,0,1,0,1,0,0), the probability
that a bootstrap sample contains only 1's or only 0's is
2 * (1/2)^10, so I replaced the vector y with another one, for which
a missing value is more frequent. I obtained, for example

  [[1]]
  [[1]]$zz
   [1] 2 2 5 2 3 2 3 2 2 6
  
  [[1]]$yy
   [1] 1 1 5 1 3 2 3 2 1 5   # 4 is missing
  
  
  [[2]]
  [[2]]$zz
   [1] 5 5 5 5 3 5 2 5 6 4
  
  [[2]]$yy
   [1] 4 4 5 4 3 5 2 5 5 3  # 1 is missing
  
  
  [[3]]
  [[3]]$zz
   [1] 5 2 5 1 5 1 2 5 5 5
  
  [[3]]$yy
   [1] 4 2 5 1 5 1 1 4 5 4  # 3 is missing
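
The two probabilities can also be computed directly. The following is a sketch using inclusion-exclusion; the helper name p_missing is hypothetical and introduced only for this illustration, and it assumes that all distinct values occur equally often in y, as they do in both vectors above.

```r
# Probability that a bootstrap sample of size n misses at least one of
# k equally frequent distinct values (inclusion-exclusion over the
# number m of values missed simultaneously).
p_missing <- function(k, n) {
  m <- seq_len(k - 1)
  sum((-1)^(m + 1) * choose(k, m) * ((k - m) / k)^n)
}

p_binary <- p_missing(2, 10)       # original y: equals 2 * (1/2)^10
p_five   <- p_missing(5, 10)       # y <- c(1:5, 1:5)
c(binary = p_binary, five = p_five)
```

For the binary y this gives about 0.002, while for c(1:5, 1:5) it gives about 0.48, which fits with 3 of the 10 samples above missing a value.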
 
Hope this helps.

Petr Savicky.

helin_susam

2012-Feb-28 16:50 UTC


[R] indexing??

Dear Petr Pikal and Petr Savicky thank you for your replies..

If the y vector contains different elements my algorithm gives this result;
y <- c(1,2,3,4,5,6,7,8,9,10) 
x <- c(1,0,0,1,1,0,0,1,1,0) 

n <- length(x) 

t <- matrix(cbind(y,x), ncol=2) 

z = x+y 

for(j in 1:length(x)) { 
out <- vector("list", ) 

for(i in 1:10) { 

t.s <- t[sample(n,n,replace=T),] 

y.s <- t.s[,1] 
x.s <- t.s[,2] 

z.s <- y.s+x.s 

out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j])) 
kk <- sapply(out, function(x) {x$finding}) 
ff <- out[! kk] 
} 
}
> kk
 [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
> ff
[[1]]
[[1]][[1]]
 [1] 5 7 3 2 2 6 7 2 6 6

[[1]]$finding
[1] FALSE


[[2]]
[[2]][[1]]
 [1]  7 10  6  2  2  2  6  6  9  3

[[2]]$finding
[1] FALSE

Here, two of the results are FALSE; that is, the 5th and 10th bootstrap
resamples do not contain one (or more) element(s) of the original vector
("y").

How can I get a similar result when the y vector contains only a binary
response variable (1 or 0)? That is
y <- c(1,1,1,0,0,1,0,1,0,0) 

Many thanks.

--
View this message in context:
http://r.789695.n4.nabble.com/indexing-tp4428210p4428746.html
Sent from the R help mailing list archive at Nabble.com.

Petr Savicky

2012-Feb-28 18:59 UTC


[R] indexing??

On Tue, Feb 28, 2012 at 08:50:45AM -0800, helin_susam wrote:
> Dear Petr Pikal and Petr Savicky thank you for your replies..
> 
> If the y vector contains different elements my algorithm gives this result;
> y <- c(1,2,3,4,5,6,7,8,9,10) 
> x <- c(1,0,0,1,1,0,0,1,1,0) 
> 
> n <- length(x) 
> 
> t <- matrix(cbind(y,x), ncol=2) 
> 
> z = x+y 
> 
> for(j in 1:length(x)) { 
> out <- vector("list", ) 
> 
> for(i in 1:10) { 
> 
> t.s <- t[sample(n,n,replace=T),] 
> 
> y.s <- t.s[,1] 
> x.s <- t.s[,2] 
> 
> z.s <- y.s+x.s 
> 
> out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j])) 
> kk <- sapply(out, function(x) {x$finding}) 
> ff <- out[! kk] 
> } 
> }
Hi.

It is hard to debug code that we do not understand.
Both Petr Pikal and I raised objections to your
code. It would help us answer your question if you took
our objections and suggestions into account, or explained
what we do not understand well.

Can you comment on the suggestions from the previous emails?

I would like to add one more. Why do you use

  ff <- (z.s)

inside

  out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j]))

?

This expression includes the value in the list, but not
under the name ff; instead, it overwrites the global variable ff.

If you want to include (z.s) as a component named ff, then use

  list(ff = (z.s), finding...
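
The difference between the two forms can be seen in a small example. This is only a sketch; the values in z.s and the names a and b are made up for illustration.

```r
z.s <- c(2, 1, 0)                      # made-up values for illustration

a <- list(ff <- z.s, finding = TRUE)   # "<-" assigns ff in the workspace
names(a)                               # the first component has an empty name
a$ff                                   # NULL: there is no component called ff

b <- list(ff = z.s, finding = TRUE)    # "=" names the component
names(b)
b$ff                                   # the stored vector
```

After the first call, a variable ff exists in the workspace as a side effect, which is how the original code ends up overwriting it.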

Petr Savicky.

Petr Savicky

2012-Feb-28 19:24 UTC


[R] indexing??

On Tue, Feb 28, 2012 at 08:50:45AM -0800, helin_susam wrote:
> Dear Petr Pikal and Petr Savicky thank you for your replies..
> 
> If the y vector contains different elements my algorithm gives this result;
> y <- c(1,2,3,4,5,6,7,8,9,10) 
> x <- c(1,0,0,1,1,0,0,1,1,0) 
> 
> n <- length(x) 
> 
> t <- matrix(cbind(y,x), ncol=2) 
> 
> z = x+y 
> 
> for(j in 1:length(x)) { 
> out <- vector("list", ) 
> 
> for(i in 1:10) { 
> 
> t.s <- t[sample(n,n,replace=T),] 
> 
> y.s <- t.s[,1] 
> x.s <- t.s[,2] 
> 
> z.s <- y.s+x.s 
> 
> out[[i]] <- list(ff <- (z.s), finding=any (y.s==y[j])) 
> kk <- sapply(out, function(x) {x$finding}) 
> ff <- out[! kk] 
> } 
> }
> 
> > kk
>  [1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE
> > ff
> [[1]]
> [[1]][[1]]
>  [1] 5 7 3 2 2 6 7 2 6 6
> 
> [[1]]$finding
> [1] FALSE
> 
> 
> [[2]]
> [[2]][[1]]
>  [1]  7 10  6  2  2  2  6  6  9  3
> 
> [[2]]$finding
> [1] FALSE
> 
> Here, the two situations are FALSE, that is 5th and 10th bootstrap
> re-samples do not contain one (or more) element(s) of original vector
> ("y").
Hi.

Your code generates a new list "out" for each j. This means
that you generate a list "out", test the presence of y[1]
in its components, then delete "out", replace it by a new
list and test the presence of y[2] in this new list, then
"out" is deleted and replaced by another "out", etc.

This is probably not what you want. Is this correct?

Petr Savicky.

helin_susam

2012-Feb-28 19:42 UTC


[R] indexing??

Dear Petr Savicky,

Actually, this is based on the jackknife-after-bootstrap algorithm. In summary,

I have a data set, and I want to compute some values by using this
algorithm.

First, using the bootstrap, I create some bootstrap resamples. This step is O.K.
Then, for each data point within these resamples, I want to get the subset
of resamples which do not contain that data point (this point could be any
point of the original data set). In general, if B is the number of bootstrap
resamples, about B/e resamples are obtained for each data point. And finally,
I want to calculate some values for each of these resamples.
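
The B/e figure can be checked numerically. This is a rough sketch: the value B = 1000 and the names p_omit and p_large are assumptions chosen for illustration. A resample of size n omits a given observation with probability (1 - 1/n)^n, which approaches 1/e as n grows.

```r
n <- 10
p_omit <- (1 - 1/n)^n              # chance that one resample omits a given point
c(n = n, p_omit = p_omit, limit = exp(-1))

B <- 1000                          # illustrative number of bootstrap resamples
B * p_omit                         # expected number of resamples omitting the point

p_large <- (1 - 1/1e6)^1e6         # for large n the value is close to exp(-1)
p_large
```

For n = 10 the proportion is somewhat below 1/e, so B/e is an approximation rather than an exact count.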

Explanation of my algorithm;

#My data set: (x and y)
y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)

n <- length(x)

t <- matrix(cbind(y,x), ncol=2)

z = x+y

for(j in 1:length(x)) {
out <- vector("list", )

for(i in 1:10) {

t.s <- t[sample(n,n,replace=T),] # Here is the bootstrap step

y.s <- t.s[,1]
x.s <- t.s[,2]

z.s <- y.s+x.s
nn <- sum (z.s)  # For example, I want to calculate this value

out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j])) # I get the mentioned subset in here
kk <- sapply(out, function(x) {x$finding})
ff <- out[! kk]
}
}

I obtained the following results of an experiment;
> kk
 [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
> ff
[[1]]
[[1]][[1]]
[1] 47

[[1]]$finding
[1] FALSE


[[2]]
[[2]][[1]]
[1] 46

[[2]]$finding
[1] FALSE


[[3]]
[[3]][[1]]
[1] 52

[[3]]$finding
[1] FALSE

It is easy to do when "y" contains different elements:
"out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j]))"

But when y contains repeated elements, this process becomes confusing.
Because, with y <- c(1,1,1,0,0,1,0,1,0,0), for y[j] when j = 1 there are
some other 1's in y.  Is there something special about the y to an j ?
Thanks

--
View this message in context:
http://r.789695.n4.nabble.com/indexing-tp4428210p4429280.html
Sent from the R help mailing list archive at Nabble.com.

Petr Savicky

2012-Feb-29 08:40 UTC


[R] indexing??

On Tue, Feb 28, 2012 at 11:42:32AM -0800, helin_susam wrote:
> Dear Petr Savicky,
> 
> Actually, this is based on jackknife after bootstrap algorithm. In summary,
> 
> I have a data set, and I want to compute some values by using this
> algorithm.
> 
> Firstly, using bootstrap, I create some bootstrap re-samples. This step O.K.
> Then, for each data point within these re-samples, I want to get a subset
The point y[j], which you are searching in the generated samples, is
not from "these re-samples", but from the original data set.
> which do not contain that data point ( this point would be any point of the
> original data set), in general, if B is the number of bootstrap-resamples,
> there are B/e resamples obtained for each data point.
Your previous explanations were more accurate in this point and
implied that you want to take all resamples, which miss at least
one of y[j].
>  And finally, I want
> to calculate some values for each of this re samples.
> Explanation of my algorithm;
> 
> #My data set: (x and y)
> y <- c(1,2,3,4,5,6,7,8,9,10)
> x <- c(1,0,0,1,1,0,0,1,1,0)
> 
> n <- length(x)
> 
> t <- matrix(cbind(y,x), ncol=2)
> 
> z = x+y
> 
> for(j in 1:length(x)) {
> out <- vector("list", )
> 
> for(i in 1:10) {
> 
> t.s <- t[sample(n,n,replace=T),] # Here is the bootstrap step
> 
> y.s <- t.s[,1]
> x.s <- t.s[,2]
> 
> z.s <- y.s+x.s
> nn <- sum (z.s)  # For example, I want to calculate this value
> 
> out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j])) # I get the mentioned subset in here
> kk <- sapply(out, function(x) {x$finding})
> ff <- out[! kk]
> }
> }
You did not reply to the questions concerning regenerating "out"
for each "j" and using "<-" inside a list. This makes the discussion
complicated.

The following code is equivalent to your code.

  y <- c(1,2,3,4,5,6,7,8,9,10)
  x <- c(1,0,0,1,1,0,0,1,1,0)
  n <- length(x)
  tt <- unname(cbind(y,x)) # do not overwrite function t()
  z <- x+y
 
  # needed only to shift the sequence of random numbers
  for (j in 1:(10*(n-1))) sample(n,n,replace=T)
 
  j <- length(x)
  out <- vector("list")
  for(i in 1:10) {
      tt.s <- tt[sample(n,n,replace=T),] # Here is the bootstrap step
 
      y.s <- tt.s[,1]
      x.s <- tt.s[,2]
 
      z.s <- y.s+x.s
      nn <- sum(z.s)  # For example, I want to calculate this value
 
      out[[i]] <- list((nn), finding=any(y.s==y[j])) # I get the mentioned subset in here
  }
  kk <- sapply(out, function(x) {x$finding})
  ff <- out[! kk]

You can check the equivalence by running both versions with the same
call to set.seed(seed) at the beginning. I tried this, and the
obtained "ff" were identical for several different values of
"seed".
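
Such a check could look as follows. This is only a sketch: the function names one_rep, nested and reduced and the seeds 1 to 5 are assumptions introduced for illustration, and both versions are rewritten with a shared helper instead of being copied literally.

```r
y <- c(1,2,3,4,5,6,7,8,9,10)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(x)
tt <- unname(cbind(y, x))

# one bootstrap replicate: the statistic plus the test for y[j]
one_rep <- function(j) {
  tt.s <- tt[sample(n, n, replace = TRUE), ]
  list(sum(tt.s[, 1] + tt.s[, 2]), finding = any(tt.s[, 1] == y[j]))
}

nested <- function(seed) {         # loop structure of the original code
  set.seed(seed)
  for (j in seq_len(n)) {
    out <- vector("list")          # "out" is recreated for every j
    for (i in 1:10) out[[i]] <- one_rep(j)
  }
  out[!sapply(out, function(o) o$finding)]
}

reduced <- function(seed) {        # loop structure of the equivalent code
  set.seed(seed)
  # needed only to shift the sequence of random numbers
  for (k in seq_len(10 * (n - 1))) sample(n, n, replace = TRUE)
  out <- vector("list")
  for (i in 1:10) out[[i]] <- one_rep(n)
  out[!sapply(out, function(o) o$finding)]
}

same <- vapply(1:5, function(s) identical(nested(s), reduced(s)), logical(1))
same
```

Both functions consume the random number stream in the same order, so the entries of same should all be TRUE.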

What can be seen is that the output depends only on the run of the
loop for j with the value j = length(x). Searching the values y[j]
for j = 1, ..., length(x)-1 does not influence the result.

In other words, the output of your code consists of those of the last 10
samples which do not contain y[10] (the last element of y). The tests of
the presence of y[1:9] in the samples are performed in your code,
but their results are later overwritten, so they do not influence
the output.

Is this what you want?
> I obtained the following results of an experiment;
> 
> > kk
>  [1] FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
> > ff
> [[1]]
> [[1]][[1]]
> [1] 47
> 
> [[1]]$finding
> [1] FALSE
> 
> 
> [[2]]
> [[2]][[1]]
> [1] 46
> 
> [[2]]$finding
> [1] FALSE
> 
> 
> [[3]]
> [[3]][[1]]
> [1] 52
> 
> [[3]]$finding
> [1] FALSE
> 
> It is easy to do when "y" contains different elements.
> "out[[i]] <- list(ff <- (nn), finding=any (y.s==y[j]))"
> 
> But, when y contains the same element, doing this process can be
> confusing..
> Because, (y <- c(1,1,1,0,0,1,0,1,0,0)) for y[j] when j= 1 there are some
> other 1 in the y.  Is there something special about the y to an j ? 
This question is unclear to me.

There are some problems in your code, which I tried to explain repeatedly
in the previous emails. Without clarifying these things, I am not able
to provide any help.

Petr Savicky.
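
One possible way around the problem with repeated values in y, offered as a sketch and not as what the original poster necessarily intended: record the resampled row indices and test for the presence of the index j instead of the value y[j]. The statistic sum(y.s + x.s) is taken from the thread; B = 200, the seed, and the names idx, stat, omit and jab are assumptions made for this illustration.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
y <- c(1,1,1,0,0,1,0,1,0,0)
x <- c(1,0,0,1,1,0,0,1,1,0)
n <- length(y)
B <- 200                           # illustrative number of bootstrap resamples

# keep the resampled row indices, not only the resampled values
idx  <- replicate(B, sample(n, n, replace = TRUE), simplify = FALSE)
stat <- vapply(idx, function(s) sum(y[s] + x[s]), numeric(1))

# for each original observation j: which resamples never drew row j
omit <- lapply(seq_len(n), function(j) {
  which(!vapply(idx, function(s) j %in% s, logical(1)))
})

# statistics of the resamples that omit observation j
jab <- lapply(omit, function(b) stat[b])

lengths(jab) / B                   # proportions, roughly (1 - 1/n)^n each
```

Because the test is on row numbers, observations with equal y values are still told apart, so the binary y causes no difficulty here.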
