Dan Bolser
2005-Jun-07 18:15 UTC
[R] Help with possible bug (assigning NA value to data.frame)?
This 'strange behaviour' manifests itself within some quite complex code. When I created a *very* simple example the behaviour disappeared. Here is the simplest version I have found which still causes the strange behaviour (it could be quite unrelated to the boot library, however).

    library(boot)

    ## boot statistic function
    my.mean.s <- function(data, subset){
      mean(data[subset])
    }

    ## dummy data, deliberately no variance
    my.test.dat.1 <- rep(4,5)
    my.test.dat.2 <- rep(8,5)

    ## not much can happen here
    my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
    my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )

    ## returns a null object as ci is meaningless for this data
    my.test.boot.ci.1 <- boot.ci(my.test.boot.1, type='normal')
    my.test.boot.ci.2 <- boot.ci(my.test.boot.2, type='normal')

    ## now try to store this data (the problem begins)...

    ## dummy existing data
    a <- data.frame(matrix(c(1,2,3,4), nrow=2))

    ## make space for new data
    a$X3 <- NA
    a$X4 <- NA

    ## try to store the upper and lower ci (not) calculated above
    a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
    a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
    a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
    a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
    a

What I see is

    > a
      X1 X2 X3 X4
    1  1  3 NA  1
    2  2  4 NA  2

What I expected to see was

    > a
      X1 X2 X3 X4
    1  1  3 NA NA
    2  2  4 NA NA

Somehow the last assignment of the data from within the null object assigns the value of the '==x' part of the logical vector subscript. If I make the following (trivial?) adjustment, swapping the order of each pair of assignments,

    a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
    a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
    a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
    a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]

the output changes to

    > a
      X1 X2 X3 X4
    1  1  3  1  1
    2  2  4  2  2

Which is even more wrong. I am not sure if this is useful without the full context, but here is the output from the real version of this program (where most of the above code is within a loop).
What is printed out for each cycle of the loop is the value of the '==x' part of the subscript.

    [1] 2
    [1] 3
    [1] 4
    [1] 5
    [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
    [1] 6
    [1] 7
    [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
    [1] 8
    [1] 10
    [1] 11
    [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"

Above you see that for some values I can't calculate a ci (but I still try to store it as above). Then...

    > dat.5.ho
      CHAINS DOM_PER_CHAIN     lower     upper
    1      2      1.416539 1.3626253  1.468387
    2      3      1.200000 1.1146014  1.288724
    3      4      1.363636 1.2675657  1.462571
    4      5      1.000000        NA  5.000000
    5      6      1.323529 1.0991974  1.546156
    6      7      1.000000        NA  7.000000
    7      8      1.100000 0.9037904  1.289210
    8     10      1.142857 0.8775104  1.403918
    9     11      1.000000        NA 11.000000

Do you spot the same problem? Namely, for each value of the 'CHAINS' column for which no ci could be calculated, the second assignment to the data table from the 'null' object assigned the lookup value of CHAINS to that column instead! The assignment (within the loop) looks like this...

    dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
    dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]

(where chain is the 'loop variable'). As far as I can tell this is a bug. It doesn't happen when I try...

    dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- NA
    dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- NA

And doing the following (swapping the order) changes the behaviour...

    dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
    dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]

Giving...

    > dat.5.ho
      CHAINS DOM_PER_CHAIN      lower     upper
    1      2      1.416539  1.3616070  1.472716
    2      3      1.200000  1.1134237  1.287601
    3      4      1.363636  1.2587204  1.466037
    4      5      1.000000  5.0000000  5.000000
    5      6      1.323529  1.1082482  1.547222
    6      7      1.000000  7.0000000  7.000000
    7      8      1.100000  0.9021282  1.287672
    8     10      1.142857  0.8766731  1.403327
    9     11      1.000000 11.0000000 11.000000

Which is again incorrect and unexpected (as above).
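To help narrow this down, here is a minimal sketch that needs no boot at all. It assumes, as described above, that boot.ci() returned NULL, so ci <- NULL stands in for that result; tmp and ci.bound are hypothetical names used only for illustration. The sketch shows two things: the right-hand side of the assignments is NULL rather than NA, and assigning NULL through $ drops the column from the temporary one-row copy that R builds for a nested replacement. That leaves three columns to be written back into a four-column row. If those three columns are recycled across the four, it would be consistent with the outputs above, but the sketch does not confirm what happens inside the write-back step, and that step may behave differently in other R versions.

```r
## Stand-in for the NULL returned by boot.ci() above (assumption)
ci <- NULL

## Extracting from NULL gives NULL, not NA
rhs <- ci$normal[2]
is.null(rhs)

## Same dummy data frame as in the example above
a <- data.frame(matrix(c(1, 2, 3, 4), nrow = 2))
a$X3 <- NA
a$X4 <- NA

## a[i, ]$X3 <- rhs is evaluated in two steps; this mimics the first one
tmp <- a[a$X1 == 1, ]
tmp$X3 <- rhs      # assigning NULL removes the column from the copy
names(tmp)         # X3 is no longer among the names
## (the second step would then write tmp back into a[a$X1 == 1, ])

## Hypothetical guard: store NA explicitly when there is no interval
ci.bound <- function(ci, k) if (is.null(ci)) NA_real_ else ci$normal[k]

## Index the column first, then the rows, so only one column is touched
a$X3[a$X1 == 1] <- ci.bound(ci, 2)
a$X4[a$X1 == 1] <- ci.bound(ci, 3)
a
```

With the guard in place, both new columns stay NA for the rows where no interval is available. Indexing the column before the rows also means a zero-length right-hand side cannot silently remove a column.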
Please let me know what I should do to report this problem better, or whether I have just missed something silly. I am on RH9, with R-2.1.0 (compiled from source) and the latest boot from CRAN (if that makes a difference). Cheers, Dan.