Liaw, Andy
2005-Jun-07 19:15 UTC
[R] Help with possible bug (assigning NA value to data.frame) ?
There's something peculiar that I do not understand here. However, did you
realize that the thing you are assigning into parts of `a' is NULL? Check
your my.test.boot.ci.1: it's NULL.

Be that as it may, I get:

> a <- data.frame(matrix(1:4, nrow=2), X3=NA, X4=NA)
> a
  X1 X2 X3 X4
1  1  3 NA NA
2  2  4 NA NA
> a[a$X1 == 1,]$X3 <- NULL
> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA NA
> a[a$X1 == 1,]$X4 <- NULL
> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA NA

which really baffles me...

In any case, that's not how I would assign into part of a data frame. I
would do either

  a[a$X1 == 1, "X3"] <- something

or

  a$X3[a$X1 == 1] <- something

In either case you'd get an error if `something' is NULL.

Andy

> From: Dan Bolser
>
> This 'strange behaviour' manifests itself within some quite complex
> code. When I created a *very* simple example the behaviour
> disappeared.
>
> Here is the simplest version I have found which still causes the strange
> behaviour (it could be quite unrelated to the boot library, however).
>
> library(boot)
>
> ## boot statistic function
> my.mean.s <- function(data,subset){
>   mean(data[subset])
> }
>
> ## dummy data, deliberately no variance
> my.test.dat.1 <- rep(4,5)
> my.test.dat.2 <- rep(8,5)
>
> ## not much can happen here
> my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
> my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )
>
> ## returns a null object as ci is meaningless for this data
> my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
> my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
>
>
> ## now try to store this data (the problem begins)...
>
> ## dummy existing data
> a <- data.frame(matrix(c(1,2,3,4),nrow=2))
>
> ## make space for new data
> a$X3 <- NA
> a$X4 <- NA
>
> ## try to store the upper and lower ci (not) calculated above
> a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
> a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
> a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
> a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
>
> a
>
>
> What I see is
>
> > a
>   X1 X2 X3 X4
> 1  1  3 NA  1
> 2  2  4 NA  2
>
>
> What I expected to see was
>
> > a
>   X1 X2 X3 X4
> 1  1  3 NA NA
> 2  2  4 NA NA
>
> Somehow the last assignment of the data from within the null object
> assigns the value of the '==x' part of the logical vector subscript.
>
> If I make the following (trivial?) adjustment
>
> a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
> a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
> a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
> a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
>
>
> The output changes to
>
> > a
>   X1 X2 X3 X4
> 1  1  3  1  1
> 2  2  4  2  2
>
> Which is even wronger.
>
>
> Not sure if this is useful without the full context, but here is the
> output from the real version of this program (where most of the above code
> is within a loop). What is printed out for each cycle of the loop is the
> value of the '==x' part of the subscript.
>
> [1] 2
> [1] 3
> [1] 4
> [1] 5
> [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
> [1] 6
> [1] 7
> [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
> [1] 8
> [1] 10
> [1] 11
> [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
>
>
> Above you see that for some values I can't calculate a ci (but I store it
> as above anyway), then...
>
> > dat.5.ho
>   CHAINS DOM_PER_CHAIN     lower     upper
> 1      2      1.416539 1.3626253  1.468387
> 2      3      1.200000 1.1146014  1.288724
> 3      4      1.363636 1.2675657  1.462571
> 4      5      1.000000        NA  5.000000
> 5      6      1.323529 1.0991974  1.546156
> 6      7      1.000000        NA  7.000000
> 7      8      1.100000 0.9037904  1.289210
> 8     10      1.142857 0.8775104  1.403918
> 9     11      1.000000        NA 11.000000
>
>
> Do you spot the same problem? Namely, for each value of the 'CHAINS' column
> for which a ci could not be calculated, the second assignment to the data
> table from the 'null' object assigned the lookup value of CHAINS to that
> column instead! The assignment (within the loop) looks like this...
>
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
>
> (where chain is the 'loop variable').
>
>
> As far as I can tell this is a bug. It doesn't happen when I try...
>
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- NA
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- NA
>
>
> And doing the following (swapping the order) changes the behaviour...
>
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
>
>
> Giving...
>
> > dat.5.ho
>   CHAINS DOM_PER_CHAIN      lower     upper
> 1      2      1.416539  1.3616070  1.472716
> 2      3      1.200000  1.1134237  1.287601
> 3      4      1.363636  1.2587204  1.466037
> 4      5      1.000000  5.0000000  5.000000
> 5      6      1.323529  1.1082482  1.547222
> 6      7      1.000000  7.0000000  7.000000
> 7      8      1.100000  0.9021282  1.287672
> 8     10      1.142857  0.8766731  1.403327
> 9     11      1.000000 11.0000000 11.000000
>
>
> Which is again incorrect and unpredicted (as above).
>
>
> Please let me know what to do to report this problem better, or if I just
> missed something silly.
>
> I am on RH9, R-2.1.0 (compiled from source), latest boot from CRAN (if that
> makes a difference).
>
> Cheers,
> Dan.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
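Andy's two indexing forms can be paired with an explicit NULL check, so that a confidence interval boot.ci() could not compute is stored as NA rather than silently corrupting other columns. This is a minimal sketch under stated assumptions: `ci_bound()` is a hypothetical helper (not part of the boot package), and `ci <- NULL` stands in for the NULL that Andy points out `my.test.boot.ci.1` actually is. Positions 2 and 3 of `$normal` are the lower and upper bounds, as in Dan's code.

```r
# Hypothetical helper (not part of boot): return element k of ci$normal,
# or NA when no interval was computed. NULL$normal is itself NULL, so the
# is.null(ci$normal) test also covers a NULL ci.
ci_bound <- function(ci, k) {
  if (is.null(ci) || is.null(ci$normal)) NA_real_ else ci$normal[k]
}

a <- data.frame(X1 = c(1, 2), X2 = c(3, 4), X3 = NA_real_, X4 = NA_real_)

ci <- NULL  # assumption: stands in for a boot.ci() result that is NULL

# Andy's two suggested forms: index the target cells directly
# instead of assigning into a row subset with $
a[a$X1 == 1, "X3"] <- ci_bound(ci, 2)
a$X4[a$X1 == 1] <- ci_bound(ci, 3)

# Without the guard, the second form stops with an error
# rather than writing wrong values into the data frame
err <- tryCatch({
  a$X3[a$X1 == 2] <- ci$normal[2]
  NULL
}, error = function(e) conditionMessage(e))

print(err)
print(a)
```

The design choice here is to fail loudly or substitute NA explicitly; either is safer than the `a[rows, ]$col <- value` idiom, which never sees the NULL as an error.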
James Reilly
2005-Jun-07 23:36 UTC
[R] Help with possible bug (assigning NA value to data.frame) ?
This seems to have more to do with NULLs than NAs. For instance:

> a <- data.frame(matrix(1:8, nrow=2))
> a
  X1 X2 X3 X4
1  1  3  5  7
2  2  4  6  8
> a[a$X2 == 4,]$X1 <- NULL
> a
  X1 X2 X3 X4
1  1  3  5  7
2  4  6  8  4

James

On 8/06/2005 7:15 a.m., Liaw, Andy wrote:
> [quoted text snipped: Andy Liaw's reply and Dan Bolser's original report, both above]

--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand
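One reading of the behaviour Andy and James show (not stated in the thread; it is an interpretation based on how R evaluates nested replacement calls) is that `a[i, ]$X3 <- NULL` runs as `a[i, ] <- "$<-"(a[i, ], "X3", NULL)`. Assigning NULL with `$<-` deletes the column from the one-row subset, and the resulting three-column row is then written back positionally into four columns, with its values recycled. The sketch below performs the two steps by hand on Andy's example; the variable names `i` and `sub` are illustrative only.

```r
a <- data.frame(X1 = 1:2, X2 = 3:4, X3 = NA, X4 = NA)
i <- a$X1 == 1

# Step 1: extract the selected row and apply $<- with NULL.
# Assigning NULL via $<- removes the column, so sub now has 3 columns.
sub <- a[i, ]
sub$X3 <- NULL
print(names(sub))

# Step 2: write the 3-column row back into all 4 columns of row i.
# Columns are matched by position and the values recycled:
# X1 -> X1, X2 -> X2, X4 (NA) -> X3, and X1 (1) again -> X4.
a[i, ] <- sub
print(a)
```

If this reading is right, it also explains why the order of Dan's two assignments changes which column ends up holding the CHAINS value.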