smckinney at bccrc.ca
2009-May-26 22:50 UTC
[Rd] Bug in "$<-.data.frame" yields corrupt data frame (PR#13724)
Full_Name: Steven McKinney Version: 2.9.0 OS: Mac OS X 10.5.6 Submission from: (NULL) (142.103.207.10) A corrupt data frame can be constructed as follows: foo <- matrix(1:12, nrow = 3) bar <- data.frame(foo) bar$NewCol <- foo[foo[, 1] == 4, 4] bar lapply(bar, length)> foo <- matrix(1:12, nrow = 3) > bar <- data.frame(foo) > bar$NewCol <- foo[foo[, 1] == 4, 4] barX1 X2 X3 X4 NewCol 1 1 4 7 10 <NA> 2 2 5 8 11 <NA> 3 3 6 9 12 <NA> Warning message: In format.data.frame(x, digits = digits, na.encode = FALSE) : corrupt data frame: columns will be truncated or padded with NAs> lapply(bar, length)$X1 [1] 3 $X2 [1] 3 $X3 [1] 3 $X4 [1] 3 $NewCol [1] 0 The data.frame method is> getAnywhere("$<-.data.frame" )A single object matching '$<-.data.frame' was found It was found in the following places package:base registered S3 method for $<- from namespace base namespace:base with value function (x, i, value) { cl <- oldClass(x) class(x) <- NULL nrows <- .row_names_info(x, 2L) if (!is.null(value)) { N <- NROW(value) if (N > nrows) stop(gettextf("replacement has %d rows, data has %d", N, nrows), domain = NA) if (N < nrows && N > 0L) if (nrows%%N == 0L && length(dim(value)) <= 1L) value <- rep(value, length.out = nrows) else stop(gettextf("replacement has %d rows, data has %d", N, nrows), domain = NA) if (is.atomic(value)) names(value) <- NULL } x[[i]] <- value class(x) <- cl return(x) }<environment: namespace:base>>I placed a browser() command before return(x) and did some poking around. The issue is that the example above creates an object with N < nrows but N == 0L, so either an else clause to check for this condition is needed, or, it appears to me, the N > 0L part of the conditional clause needs to be moved to the next if clause. I modified the rows if (N < nrows && N > 0L) if (nrows%%N == 0L && length(dim(value)) <= 1L) to read if (N < nrows) if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L) as in "$<-.data.frame" <- function (x, i, value) { cl <- oldClass(x) class(x) <- NULL nrows <- .row_names_info(x, 2L) if (!is.null(value)) { N <- NROW(value) if (N > nrows) stop(gettextf("replacement has %d rows, data has %d", N, nrows), domain = NA) if (N < nrows) if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L) value <- rep(value, length.out = nrows) else stop(gettextf("replacement has %d rows, data has %d", N, nrows), domain = NA) if (is.atomic(value)) names(value) <- NULL } x[[i]] <- value class(x) <- cl return(x) } Now it detects the problem above:> foo <- matrix(1:12, nrow = 3) > bar <- data.frame(foo) > bar$NewCol <- foo[foo[, 1] == 4, 4]Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) : replacement has 0 rows, data has 3 It doesn't appear to stumble on weird data frames (these from the ?data.frame help page)> L3 <- LETTERS[1:3] > (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10,replace=TRUE)))> (d0 <- d[, FALSE]) # NULL data frame with 10 rows> (d.0 <- d[FALSE, ]) # <0 rows> data frame (3 cols)> (d00 <- d0[FALSE,]) # NULL data frame with 0 rows> d0$NewCol <- foo[foo[, 1] == 4, 4]Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) : replacement has 0 rows, data has 10 ### Catches this problem above alright.> d.0$NewCol <- foo[foo[, 1] == 4, 4] > d.0[1] x y fac NewCol <0 rows> (or 0-length row.names) ### Lets the above one through alright.> d00$NewCol <- foo[foo[, 1] == 4, 4] > > d00[1] NewCol <0 rows> (or 0-length row.names) ### Lets the above one through alright. Would the above modification work to fix this problem?> sessionInfo()R version 2.9.0 (2009-04-17) powerpc-apple-darwin8.11.1 locale: en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] nlme_3.1-90 loaded via a namespace (and not attached): [1] grid_2.9.0 lattice_0.17-22 tools_2.9.0 Also occurs on Windows box with R 2.8.1 Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada
Prof Brian Ripley
2009-May-28 06:50 UTC
[Rd] Bug in "$<-.data.frame" yields corrupt data frame (PR#13724)
> Would the above modification work to fix this problem?Yes thank you, and I've incorporated it in R-patched and R-devel. It does catch 3 packages, DescribeDisplay, rgcvpack and BioC:rHVDM. On Wed, 27 May 2009, smckinney at bccrc.ca wrote:> Full_Name: Steven McKinney > Version: 2.9.0 > OS: Mac OS X 10.5.6 > Submission from: (NULL) (142.103.207.10) > > > > A corrupt data frame can be constructed as follows: > foo <- matrix(1:12, nrow = 3) > bar <- data.frame(foo) > bar$NewCol <- foo[foo[, 1] == 4, 4] > bar > lapply(bar, length) > > > > >> foo <- matrix(1:12, nrow = 3) >> bar <- data.frame(foo) >> bar$NewCol <- foo[foo[, 1] == 4, 4] bar > X1 X2 X3 X4 NewCol > 1 1 4 7 10 <NA> > 2 2 5 8 11 <NA> > 3 3 6 9 12 <NA> > Warning message: > In format.data.frame(x, digits = digits, na.encode = FALSE) : > corrupt data frame: columns will be truncated or padded with NAs >> lapply(bar, length) > $X1 > [1] 3 > > $X2 > [1] 3 > > $X3 > [1] 3 > > $X4 > [1] 3 > > $NewCol > [1] 0 > > > The data.frame method is > >> getAnywhere("$<-.data.frame" ) > A single object matching '$<-.data.frame' was found It was found in the > following places > package:base > registered S3 method for $<- from namespace base > namespace:base > with value > > function (x, i, value) > { > cl <- oldClass(x) > class(x) <- NULL > nrows <- .row_names_info(x, 2L) > if (!is.null(value)) { > N <- NROW(value) > if (N > nrows) > stop(gettextf("replacement has %d rows, data has %d", > N, nrows), domain = NA) > if (N < nrows && N > 0L) > if (nrows%%N == 0L && length(dim(value)) <= 1L) > value <- rep(value, length.out = nrows) > else stop(gettextf("replacement has %d rows, data has %d", > N, nrows), domain = NA) > if (is.atomic(value)) > names(value) <- NULL > } > x[[i]] <- value > class(x) <- cl > return(x) > }<environment: namespace:base> >> > > > I placed a browser() command before return(x) and did some poking > around. The issue is that the example above creates an object with > N < nrows but N == 0L, so either an else clause to check for this > condition is needed, or, it appears to me, the N > 0L part of the > conditional clause needs to be moved to the next if clause. > > I modified the rows > if (N < nrows && N > 0L) > if (nrows%%N == 0L && length(dim(value)) <= 1L) > to read > if (N < nrows) > if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L) > > as in > > "$<-.data.frame" <- > function (x, i, value) > { > cl <- oldClass(x) > class(x) <- NULL > nrows <- .row_names_info(x, 2L) > if (!is.null(value)) { > N <- NROW(value) > if (N > nrows) > stop(gettextf("replacement has %d rows, data has %d", > N, nrows), domain = NA) > if (N < nrows) > if (N > 0L && nrows%%N == 0L && length(dim(value)) <= 1L) > value <- rep(value, length.out = nrows) > else stop(gettextf("replacement has %d rows, data has %d", > N, nrows), domain = NA) > if (is.atomic(value)) > names(value) <- NULL > } > x[[i]] <- value > class(x) <- cl > return(x) > } > > Now it detects the problem above: > >> foo <- matrix(1:12, nrow = 3) >> bar <- data.frame(foo) >> bar$NewCol <- foo[foo[, 1] == 4, 4] > Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) : > replacement has 0 rows, data has 3 > > It doesn't appear to stumble on weird data frames (these from the > ?data.frame help page) > > >> L3 <- LETTERS[1:3] >> (d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, > replace=TRUE))) >> (d0 <- d[, FALSE]) # NULL data frame with 10 rows > >> (d.0 <- d[FALSE, ]) # <0 rows> data frame (3 cols) > >> (d00 <- d0[FALSE,]) # NULL data frame with 0 rows > >> d0$NewCol <- foo[foo[, 1] == 4, 4] > Error in `$<-.data.frame`(`*tmp*`, "NewCol", value = integer(0)) : > replacement has 0 rows, data has 10 > > ### Catches this problem above alright. > >> d.0$NewCol <- foo[foo[, 1] == 4, 4] >> d.0 > [1] x y fac NewCol > <0 rows> (or 0-length row.names) > > ### Lets the above one through alright. > >> d00$NewCol <- foo[foo[, 1] == 4, 4] >> >> d00 > [1] NewCol > <0 rows> (or 0-length row.names) > ### Lets the above one through alright. > > > Would the above modification work to fix this problem? > > > > > > >> sessionInfo() > R version 2.9.0 (2009-04-17) > powerpc-apple-darwin8.11.1 > > locale: > en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] nlme_3.1-90 > > loaded via a namespace (and not attached): > [1] grid_2.9.0 lattice_0.17-22 tools_2.9.0 > > > Also occurs on Windows box with R 2.8.1 > > > > Steven McKinney > > Statistician > Molecular Oncology and Breast Cancer Program British Columbia Cancer > Research Centre > > email: smckinney +at+ bccrc +dot+ ca > > tel: 604-675-8000 x7561 > > BCCRC > Molecular Oncology > 675 West 10th Ave, Floor 4 > Vancouver B.C. > V5Z 1L3 > Canada > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595