Jenny Bryan
2016-Jan-12 17:15 UTC
[R] printing a data.frame that contains a list-column of S4 objects
Is there a general problem with printing a data.frame when it has a list-column of S4 objects? Or am I just unlucky in my life choices?

I ran across this with objects from the git2r package, but maintainer Stefan Widgren points out the example below from Matrix as well. I note that the offending object can be printed if sent through dplyr::tbl_df(). I accept that that printing doesn't provide much info on S4 objects. I'd just like those vars to not prevent data.frame-style inspection of the entire object.

I asked this on Stack Overflow, where a commenter provided the lead to the workaround below. Is that the best solution?

library(Matrix)

m <- new("dgCMatrix")
isS4(m)
#> [1] TRUE
df <- data.frame(id = 1:2)
df$matrices <- list(m, m)
df
#> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic
#> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic

## fairly costly workaround
df2 <- df
df2[] <- lapply(df2, as.character)
df2
#>   id                         matrices
#> 1  1 <S4 object of class "dgCMatrix">
#> 2  2 <S4 object of class "dgCMatrix">

## dplyr handles original object better but not as well as workaround
library(dplyr)
## use select to force dplyr to show the tricky column
tbl_df(select(df, matrices))
#> Source: local data frame [2 x 1]
#>
#>   matrices
#>     (list)
#> 1 <S4:dgCMatrix, CsparseMatrix, dsparseMatrix, generalMatrix, dCsparseMatrix,
#> 2 <S4:dgCMatrix, CsparseMatrix, dsparseMatrix, generalMatrix, dCsparseMatrix,

Thanks,
Jenny

Jennifer Bryan
Associate Professor
Department of Statistics and
the Michael Smith Laboratories
University of British Columbia
Vancouver, BC Canada
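Jenny's workaround runs as.character() over every column. A more targeted variant, sketched here under stated assumptions, converts only the list-columns to short class labels and leaves atomic columns such as id with their original type. The helper label_list_columns() is hypothetical (it is not part of base R, dplyr, or Matrix), and the made-up S4 class Point stands in for dgCMatrix so the example runs without the Matrix package.

```r
library(methods)

# Hypothetical S4 class standing in for Matrix's dgCMatrix.
setClass("Point", slots = c(x = "numeric"))

# Hypothetical helper: return a copy of 'df' in which every list-column is
# replaced by one short class label per element; atomic columns are untouched.
label_list_columns <- function(df) {
  is_list_col <- vapply(df, is.list, logical(1))
  if (!any(is_list_col)) return(df)
  df[is_list_col] <- lapply(df[is_list_col], function(col) {
    vapply(col, function(el) {
      if (isS4(el)) paste0("<S4: ", class(el)[1], ">") else paste0("<", class(el)[1], ">")
    }, character(1))
  })
  df
}

p <- new("Point", x = 1)
df <- data.frame(id = 1:2)
df$points <- list(p, p)

# Print the labelled copy; 'df' itself still holds the S4 objects.
print(label_list_columns(df))
```

Because the labelled copy is only used for display, the original list-column and the S4 objects in it stay intact for further work.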
boB Rudis
2016-Jan-12 18:51 UTC
[R] printing a data.frame that contains a list-column of S4 objects
I wonder if something like:

format.list <- function(x, ...) {
  rep(class(x[[1]]), length(x))
}

would be sufficient? (prbly needs more 'if's though)
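The suggestion relies on print.data.frame() formatting each column through format(), so a format.list() defined in the global environment is picked up for list-columns. One way to add the "more ifs" is to label each element separately instead of repeating the class of the first element. This is a sketch only: it assumes a global format.list() is acceptable, which changes what format() returns for every plain list in the session, not only for data-frame columns, and Point is a hypothetical S4 class standing in for the git2r or Matrix objects.

```r
library(methods)

# Hypothetical S4 class standing in for git2r or Matrix objects.
setClass("Point", slots = c(x = "numeric"))

# S3 method for format() on plain lists. print.data.frame() formats each
# column via format(), so a list-column is formatted by this method.
format.list <- function(x, ...) {
  vapply(x, function(el) {
    if (isS4(el)) {
      paste0("<S4: ", class(el)[1], ">")                # S4 object: class only
    } else if (is.atomic(el) && length(el) == 1L) {
      format(el, ...)                                    # scalar: usual formatting
    } else {
      paste0("<", class(el)[1], "[", length(el), "]>")  # anything else: class and length
    }
  }, character(1))
}

p <- new("Point", x = 1)
df <- data.frame(id = 1:2)
df$stuff <- list(p, 3.5)
print(df)
```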
Martin Maechler
2016-Jan-14 08:34 UTC
[R] printing a data.frame that contains a list-column of S4 objects
>>>>> boB Rudis <bob at rudis.net>
>>>>>     on Tue, 12 Jan 2016 13:51:50 -0500 writes:

    > I wonder if something like:
    > format.list <- function(x, ...) {
    >   rep(class(x[[1]]), length(x))
    > }
    > would be sufficient? (prbly needs more 'if's though)

Dear Jenny,
for a different perspective (and a lot of musings), see inline below.

    > On Tue, Jan 12, 2016 at 12:15 PM, Jenny Bryan <jenny at stat.ubc.ca> wrote:
    >> [...]
    >> library(Matrix)
    >>
    >> m <- new("dgCMatrix")
    >> isS4(m)
    >> #> [1] TRUE
    >> df <- data.frame(id = 1:2)
    >> df$matrices <- list(m, m)

This only works by accident (I think), and fails for

  df <- data.frame(id = 1)
  df$matrices <- list(m, m)

> df <- data.frame(id = 1)
> df$matrices <- list(m, m)
Error in `$<-.data.frame`(`*tmp*`, "matrices", value = list(<S4 object of class "dgCMatrix">,  :
  replacement has 2 rows, data has 1

    >> df
    >> #> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic
    >> #> Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, : first argument must be atomic

Hmm. As 'data.frame' is just an S3 class, there is no formal definition to go with it, and in this sense you are of course entitled to all expectations.
;-)

Even though data frames are internally coded as lists, I strongly believe data frames should be taught as (and thought of as) "generalized matrices", in the sense that a data frame has n (say) rows and p (say) columns.

The help pages for data.frame() and as.data.frame() should make it clear that you can *not* put all kinds of entries into data frame columns, but I agree the documentation is vague and probably has to remain vague, because if you provide as.data.frame() methods for your class, you should be able to go quite far.

In addition, the data frame columns need to fulfill certain properties, e.g., subsetting (aka "indexing") and also subassignment ( df[i,j] <- v ).

Now the real "problem" here is that the '$<-' and '[<-' methods for data frames, which you call via  df$m <- v  or  df[,co] <- V,  are too "forgiving". They only check that NROW(.) of the new entry corresponds to the nrow(<data.frame>). Currently they allow very easy construction of illegal data frames(*), as in your present case.

--
*) Yes, it is hard to say when a data.frame is illegal, as there is no formal definition.

There is more to be said and thought about if you really want sparse matrices in a data frame, and as one of the 'Matrix' maintainers, I'm quite interested in *why* you'd want that, but I won't go there now.

One last issue though: the idea behind allowing 'matrix' or 'array' objects in data frames is that each column of the matrix becomes a separate column of the data frame:

> data.frame(D = diag(3), M = matrix(1:12, 3,4))
  D.1 D.2 D.3 M.1 M.2 M.3 M.4
1   1   0   0   1   4   7  10
2   0   1   0   2   5   8  11
3   0   0   1   3   6   9  12

... and that would be quite inefficient for large sparse matrices.

---------

Final recommendation as a summary:

If data.frame(.., .., ..) does not work to put entries into a data frame, then don't do it; rather, think about how to make data.frame() work with your objects, namely by ensuring that as.data.frame() works, possibly by providing an as.data.frame() method.

Best regards,
Martin Maechler
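A minimal sketch of what Martin's recommendation could look like for objects that are not matrix-like: wrap the S4 objects in a small S3 list class so that data.frame() accepts it through an as.data.frame() method, printing goes through a format() method, and row subsetting (df[i, ]) keeps the class through a `[` method. The names s4list and Point, and all the methods below, are hypothetical and chosen for illustration; they are not part of base R, Matrix, or git2r. Subassignment (df[i, j] <- v), which Martin also lists, is not covered.

```r
library(methods)

# Hypothetical S4 class standing in for Matrix's dgCMatrix.
setClass("Point", slots = c(x = "numeric"))

# Hypothetical S3 wrapper: a plain list of S4 objects with class "s4list".
s4list <- function(...) structure(list(...), class = "s4list")

# Row subsetting: df[i, ] calls `[` on each column, so keep the class.
`[.s4list` <- function(x, i, ...) structure(unclass(x)[i], class = "s4list")

# Printing: format.data.frame() calls format() on each column.
format.s4list <- function(x, ...) {
  vapply(unclass(x), function(el) paste0("<S4: ", class(el)[1], ">"),
         character(1))
}

# Construction: data.frame() calls as.data.frame() on each argument, so
# return a one-column data frame whose only column is the wrapper itself.
as.data.frame.s4list <- function(x, row.names = NULL, optional = FALSE, ...) {
  value <- list(x)
  if (!optional) names(value) <- deparse(substitute(x))[1]
  if (is.null(row.names)) row.names <- .set_row_names(length(x))  # automatic 1..n
  structure(value, row.names = row.names, class = "data.frame")
}

p <- new("Point", x = 1)
df <- data.frame(id = 1:2, points = s4list(p, p))
print(df)
print(df[2, ])
```

In this sketch each S4 object occupies a single cell, so nothing is spread over several columns the way 'matrix' and 'array' entries are.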