William Dunlap
2016-Jan-13 21:46 UTC
[Rd] as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)
as.data.frame methods behave inconsistently when they are given a row.name argument of the wrong length. The matrix method silently ignores row.names if it has the wrong length and the numeric, integer, and character methods do not bother to check and thus make an illegal data.frame.

> as.data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
  V1 V2
1  1  4
2  2  5
3  3  6
> as.data.frame(1:3, row.names=c("One","Two"))
    1:3
One   1
Two   2
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
> as.data.frame(c("a","b","c"), row.names=c("One","Two"))
    c("a", "b", "c")
One                a
Two                b
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

(The warnings are from the printing, not the making, of the data.frames.)

I ran into this while using the DoE.wrapper package, which has what I think is a typo, giving "t" as the row.names for the output of mapply():

cross.design.R:
  ro <- as.data.frame(mapply("touter",ro1, ro2, "paste", sep="_"),"t")

I don't know all the reasons why people use as.data.frame instead of data.frame.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
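Since the warnings only appear at print time, it can help to test a data frame for this kind of inconsistency without printing it. The following is a minimal sketch, not something from the original post: the helper name is_consistent_df is hypothetical, and the inconsistent data frame is built by hand with structure() so that the example does not depend on how a given R version treats wrong-length row.names.

```r
# Hypothetical helper (not part of base R): TRUE when every column has as many
# rows as the row.names attribute implies.
is_consistent_df <- function(df) {
  n_rows <- .row_names_info(df, 2L)                  # row count implied by row.names
  col_rows <- vapply(unclass(df), NROW, integer(1))  # rows actually held by each column
  all(col_rows == n_rows)
}

# Assumption: an inconsistent data frame assembled by hand for illustration.
bad <- structure(list(x = 1:3), class = "data.frame",
                 row.names = c("One", "Two"))
good <- data.frame(x = 1:3, row.names = c("One", "Two", "Three"))

is_consistent_df(bad)
is_consistent_df(good)
```

The check compares the row count stored in the row.names attribute with the length of each column, which is the mismatch that format.data.frame() complains about.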
Paul Grosu
2016-Jan-14 22:35 UTC
[Rd] as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)
Hi Bill,

The thing is that is happening here is the specific instance of as.data.frame that is being run, which in this instance switch between as.data.frame.matrix() and as.data.frame.matrix(). I attached the dataframe.R code, which you can find in the src/library/base/R folder of the source code. Though if you use data.frame() it will give a more expected result. For instance, the first runs as follows through matrix:

> as.data.frame.matrix(matrix(1:6,nrow=3), row.names=c("One","Two"))
  V1 V2
1  1  4
2  2  5
3  3  6

The other two run via vector:

> as.data.frame.vector(1:3, row.names=c("One","Two"))
    1:3
One   1
Two   2
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
> as.data.frame.vector(c("a","b","c"), row.names=c("One","Two"))
    c("a", "b", "c")
One                a
Two                b
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs

The thing is that if you use data.frame() it will work more as expected:

> data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
Error in data.frame(matrix(1:6, nrow = 3), row.names = c("One", "Two")) :
  row names supplied are of the wrong length
> data.frame(matrix(1:6,nrow=3), row.names=c("One","Two","Three"))
      X1 X2
One    1  4
Two    2  5
Three  3  6
> data.frame(c("a","b","c"), row.names=c("One","Two"))
Error in data.frame(c("a", "b", "c"), row.names = c("One", "Two")) :
  row names supplied are of the wrong length
> data.frame(c("a","b","c"), row.names=c("One","Two","Three"))
      c..a....b....c..
One                  a
Two                  b
Three                c
> data.frame(1:3, row.names=c("One","Two"))
Error in data.frame(1:3, row.names = c("One", "Two")) :
  row names supplied are of the wrong length
> data.frame(1:3, row.names=c("One","Two","Three"))
      X1.3
One      1
Two      2
Three    3

Hope it helps,
Paul
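The contrast between data.frame() and the as.data.frame() methods can also be checked programmatically. This is only a rough sketch: the helper outcome() is a hypothetical name introduced here, and it simply classifies a call as succeeding, warning, or failing. What the two as.data.frame() calls report depends on the R version in use, so no result is asserted for them.

```r
# Hypothetical helper: report whether evaluating `expr` succeeds, warns, or fails.
outcome <- function(expr) {
  tryCatch({
    force(expr)   # the argument is evaluated here, inside the handlers
    "ok"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e)))
}

short_names <- c("One", "Two")   # wrong length for three rows

# data.frame() rejects the wrong-length row names, as in the transcript above.
outcome(data.frame(1:3, row.names = short_names))

# The as.data.frame() methods are the ones under discussion; their behaviour
# depends on the R version.
outcome(as.data.frame(1:3, row.names = short_names))
outcome(as.data.frame(matrix(1:6, nrow = 3), row.names = short_names))
```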
Martin Maechler
2016-Jan-15 07:57 UTC
[Rd] as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)
>>>>> Paul Grosu <pgrosu at gmail.com>
>>>>>     on Thu, 14 Jan 2016 17:35:49 -0500 writes:

> Hi Bill,
> The thing is that is happening here is the specific
> instance of as.data.frame that is being run, which in this
> instance switch between as.data.frame.matrix() and as.data.frame.matrix().

(This must be another typo, i.e. a "cut/n/paste forgot to modify" lapsus; you probably meant *.vector in the 2nd case.)

I'm pretty sure Bill was not asking *why* this happens {he would easily find out if he wanted} but reporting two (potential) bugs:

- one in R [not reporting erroneous as.data.frame() usage]
- one in DoE.wrapper

I'm going to look into the R one, which is indeed in the as.data.frame.vector() method, as you've noted.

--
Martin Maechler
ETH Zurich
Martin Maechler
2016-Jan-16 19:45 UTC
[Rd] as.data.frame and illegal row.names argument (bug in package:DoE.wrapper?)
>>>>> William Dunlap via R-devel <r-devel at r-project.org>
>>>>>     on Wed, 13 Jan 2016 13:46:05 -0800 writes:

> as.data.frame methods behave inconsistently when they are given a row.name
> argument of the wrong length. The matrix method silently ignores row.names
> if it has the wrong length and the numeric, integer, and character methods
> do not bother to check and thus make an illegal data.frame.
>
> > as.data.frame(matrix(1:6,nrow=3), row.names=c("One","Two"))
>   V1 V2
> 1  1  4
> 2  2  5
> 3  3  6
> > as.data.frame(1:3, row.names=c("One","Two"))
>     1:3
> One   1
> Two   2
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs
> > as.data.frame(c("a","b","c"), row.names=c("One","Two"))
>     c("a", "b", "c")
> One                a
> Two                b
> Warning message:
> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>   corrupt data frame: columns will be truncated or padded with NAs

As I said yesterday, I want to "fix" this in R. As Paul Grosu mentioned, the buggy -- too tolerant -- behavior is in the as.data.frame.vector() method, whereas as.data.frame.matrix() simply drops wrong row.names and uses default row names in that case.

This would leave (at least) two ways to change:

1) the *.matrix compatible one: simply forget wrong 'row.names'
2) Wrong row.names are a user error.

Now, '1)' would be more in line with the matrix method, but really feels wrong, because it does not catch user error and silently disregards a specifically specified argument.

For '2)' I propose a fix which will only *warn* about the wrong 'row.names' for now (so code which has implicitly relied on the wrong behavior continues to work, but with a warning):

> as.data.frame(1:3, row.names=c("One","Two"))
  1:3
1   1
2   2
3   3
Warning message:
In as.data.frame.integer(1:3, row.names = c("One", "Two")) :
  'row.names' is not a character vector of length 3 -- omitting it. Will be an error!
>

This will give new warnings in packages, and package authors can fix these... before the above will eventually become an error.

The remaining question is if the as.data.frame.matrix() method should not also produce the same warning about illegal row.names.

Interestingly, the *model.matrix* method does produce an error even now, when row.names are specified of wrong length:

> ff <- log(Volume) ~ log(Height) + log(Girth)
> m <- model.frame(ff, trees)
> mat <- model.matrix(ff, m)
> data.frame(mat, row.names = paste0("r", 1:30))
Error in data.frame(mat, row.names = paste0("r", 1:30)) :
  row names supplied are of the wrong length
>
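Until wrong-length row.names become an error in R itself, package code could enforce option '2)' locally. The wrapper below is a rough sketch under stated assumptions, not the fix proposed above: the name strict_as_data_frame is hypothetical, and it is only meant for atomic vectors and matrices, where NROW() gives the expected row count.

```r
# Hypothetical wrapper (not part of base R): treat wrong-length row.names as a
# user error before dispatching to the usual as.data.frame() method.
strict_as_data_frame <- function(x, row.names = NULL, ...) {
  n <- NROW(x)   # rows expected for a vector or matrix
  if (!is.null(row.names) && length(row.names) != n) {
    stop(sprintf("'row.names' has length %d but 'x' has %d rows",
                 length(row.names), n), call. = FALSE)
  }
  as.data.frame(x, row.names = row.names, ...)
}

# Correct length: the row names are used.
res <- strict_as_data_frame(1:3, row.names = c("One", "Two", "Three"))
rownames(res)

# Wrong length: the call fails instead of silently dropping or misusing the names.
msg <- tryCatch(
  strict_as_data_frame(matrix(1:6, nrow = 3), row.names = c("One", "Two")),
  error = function(e) conditionMessage(e)
)
msg
```

Because the wrapper passes its own argument on, a vector input ends up in a column named after that argument (x) rather than after the caller's expression.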