Colin Phillips
2016-Sep-22 08:45 UTC
[R] `head` doesn't show all columns for an empty data.frame
I'm sure I'm doing something wrong, but I'm seeing strange behaviour using the `head` and `tail` functions on an empty data.frame.

To reproduce:

# create an empty data frame. I actually read an empty table from Excel using `readWorkbook` from package `openxlsx`
test <- structure(list(Code = NULL, Name = NULL, Address = NULL, Sun.Hrs = NULL,
    Mon.Hrs = NULL), .Names = c("Code", "Name", "Address", "Sun.Hrs",
    "Mon.Hrs"), class = "data.frame", row.names = integer(0))

# show the data frame
test
# output in console:
# [1] Code Name Address Sun.Hrs Mon.Hrs
# <0 rows> (or 0-length row.names)
# note that the data frame has 0 rows and 5 columns

# show the structure
str(test)
# output in console:
# 'data.frame': 0 obs. of 5 variables:
# $ Code : NULL
# $ Name : NULL
# $ Address: NULL
# $ Sun.Hrs: NULL
# $ Mon.Hrs: NULL
# again, the structure shows 5 columns. However...

head(test); tail(test)
# output in console:
# [1] Name Sun.Hrs
# <0 rows> (or 0-length row.names)
# [1] Name Sun.Hrs
# <0 rows> (or 0-length row.names)
# now we have only two columns

Weird, right?
So, here's my session info:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets methods base

other attached packages:
[1] tidyr_0.6.0 lpSolve_5.6.13 flexclust_1.3-4 modeltools_0.2-21 lattice_0.20-34 gtools_3.5.0 reshape2_1.4.1 ash_1.0-15 RODBC_1.3-13
[10] ggmap_2.6.1 ggplot2_2.1.0 dplyr_0.5.0 assertthat_0.1 openxlsx_3.0.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.7 plyr_1.8.4 tools_3.3.1 digest_0.6.10 tibble_1.2 gtable_0.2.0 png_0.1-7 DBI_0.5-1 mapproj_1.2-4
[10] parallel_3.3.1 proto_0.3-10 stringr_1.1.0 RgoogleMaps_1.4.1 maps_3.1.1 R6_2.1.3 jpeg_0.1-8 sp_1.2-3 magrittr_1.5
[19] scales_0.4.0 geosphere_1.5-5 colorspace_1.2-6 labeling_0.3 stringi_1.1.1 lazyeval_0.2.0 munsell_0.4.3 rjson_0.2.15

This is not an urgent issue, I just think it's curious, so it would be nice to understand why it happens.

Thanks,
Colin
Duncan Murdoch
2016-Sep-22 14:24 UTC
[R] `head` doesn't show all columns for an empty data.frame
On 22/09/2016 4:45 AM, Colin Phillips wrote:
> I'm sure I'm doing something wrong, but I'm seeing strange behaviour using the `head` and `tail` functions on an empty data.frame.
>
> To reproduce:
> # create an empty data frame. I actually read an empty table from Excel using `readWorkbook` from package `openxlsx`
> test <- structure(list(Code = NULL, Name = NULL, Address = NULL, Sun.Hrs = NULL,
> Mon.Hrs = NULL), .Names = c("Code", "Name", "Address", "Sun.Hrs",
> "Mon.Hrs"), class = "data.frame", row.names = integer(0))

That's not a valid dataframe, it's just labelled as one. If you tried to create it with data.frame(), you'd get something different.

    > test <- data.frame(Code = NULL, Name = NULL, Address = NULL, Sun.Hrs = NULL,
    + Mon.Hrs = NULL)
    > test
    data frame with 0 columns and 0 rows

You can create a zero-row dataframe as long as you put 0-length vectors in as columns. NULL is not a vector.

    > test <- data.frame(Code = numeric(0), Name = numeric(0), Address = numeric(0), Sun.Hrs = numeric(0),
    + Mon.Hrs = numeric(0))
    > test
    [1] Code Name Address Sun.Hrs Mon.Hrs
    <0 rows> (or 0-length row.names)

If you do that, head() works:

    > head(test)
    [1] Code Name Address Sun.Hrs Mon.Hrs
    <0 rows> (or 0-length row.names)

So this is a bug in openxlsx. It's also a well-known limitation of the S3 object system: you can easily create things that are labelled with a certain class, but aren't valid objects of that class.

Duncan Murdoch
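
For readers who land on this thread with the same problem, here is a minimal sketch of how one might detect and repair such an object before calling `head`. The helper names `has_valid_columns` and `repair_null_columns` are hypothetical (they are not part of base R or openxlsx), and replacing the NULL columns with `character(0)` is an assumption, since the intended column types cannot be recovered from an empty sheet. The last few lines show a plain-list behaviour that may explain why exactly `Name` and `Sun.Hrs` survived: assigning NULL to a list element deletes it and shifts the remaining elements down. Whether row subsetting of a data frame follows this path internally is not stated in the thread, so treat that part as a plausible reading rather than a confirmed cause.

```r
# The invalid object from the original post: labelled "data.frame",
# but every column is NULL rather than a zero-length vector.
test <- structure(
  list(Code = NULL, Name = NULL, Address = NULL, Sun.Hrs = NULL, Mon.Hrs = NULL),
  class = "data.frame", row.names = integer(0)
)

# Hypothetical helper: TRUE if every column is non-NULL and has one
# entry per row of the data frame.
has_valid_columns <- function(df) {
  n <- nrow(df)
  ok <- vapply(unclass(df),
               function(col) !is.null(col) && NROW(col) == n,
               logical(1))
  all(ok)
}

# Hypothetical helper: rebuild a zero-row data frame, replacing each NULL
# column with a zero-length vector. `fill` is an assumption about the type.
repair_null_columns <- function(df, fill = character(0)) {
  stopifnot(nrow(df) == 0L)
  cols <- lapply(unclass(df), function(col) if (is.null(col)) fill else col)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

has_valid_columns(test)          # FALSE
fixed <- repair_null_columns(test)
has_valid_columns(fixed)         # TRUE
names(head(fixed))               # all five column names are kept

# Plain-list illustration: assigning NULL deletes the element, so looping
# over positions 1..5 skips every other column.
cols <- list(Code = NULL, Name = NULL, Address = NULL, Sun.Hrs = NULL, Mon.Hrs = NULL)
for (j in 1:5) cols[[j]] <- NULL
names(cols)                      # "Name" "Sun.Hrs"
```

If the real column types are known, passing them explicitly when building the replacement columns (for example `numeric(0)` for the hours columns) would be a better choice than a single `fill` value.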