Dear R-users,

I am somewhat puzzled by how R treats data frames with nested data frames. Below are a couple of examples; maybe someone could help explain what the guiding logic here is.

## construct plain data frame
> z <- data.frame(x=1)

## add a data frame member
> z$y <- data.frame(a=1,b=2)

## puzzle 1: z is apparently different from a straightforward construction of the 'same' object
> all.equal(z, data.frame(x=1,y=data.frame(a=1,b=2)))
[1] "Names: 1 string mismatch"                                                         "Length mismatch: comparison on first 2 components"
[3] "Component 2: Modes: list, numeric"                                                "Component 2: names for target but not for current"
[5] "Component 2: Attributes: < Modes: list, NULL >"                                   "Component 2: Attributes: < names for target but not for current >"
[7] "Component 2: Attributes: < Length mismatch: comparison on first 0 components >"   "Component 2: Length mismatch: comparison on first 1 components"

## puzzle 2: could not rbind z
> rbind.data.frame(z, z)
Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "1")) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': '1' 

> version
               _                            
platform       i386-pc-mingw32              
arch           i386                         
os             mingw32                      
system         i386, mingw32                
status                                      
major          2                            
minor          9.1                          
year           2009                         
month          06                           
day            26                           
svn rev        48839                        
language       R                            
version.string R version 2.9.1 (2009-06-26)

Thanks,
Vadim
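One way to see what is behind puzzle 1 is to compare the column structure of the two objects directly. This is a small illustrative sketch, assuming a current R installation rather than the 2.9.1 build above; the name `flat` is just a placeholder for the directly constructed frame:

```r
# Puzzle 1 setup, as in the post
z <- data.frame(x = 1)
z$y <- data.frame(a = 1, b = 2)   # z now has a column that is itself a data frame

# The "straightforward construction" from puzzle 1
flat <- data.frame(x = 1, y = data.frame(a = 1, b = 2))

print(names(z))      # "x" "y"          : two top-level columns
print(class(z$y))    # "data.frame"     : the second column is nested
print(names(flat))   # "x" "y.a" "y.b"  : data.frame() spliced the inner columns in
print(ncol(z))       # 2
print(ncol(flat))    # 3
```

The name and length mismatches reported by all.equal() follow from this: one object has two components, the other has three, and their second names differ.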
On Dec 23, 2010, at 5:06 PM, Vadim Ogranovich wrote:

> Dear R-users,
>
> I am somewhat puzzled by how R treats data frames with nested data
> frames.

Speaking as a fellow user: why? Why would we want data frames inside data frames? Wouldn't a list of data frames be more appropriate if you were hoping to use apply or <some other function>?

> Below are a couple of examples, maybe someone could help explain
> what the guiding logic here is.
>
> ## construct plain data frame
>> z <- data.frame(x=1)
>
> ## add a data frame member
>> z$y <- data.frame(a=1,b=2)

cbind.data.frame (dispatched when the first argument to cbind is a data frame) would give you another data frame without the mess of nesting:

> cbind(z, b=2)
  x b
1 1 2

This is also the time to ask: what is it that you are _really_ trying to accomplish?

> ## puzzle 1: z is apparently different from a straightforward
> construction of the 'same' object
>> all.equal(z, data.frame(x=1,y=data.frame(a=1,b=2)))
> [1] "Names: 1 string mismatch"
> "Length mismatch: comparison on first 2 components"
> [3] "Component 2: Modes: list, numeric"
> "Component 2: names for target but not for current"
> [5] "Component 2: Attributes: < Modes: list, NULL >"
> "Component 2: Attributes: < names for target but not for current >"
> [7] "Component 2: Attributes: < Length mismatch: comparison on first 0 components >"
> "Component 2: Length mismatch: comparison on first 1 components"

Yes. The second one is equivalent to passing just the list components (the columns) of the inner data.frame and ignoring its attributes.

> ## puzzle 2: could not rbind z
>> rbind.data.frame(z, z)
> Error in `row.names<-.data.frame`(`*tmp*`, value = c("1", "1")) :
>   duplicate 'row.names' are not allowed
> In addition: Warning message:
> non-unique value when setting 'row.names': '1'

That is a puzzle, I agree. This succeeds:

> z <- data.frame(x=1, y=2)
> rbind(z,z)
  x y
1 1 2
2 1 2

Perhaps a bug. Trying to add drop=FALSE had an amusing result:

> rbind(z,z, drop=FALSE)
     x
1    1
2    1
drop 0

--
David

--
David Winsemius, MD
West Hartford, CT

> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sos_1.3-0       brew_1.0-4      lattice_0.19-13

loaded via a namespace (and not attached):
[1] grid_2.12.1  tools_2.12.1
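David's suggestion of keeping the frame flat can be sketched as follows. This is an illustrative sketch assuming a current R; it binds a second data frame side by side instead of the single `b=2` column used in the reply, and the object names `zc` and `zz` are placeholders:

```r
z <- data.frame(x = 1)

# Put the columns of the second data frame next to x instead of nesting it
# with z$y <- ...; cbind() dispatches to cbind.data.frame because z is a
# data frame, and unnamed data frame arguments keep their own column names.
zc <- cbind(z, data.frame(a = 1, b = 2))
print(names(zc))   # "x" "a" "b"

# With only atomic columns, rbind.data.frame makes the duplicated
# automatic row names unique, so stacking the frame with itself works.
zz <- rbind(zc, zc)
print(zz)
```

Keeping every column atomic is the case data frame methods such as rbind are routinely exercised on, which is the practical argument for avoiding the nesting in the first place.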
On Thu, Dec 23, 2010 at 5:06 PM, Vadim Ogranovich <vogranovich at jumptrading.com> wrote:
> ## puzzle 1: z is apparently different from a straightforward construction of the 'same' object
>> all.equal(z, data.frame(x=1,y=data.frame(a=1,b=2)))
>
> ## puzzle 2: could not rbind z
>> rbind.data.frame(z, z)

1. If we strip out all data frames and leave them as lists, then (a) z is basically the nested list

   list(x=1, y=list(a=1, b=2))

whereas (b) the construct data.frame(x=1,y=data.frame(a=1,b=2)) is interpreted to be a flat list, namely the same as

   data.frame(x = 1, y.a = 1, y.b = 2)

which, if we strip out data frames, is basically list(x = 1, y.a = 1, y.b = 2).

2. Although this may be nothing more than stating the obvious, it seems it's not necessarily true that operations that work in the normal cases also work in strange, uncommon nested cases like this.
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
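Gabor's point 1 suggests a workaround for both puzzles: pass the columns of the nested z back through data.frame(), which splices the inner data frame into y.a and y.b just as the direct construction does. The do.call(data.frame, as.list(z)) idiom is an assumption on my part, not something proposed in the thread; this is a minimal sketch assuming a current R:

```r
z <- data.frame(x = 1)
z$y <- data.frame(a = 1, b = 2)   # nested: column "y" is itself a data frame

# as.list(z) gives list(x = 1, y = <data frame>); calling data.frame() on
# those components flattens the multi-column argument y into y.a and y.b.
flat <- do.call(data.frame, as.list(z))
print(names(flat))   # "x" "y.a" "y.b"

# The flattened object now compares equal to the construction in puzzle 1 ...
print(all.equal(flat, data.frame(x = 1, y = data.frame(a = 1, b = 2))))

# ... and rbind works on it, since every column is atomic.
print(rbind(flat, flat))
```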