> Date: Wed, 7 Feb 2001 09:33:12 -0800 (PST)
> From: Thomas Lumley <tlumley@u.washington.edu>
> To: Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at>
> cc: Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk>, R-devel@r-project.org
> Subject: Re: [Rd] RE: [R] Removing "row.names"
> MIME-Version: 1.0
>
> On Wed, 7 Feb 2001, Kurt Hornik wrote:
>
> > >>>>> Thomas Lumley writes:
> >
> > > On Wed, 7 Feb 2001, Kurt Hornik wrote:
> > >> >>>>> Peter Dalgaard BSA writes:
> > >>
> > >> > Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:
> > >> >> names(sampled) <- " "
> > >> >> and
> > >> >> dimnames(sampled)[[2]] <- " "
> > >> >>
> > >> >> happily introduce non-unique variable names in the data frame.
> > >> >>
> > >> >> Is the rule that row.names and names must be unique still on?
> > >> >>
> > >> >> Argh ...
> > >>
> > >> > Splus 3.4 dispatches on dimnames<-, but not on names<- with the
> > >> > following curious result:
> > >>
> > >> >> d <- data.frame(a=1:3,b=4:6)
> > >> >> names(d)<-c(" "," ")
> > >> >> d
> > >>
> > >> > 1 1 4
> > >> > 2 2 5
> > >> > 3 3 6
> > >> >> dimnames(d)[[1]] <- rep(" ",3)
> > >> > Error in "dimnames<-.data.frame"(d, .A0): column names must be unique
> > >> > Dumped
> > >>
> > >> > R dispatches similarly, but doesn't check the dimnames in
> > >> > dimnames<-.data.frame. It could do so quite easily. Just add
> > >>
> > >> > || any(duplicated(d[[1]])) || any(duplicated(d[[2]]))
> > >>
> > >> > at the appropriate spot.
> > >>
> > >> Thomas' view about what should be permitted seems to be different.
> >
> > > I wouldn't object to making it hard to create duplicated names(), but
> > > I think it would be a bad idea to have data.frame() make up unique
> > > names if it's given non-unique ones.
> >
> > Maybe `check.names' could also be used for uniqueness testing?
> >
> > In any case, I think we should specify what *exactly* a data frame is.
> >
>
> I think we should specify, and check.names is a logical way to
> allow/forbid non-unique columns.
>
> Having a new class would be messy: logically it shouldn't inherit from
> data.frame, data.frame should inherit from it, but that would be a real
> pain to set up.
>

Data frames were originally meant to be used in modeling functions.
The opening paragraph in Chapter 3 (Data for Models) in the White Book
says:

"This chapter describes the general structure for data that
will be used throughout the book. In particular, it introduces the
data frame, a class of objects to represent the data typically encountered
in fitting models."

However, data.frames may not be quite appropriate for representing
other types of tabular data (certainly a data.frame does not capture
the essence of, say, a "relational" table in the SQL sense, which
doesn't have the concept of row names). Manifestations of this
problem include coercing character data to factors "at the drop of a
hat" (as someone wrote here or in s-news), the row.names issue now
being discussed, problems with including general objects in the "cells"
of the data.frame, etc.

I think that the concept of a data.frame to represent data for fitting
models is fine, but we (certainly I) may have abused this concept. We
need other classes of tabular data objects in addition to (not as a
replacement for) data.frames, together with coercion methods and
perhaps other utilities.

David A. James
Statistics Research, Room 2C-253            Phone:  (908) 582-3082
Bell Labs, Lucent Technologies              Fax:    (908) 582-3340
Murray Hill, NJ 09794-0636

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
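Peter's proposed uniqueness test can be illustrated with a small R sketch. It is not the actual dimnames<-.data.frame method in R or S-PLUS: the helper name set_dimnames_checked is hypothetical, and the sketch assumes that d[[1]] and d[[2]] in his one-liner denote the row-name and column-name components of the new dimnames value.

```r
# Hypothetical helper, not part of base R or S-PLUS: assign dimnames to a
# data frame, refusing duplicated row or column names (Peter's proposed test).
set_dimnames_checked <- function(x, value) {
  stopifnot(is.data.frame(x), is.list(value), length(value) == 2L)
  # Assumption: value[[1]] holds row names and value[[2]] column names,
  # as in any dimnames list.
  if (any(duplicated(value[[1L]])))
    stop("row names must be unique")
  if (any(duplicated(value[[2L]])))
    stop("column names must be unique")
  dimnames(x) <- value  # delegate to the existing data frame method
  x
}

d <- data.frame(a = 1:3, b = 4:6)

# Accepted: unique row and column names.
d2 <- set_dimnames_checked(d, list(c("r1", "r2", "r3"), c("x", "y")))
print(d2)

# Rejected: duplicated blank column names, as in names(d) <- c(" ", " ").
res <- tryCatch(set_dimnames_checked(d, list(rownames(d), c(" ", " "))),
                error = function(e) conditionMessage(e))
print(res)
```

Checking before delegating leaves the existing method untouched; Peter's suggestion was instead to put the two duplicated() tests inside dimnames<-.data.frame itself. Note that neither variant closes the names<- loophole he points out, since names<- does not go through dimnames<-.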
>>>>> David James writes:

>> Date: Wed, 7 Feb 2001 09:33:12 -0800 (PST)
>> From: Thomas Lumley <tlumley@u.washington.edu>
>> To: Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at>
>> cc: Peter Dalgaard BSA <p.dalgaard@biostat.ku.dk>, R-devel@r-project.org
>> Subject: Re: [Rd] RE: [R] Removing "row.names"
>> MIME-Version: 1.0
>>
>> On Wed, 7 Feb 2001, Kurt Hornik wrote:
>>
>> > >>>>> Thomas Lumley writes:
>> >
>> > > On Wed, 7 Feb 2001, Kurt Hornik wrote:
>> > >> >>>>> Peter Dalgaard BSA writes:
>> > >>
>> > >> > Kurt Hornik <Kurt.Hornik@ci.tuwien.ac.at> writes:
>> > >> >> names(sampled) <- " "
>> > >> >> and
>> > >> >> dimnames(sampled)[[2]] <- " "
>> > >> >>
>> > >> >> happily introduce non-unique variable names in the data frame.
>> > >> >>
>> > >> >> Is the rule that row.names and names must be unique still on?
>> > >> >>
>> > >> >> Argh ...
>> > >>
>> > >> > Splus 3.4 dispatches on dimnames<-, but not on names<- with the
>> > >> > following curious result:
>> > >>
>> > >> >> d <- data.frame(a=1:3,b=4:6)
>> > >> >> names(d)<-c(" "," ")
>> > >> >> d
>> > >>
>> > >> > 1 1 4
>> > >> > 2 2 5
>> > >> > 3 3 6
>> > >> >> dimnames(d)[[1]] <- rep(" ",3)
>> > >> > Error in "dimnames<-.data.frame"(d, .A0): column names must be unique
>> > >> > Dumped
>> > >>
>> > >> > R dispatches similarly, but doesn't check the dimnames in
>> > >> > dimnames<-.data.frame. It could do so quite easily. Just add
>> > >>
>> > >> > || any(duplicated(d[[1]])) || any(duplicated(d[[2]]))
>> > >>
>> > >> > at the appropriate spot.
>> > >>
>> > >> Thomas' view about what should be permitted seems to be different.
>> >
>> > > I wouldn't object to making it hard to create duplicated names(), but
>> > > I think it would be a bad idea to have data.frame() make up unique
>> > > names if it's given non-unique ones.
>> >
>> > Maybe `check.names' could also be used for uniqueness testing?
>> >
>> > In any case, I think we should specify what *exactly* a data frame is.
>> >
>>
>> I think we should specify, and check.names is a logical way to
>> allow/forbid non-unique columns.
>>
>> Having a new class would be messy: logically it shouldn't inherit from
>> data.frame, data.frame should inherit from it, but that would be a real
>> pain to set up.
>>

> Data frames were originally meant to be used in modeling functions.
> The opening paragraph in Chapter 3 (Data for Models) in the White Book
> says:

> "This chapter describes the general structure for data that
> will be used throughout the book. In particular, it introduces the
> data frame, a class of objects to represent the data typically encountered
> in fitting models."

> However, data.frames may not be quite appropriate for representing
> other types of tabular data (certainly a data.frame does not capture
> the essence of, say, a "relational" table in the SQL sense, which
> doesn't have the concept of row names). Manifestations of this
> problem include coercing character data to factors "at the drop of a
> hat" (as someone wrote here or in s-news), the row.names issue now
> being discussed, problems with including general objects in the "cells"
> of the data.frame, etc.

> I think that the concept of a data.frame to represent data for fitting
> models is fine, but we (certainly I) may have abused this concept. We
> need other classes of tabular data objects in addition to (not as a
> replacement for) data.frames, together with coercion methods and
> perhaps other utilities.

Thomas had said that, yes, it would be nice to have something with
fewer restrictions for modeling, but that introducing a new class from
which data.frame would then inherit would be uneconomical, at the very
least.  Am I right in reading your comment as suggesting that we
introduce a new class for holding tabular data?  Do you have specific
ideas on this?
-k