thr3ads.net - R devel - [Rd] Non-unique column names in data frames [Apr 2007]

John Fox

2007-Apr-01 13:57 UTC

[Rd] Non-unique column names in data frames

Dear r-devel members,

It's just been brought to my attention that R permits non-unique column
names in data frames -- e.g., via assignment to names() or colnames(). This
behaviour is consistent with the help files (as I discovered), but it's not
consistent with the behaviour of rownames() and row.names(). For example,

	row.names(airquality) <- rep("a", nrow(airquality)) 

generates an error, but 

	names(airquality) <- rep("a", ncol(airquality))

or even 

	names(airquality) <- rep("", ncol(airquality))

do not.
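
The asymmetry described above can be reproduced with a short sketch like the one below. It works on copies of `airquality` so the built-in dataset is untouched; the variable names (`aq`, `aq_dup`, `aq_empty`, `row_result`) are only illustrative, and the exact wording of the error may differ between R versions.

```r
# Work on copies so the built-in dataset is left untouched.
aq <- airquality

# Duplicate row names: the assignment is rejected with an error.
# suppressWarnings() hides the accompanying "non-unique value" warning.
row_result <- tryCatch(
  suppressWarnings({
    row.names(aq) <- rep("a", nrow(aq))
    "accepted"
  }),
  error = function(e) "error"
)

# Duplicate column names: the assignment goes through silently.
aq_dup <- airquality
names(aq_dup) <- rep("a", ncol(aq_dup))

# Empty column names: also accepted.
aq_empty <- airquality
names(aq_empty) <- rep("", ncol(aq_empty))

row_result                     # "error"
anyDuplicated(names(aq_dup))   # non-zero, i.e. duplicates are present
all(names(aq_empty) == "")     # TRUE
```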

I figure that there must be some rationale for this difference, but I can't
think of what it might be. Any thoughts?

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox

Prof Brian Ripley

2007-Apr-03 07:21 UTC

[Rd] Non-unique column names in data frames

On Sun, 1 Apr 2007, John Fox wrote:
> Dear r-devel members,
>
> It's just been brought to my attention that R permits non-unique column
> names in data frames -- e.g., via assignment to names() or colnames(). This
> behaviour is consistent with the help files (as I discovered), but it's not
> consistent with the behaviour of rownames() and row.names(). For example,
??  matrices and data frames are different, but rownames() and row.names() 
do the same on each class.
>
> 	row.names(airquality) <- rep("a", nrow(airquality))
>
> generates an error, but
as does rownames().
>
> 	names(airquality) <- rep("a", ncol(airquality))
>
> or even
>
> 	names(airquality) <- rep("", ncol(airquality))
>
> do not.
>
> I figure that there must be some rationale for this difference, but I can't
> think of what it might be. Any thoughts?
It's part of the definition of a data frame, from long ago (White Book 
p.60).  Think of the row names as a 'primary key' in the sense of a 
DBMS/SQL.

Why the names are not also required to be non-empty and unique 
is something for the designer (and John Chambers has not (yet) replied), 
but it is clearly deliberate as data.frame(check.names=FALSE) is allowed.
One possible issue is that there are many ways to set names of a data 
frame, e.g. DF$name <- value can add a column, and checking them all could 
be tedious.  OTOH, setting row names is centralized (it is done inside
attr<-()).
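
As a rough illustration of the points above, the following sketch shows how `check.names` affects the names that `data.frame()` produces, and how a later `names<-` assignment bypasses that check. The helper `has_valid_names()` is hypothetical (it is not part of base R) and simply tests the "non-empty and unique" condition discussed here.

```r
# check.names = FALSE keeps the names exactly as given, duplicates included.
df_raw <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
names(df_raw)     # "a" "a"

# The default (check.names = TRUE) makes the names syntactically valid
# and unique.
df_fixed <- data.frame(a = 1:2, a = 3:4)
names(df_fixed)   # "a" "a.1"

# Hypothetical helper (not in base R): TRUE if all column names are
# non-missing, non-empty and unique.
has_valid_names <- function(df) {
  nms <- names(df)
  !anyNA(nms) && all(nzchar(nms)) && anyDuplicated(nms) == 0L
}

# Names can be changed after construction, where check.names plays no role.
df_later <- df_fixed
names(df_later) <- c("a", "a")

has_valid_names(df_raw)     # FALSE
has_valid_names(df_fixed)   # TRUE
has_valid_names(df_later)   # FALSE
```

If unique names are wanted after the fact, something like `names(df) <- make.unique(names(df))` is one possible repair, though it does not deal with empty names.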

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
