William Revelle
2007-May-08 14:26 UTC
[R] strange behavior in data frames with duplicated column names
Dear R gurus,

There is an interesting problem with accessing specific items in a column of a data frame that has incorrectly been given a duplicate name, even when the item is addressed by row and column number. Although the column itself is listed correctly, an item addressed by row and column number is taken from the correct row but from the first column bearing that name, not from the duplicate.

Here are the instructions to reproduce this problem:

x <- matrix(seq(1:12),ncol=3)
colnames(x) <- c("A","B","A")  #a redundant name for column 3
x.df <- data.frame(x)
x.df            #the redundant name is corrected
x.df[,3]        #show the column -- this always works
x.df[2,3]       #this works here
#now incorrectly label the columns with a duplicate name
colnames(x.df) <- c("A","B","A")  #the redundant name is not detected
x.df
x.df[,3]        #this works as above and shows the column
x.df[2,3]       #but this gives the value of the first column, not the third <---
rownames(x.df) <- c("First","Second","Third","Third")  #detects duplicate name
x.df
x.df[4,]        #correct fourth row and corrected column names!
x.df[4,3]       #wrong column
x.df            #still has the original names with the duplication

and corresponding output:

> x <- matrix(seq(1:12),ncol=3)
> colnames(x) <- c("A","B","A") #a redundant name for column 3
> x.df <- data.frame(x)
> x.df #the redundant name is corrected
  A B A.1
1 1 5   9
2 2 6  10
3 3 7  11
4 4 8  12
> x.df[,3] #show the column -- this always works
[1]  9 10 11 12
> x.df[2,3] #this works here
[1] 10
> #now incorrectly label the columns with a duplicate name
> colnames(x.df) <- c("A","B","A") #the redundant name is not detected
> x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
> x.df[,3] #this works as above and shows the column
[1]  9 10 11 12
> x.df[2,3] #but this gives the value of the first column, not the third <---
[1] 2
> rownames(x.df) <- c("First","Second","Third","Third") #detects duplicate name
Error in `row.names<-.data.frame`(`*tmp*`, value = c("First", "Second",  : 
  duplicate 'row.names' are not allowed
> x.df
  A B  A
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12
> x.df[4,] #correct fourth row and corrected column names!
  A B A.1
4 4 8  12
> x.df[4,3] #wrong column
[1] 4
> x.df #still has the original names with the duplication
> unlist(R.Version())
                platform                     arch                       os
"i386-apple-darwin8.9.1"                   "i386"            "darwin8.9.1"
                  system                   status                    major
     "i386, darwin8.9.1"                "Patched"                      "2"
                   minor                     year                    month
                   "5.0"                   "2007"                     "04"
                     day                  svn rev                 language
                    "25"                  "41315"                      "R"
          version.string
"R version 2.5.0 Patched (2007-04-25 r41315)"

Bill

-- 
William Revelle           http://personality-project.org/revelle.html
Professor                 http://personality-project.org/personality.html
Department of Psychology  http://www.wcas.northwestern.edu/psych/
Northwestern University   http://www.northwestern.edu/
Use R for statistics:     http://personality-project.org/r
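The following is a minimal sketch, written for illustration and not taken from the thread. It rebuilds the example and checks for duplicated column names before any code relies on indexing. `has_dup_names()` is a hypothetical helper. `anyDuplicated()`, `duplicated()` and `names()` are base R functions. Extracting with `[[` and a column number selects the column by position, so the name lookup is never involved.

```r
# Rebuild the example from the message above (base R only).
x <- matrix(1:12, ncol = 3)
colnames(x) <- c("A", "B", "A")

# data.frame() uses check.names = TRUE by default, so the
# repeated third name is made unique as "A.1".
x.df <- data.frame(x)
print(names(x.df))

# Hypothetical helper: TRUE if any column name occurs more than once.
has_dup_names <- function(df) anyDuplicated(names(df)) > 0

# Reintroduce the duplicate, as in the report.
names(x.df) <- c("A", "B", "A")
if (has_dup_names(x.df)) {
  dups <- unique(names(x.df)[duplicated(names(x.df))])
  message("duplicated column names: ", paste(dups, collapse = ", "))
}

# [[ with a column number selects by position, never by name,
# so this reads row 2 of the third column whatever the names are.
val <- x.df[[3]][2]
print(val)
```

The positional `[[` form gives the same answer regardless of how a particular R version resolves `x.df[2,3]` on a data frame with duplicated names.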
Prof Brian Ripley
2007-May-08 18:10 UTC
[R] strange behavior in data frames with duplicated column names
First, you should not be using colnames<-, which is for a matrix, on a data frame. Use names<- for data frames (and as.data.frame to convert to a data frame).

Second, whereas duplicate row names are not allowed in a data frame, duplicate column names are allowed, but at your own risk.

Third, there is an 'optimization too far' here which I will change in 2.5.0 patched. Often with R development there is a tradeoff between speed and generality.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
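The following is a minimal sketch of the advice to use names<- on data frames. It is an illustration, not code from the thread. It passes the proposed names through base R's `make.unique()` before assigning them. It also shows that duplicate column names are accepted silently, while duplicate row names are refused. The variable names `df`, `proposed`, `dup` and `row_err` are hypothetical.

```r
df <- data.frame(matrix(1:12, ncol = 3))

# names<- is the replacement function intended for data frames.
proposed <- c("A", "B", "A")

# make.unique() appends ".1", ".2", ... to repeated names.
names(df) <- make.unique(proposed)
print(names(df))

# Duplicate column names are accepted without complaint (at your own risk)...
dup <- df
names(dup) <- proposed
print(anyDuplicated(names(dup)) > 0)

# ...but duplicate row names are refused with an error.
# suppressWarnings() hides the accompanying "non-unique value" warning.
row_err <- tryCatch(
  suppressWarnings({
    rownames(dup) <- c("First", "Second", "Third", "Third")
    NULL
  }),
  error = function(e) conditionMessage(e)
)
print(row_err)
```

Because the failed assignment raises an error before `dup` is modified, `dup` keeps its automatic row names.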