Dear R forum

Suppose I have a data.frame

df = data.frame(id = c(1:6), x = c(15, 21, 14, 21, 14, 38),
                y = c(36, 38, 55, 11, 5, 18), x.1 = c(15, 21, 14, 21, 14, 38),
                z = c("D", "B", "A", "F", "H", "P"))

> df
  id  x  y x.1 z
1  1 15 36  15 D
2  2 21 38  21 B
3  3 14 55  14 A
4  4 21 11  21 F
5  5 14  5  14 H
6  6 38 18  38 P

Clearly columns x and x.1 are identical. In reality, I have a large data.frame and can't make out which columns are identical, but I am sure that a column named, say, x is repeated as x.1, x.2, etc.

How can I automatically identify the identical columns and retain only one of them (in this example, column x) along with the other, non-identical columns (viz. id, y and z)?

Regards

Katherine
Hi, Katherine,

IF the naming scheme of the columns of your data frame is consistently <stringwithoutdot> and <stringwithoutdot.number> when duplicated columns appear, THEN (something like)

df[ -grep( "\\.", names( df))]

could help. (But if the data frame is as large as you say, it may be more efficient to avoid producing the duplicated columns in the first place.)

Regards

 -- Gerrit

On Thu, 28 Mar 2013, Katherine Gobin wrote:
> [snip]
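A minimal sketch of a more defensive variant of the name-based idea, assuming (as above) that copies of a column `x` are named `x.1`, `x.2`, and so on. Instead of dropping every dotted name, it drops a suffixed column only when a column with the base name exists and is `identical()` to it. The last lines show why `!grepl()` is safer than `-grep()`: when no name matches, `grep()` returns `integer(0)`, and indexing with `-integer(0)` selects zero columns rather than all of them. The names `base`, `is_copy`, `safe_to_drop` and `no_dots` are illustrative only.

```r
df <- data.frame(id = 1:6,
                 x = c(15, 21, 14, 21, 14, 38),
                 y = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z = c("D", "B", "A", "F", "H", "P"))

# Assumed naming scheme: copies of "x" are called "x.1", "x.2", ...
base <- sub("\\.[0-9]+$", "", names(df))  # "x.1" -> "x"; other names unchanged
is_copy <- base != names(df)               # TRUE only for suffixed names

# Drop a suffixed column only if its base column exists and holds identical data
safe_to_drop <- vapply(seq_along(df), function(j) {
  is_copy[j] && base[j] %in% names(df) && identical(df[[j]], df[[base[j]]])
}, logical(1))

df_clean <- df[!safe_to_drop]
names(df_clean)  # "id" "x" "y" "z"

# Pitfall of the negative-index form when no name matches the pattern
no_dots <- df[c("id", "y", "z")]
ncol(no_dots[-grep("\\.", names(no_dots))])   # 0: every column is lost
ncol(no_dots[!grepl("\\.", names(no_dots))])  # 3: every column is kept
```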
This might screw up the classes of some of your columns (t() turns the data frame into a character matrix because z is not numeric, so every column comes back as character or factor, and the original column names are lost), but it could be enough for what you're doing :)

# start with a data frame with duplicate columns
v <- data.frame(id = c(1:6), x = c(15, 21, 14, 21, 14, 38), y = c(36, 38, 55, 11, 5, 18), x.1 = c(15, 21, 14, 21, 14, 38), z = c("D", "B", "A", "F", "H", "P"))
# remove column names
names( v ) <- NULL
# transpose: each original column becomes a row
w <- t( v )
# remove duplicate rows, i.e. duplicate original columns
x <- unique( w )
# transpose again
y <- t( x )
# convert back to data frame
z <- data.frame( y )

On Thu, Mar 28, 2013 at 4:39 AM, Katherine Gobin <katherine_gobin@yahoo.com> wrote:
> [snip]
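If the column classes and names matter, one possible variant of the transpose idea uses the transposed matrix only to find the duplicates and then subsets the original data frame, so nothing is coerced in the result. `duplicated()` on a matrix flags repeated rows, and after `t()` the rows are the original columns. One caveat, stated as an assumption about typical data: when the frame has a non-numeric column (like z), `as.matrix()` formats numbers as text, so two numeric columns that differ only beyond the printed precision (7 significant digits by default) would be treated as equal.

```r
v <- data.frame(id = c(1:6), x = c(15, 21, 14, 21, 14, 38),
                y = c(36, 38, 55, 11, 5, 18),
                x.1 = c(15, 21, 14, 21, 14, 38),
                z = c("D", "B", "A", "F", "H", "P"))

# Rows of t(as.matrix(v)) are the columns of v;
# duplicated() marks every row that repeats an earlier one
dup_cols <- duplicated(t(as.matrix(v)))
dup_cols                 # only the 4th entry (x.1) is TRUE

# Subset the original data frame, so names and classes are untouched
v_clean <- v[!dup_cols]
sapply(v_clean, class)
```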
Hi Katherine,

Maybe this helps:

df[!duplicated(lapply(df,summary))]
#  id  x  y z
#1  1 15 36 D
#2  2 21 38 B
#3  3 14 55 A
#4  4 21 11 F
#5  5 14  5 H
#6  6 38 18 P

#or
df[,colnames(unique(as.matrix(df),MARGIN=2))]
#  id  x  y z
#1  1 15 36 D
#2  2 21 38 B
#3  3 14 55 A
#4  4 21 11 F
#5  5 14  5 H
#6  6 38 18 P

A.K.

----- Original Message -----
From: Katherine Gobin <katherine_gobin at yahoo.com>
To: r-help at r-project.org
Sent: Thursday, March 28, 2013 4:39 AM
Subject: [R] How to delete Identical columns

[snip]
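One caution about the `summary()`-based one-liner: `summary()` keeps only six numbers per column, so two different columns that share the same minimum, quartiles, mean and maximum are flagged as duplicates. A small made-up example (the data frame `d` and its columns `a` and `b` are hypothetical): a column and its reverse have identical summaries. Comparing the full columns, as `duplicated(as.list(df))` does, avoids this.

```r
# Hypothetical data: b is a reversed copy of a, so the values differ row by row
d <- data.frame(a = c(1, 2, 3, 4), b = c(4, 3, 2, 1))

# Same Min., 1st Qu., Median, Mean, 3rd Qu. and Max. for both columns...
dup_by_summary <- duplicated(lapply(d, summary))
dup_by_summary   # FALSE TRUE: b would wrongly be dropped

# ...but comparing the full columns keeps both
dup_by_value <- duplicated(as.list(d))
dup_by_value     # FALSE FALSE
```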
On Mar 28, 2013, at 1:39 AM, Katherine Gobin wrote:

> [snip]
> How to automatically identify and retain only one column (in this example column x) among the identical columns besides other non-identical columns (viz. id, y and z).

df[!duplicated(as.list(df))]
  id  x  y z
1  1 15 36 D
2  2 21 38 B
3  3 14 55 A
4  4 21 11 F
5  5 14  5 H
6  6 38 18 P

David Winsemius
Alameda, CA, USA
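Since the question also asks to identify the identical columns, a sketch built on the same `duplicated(as.list(df))` idea can report which kept column each dropped column repeats. The names `keep`, `dropped`, `kept` and `twin_of` are illustrative. The inner loop compares with `identical()`, so it does one comparison per dropped/kept pair, which is presumably cheap unless there are very many duplicates.

```r
df <- data.frame(id = c(1:6), x = c(15, 21, 14, 21, 14, 38),
                 y = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z = c("D", "B", "A", "F", "H", "P"))

keep <- !duplicated(as.list(df))   # TRUE for the first copy of each column
dropped <- names(df)[!keep]
kept <- names(df)[keep]

# For each dropped column, name the first kept column with identical contents
twin_of <- vapply(dropped, function(nm) {
  same <- vapply(kept, function(k) identical(df[[k]], df[[nm]]), logical(1))
  kept[same][1]
}, character(1))

twin_of                 # x.1 is a copy of "x"
df_clean <- df[keep]    # columns id, x, y, z
```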
Katherine Gobin <katherine_gobin <at> yahoo.com> writes:

> Dear R forum
>
> Suppose I have a data.frame
>
Say. [snip]

> How to automatically identify and retain only one column (in this example column x) among the identical
> columns besides other non-identical columns (viz. id, y and z).

See

?unique

Details

This is a generic function with methods for vectors, *data frames* and ...

[emphasis added]

So,

unique( df, MARGIN=2 )

is what you want.

HTH,
Charles Berry <ccberry <at> ucsd.edu> writes:

[snip]
> So,
>
> unique( df, MARGIN=2 )
>
> is what you want.

My bad. Mea culpa, etc.

There is a data.frame method, but it ignores the MARGIN arg. Better to stick with what David suggested:

http://article.gmane.org/gmane.comp.lang.r.general/289881

HTH,
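The point about the data.frame method is easy to check on the example data. A minimal sketch: `unique()` on a data frame removes duplicate rows and quietly ignores `MARGIN` (it is absorbed by `...`), while the matrix method does honour `MARGIN = 2`, at the cost of coercing everything to a character matrix. That is why A.K.'s version only takes the column names from the matrix and then subsets `df`.

```r
df <- data.frame(id = c(1:6), x = c(15, 21, 14, 21, 14, 38),
                 y = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z = c("D", "B", "A", "F", "H", "P"))

# data.frame method: works on rows; MARGIN is not used
dim(unique(df, MARGIN = 2))       # 6 5: nothing removed

# matrix method: MARGIN = 2 removes duplicate columns
m <- unique(as.matrix(df), MARGIN = 2)
colnames(m)                       # "id" "x" "y" "z"

# use those names to subset the original, keeping column classes
df_clean <- df[, colnames(m)]
```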