thr3ads.net - R help - [R] How to delete Identical columns [Mar 2013]


Katherine Gobin

2013-Mar-28 08:39 UTC

[R] How to delete Identical columns

Dear R forum

Suppose I have a data.frame 

df = data.frame(id  = c(1:6),
                x   = c(15, 21, 14, 21, 14, 38),
                y   = c(36, 38, 55, 11, 5, 18),
                x.1 = c(15, 21, 14, 21, 14, 38),
                z   = c("D", "B", "A", "F", "H", "P"))

> df
  id  x  y x.1 z
1  1 15 36  15 D
2  2 21 38  21 B
3  3 14 55  14 A
4  4 21 11  21 F
5  5 14  5  14 H
6  6 38 18  38 P


Clearly columns x and x.1 are identical. In reality, I have a large data.frame
and can't make out which columns are identical, but I am sure that column
with name say x is repeated as x.1, x.2 etc.

How can I automatically identify the identical columns and retain only one of
them (in this example column x), along with the other non-identical columns
(viz. id, y and z)?


Regards

Katherine


Gerrit Eichner

2013-Mar-28 08:58 UTC


Hi, Katherine,

IF the naming scheme of the columns of your data frame is consistently
<stringwithoutdot>, with duplicated columns appearing as
<stringwithoutdot>.<number>, THEN (something like)

df[ -grep( "\\.", names( df))]

could help. (But it's maybe more efficient to avoid - a priori - producing 
duplicated columns, if the data frame is large, as you say.)
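
One caveat with the negative-index form: if no column name contains a dot, grep() returns an empty vector and df[-integer(0)] selects zero columns. A minimal sketch of a more defensive variant follows, assuming the duplicates really are named <name>.<digits>; the names is_suffixed and result are only illustrative.

```r
df <- data.frame(id  = 1:6,
                 x   = c(15, 21, 14, 21, 14, 38),
                 y   = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z   = c("D", "B", "A", "F", "H", "P"))

# TRUE for names that end in ".<digits>", e.g. "x.1", "x.2"
is_suffixed <- grepl("\\.[0-9]+$", names(df))

# Logical indexing keeps every column when nothing matches,
# whereas df[-grep(...)] would return zero columns in that case.
result <- df[!is_suffixed]
names(result)
```

Note that this goes by names only; it does not check that x.1 actually holds the same values as x.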

  Regards -- Gerrit



Anthony Damico

2013-Mar-28 10:44 UTC


this might screw up the column classes of some of your columns, but it
could be enough for what you're doing :)


# start with a data frame with duplicate columns
v <- data.frame(id  = c(1:6),
                x   = c(15, 21, 14, 21, 14, 38),
                y   = c(36, 38, 55, 11, 5, 18),
                x.1 = c(15, 21, 14, 21, 14, 38),
                z   = c("D", "B", "A", "F", "H", "P"))

# remove column names
names( v ) <- NULL

# transpose
w <- t( v )
# remove duplicate rows
x <- unique( w )
# transpose again
y <- t( x )
# convert back to data frame
z <- data.frame( y )
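
The class problem comes from t(), which coerces a mixed-type data frame to a character matrix. A possible way around it, sketched here under the assumption that comparing the columns as strings is acceptable, is to use the transposed matrix only to find the duplicates and then index the original data frame; dup and result are illustrative names.

```r
df <- data.frame(id  = 1:6,
                 x   = c(15, 21, 14, 21, 14, 38),
                 y   = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z   = c("D", "B", "A", "F", "H", "P"))

# t(df) has one row per column of df; duplicated() on a matrix
# flags repeated rows by default (MARGIN = 1).
dup <- as.vector(duplicated(t(df)))

# Index the original data frame, so names and classes are kept.
result <- df[!dup]
sapply(result, class)
```

Because the comparison happens on the character representation, two columns of different types that print the same would be treated as duplicates.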







arun

2013-Mar-28 13:34 UTC


Hi Katherine,
Maybe this helps:


df[!duplicated(lapply(df,summary))]
#  id  x  y z
#1  1 15 36 D
#2  2 21 38 B
#3  3 14 55 A
#4  4 21 11 F
#5  5 14  5 H
#6  6 38 18 P
#or
df[,colnames(unique(as.matrix(df),MARGIN=2))]
#  id  x  y z
#1  1 15 36 D
#2  2 21 38 B
#3  3 14 55 A
#4  4 21 11 F
#5  5 14  5 H
#6  6 38 18 P
A.K.
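
A word of caution on the summary() variant: two columns can share the same summary without being identical, for instance when one is a reordering of the other. The small example below is made up for illustration (the data frame d is not from the thread) and compares it with testing the column values themselves.

```r
# Hypothetical data: b holds the same values as a, in reverse order.
d <- data.frame(a = c(1, 2, 3), b = c(3, 2, 1))

# Same min, quartiles, mean and max, so b is flagged as a duplicate of a.
by_summary <- duplicated(lapply(d, summary))

# Comparing the full column contents does not flag it.
by_value <- duplicated(as.list(d))

by_summary
by_value
```

So df[!duplicated(lapply(df, summary))] may drop a column that is not a true copy; the as.matrix() variant compares the actual values.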








David Winsemius

2013-Mar-28 15:25 UTC


On Mar 28, 2013, at 1:39 AM, Katherine Gobin wrote:
> How to automatically identify and retain only one column (in this example
> column x) among the identical columns besides other non-identical columns
> (viz. id, y and z).

> df[!duplicated(as.list(df))]
  id  x  y z
1  1 15 36 D
2  2 21 38 B
3  3 14 55 A
4  4 21 11 F
5  5 14  5 H
6  6 38 18 P

David Winsemius
Alameda, CA, USA
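
For a large data frame it may help to see which columns are being dropped. The following is only a sketch wrapping the duplicated(as.list(df)) idea above; the function name drop_identical_columns is hypothetical and not part of any package.

```r
# Hypothetical helper: drop columns whose contents repeat an earlier column,
# reporting the names of the dropped columns.
drop_identical_columns <- function(d) {
  dup <- duplicated(as.list(d))
  if (any(dup)) {
    message("Dropping: ", paste(names(d)[dup], collapse = ", "))
  }
  d[!dup]
}

df <- data.frame(id  = 1:6,
                 x   = c(15, 21, 14, 21, 14, 38),
                 y   = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z   = c("D", "B", "A", "F", "H", "P"))

result <- drop_identical_columns(df)
names(result)
```

The first occurrence of each set of identical columns is the one kept, so x survives and x.1 is dropped.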

Charles Berry

2013-Mar-28 15:40 UTC


Katherine Gobin <katherine_gobin <at> yahoo.com> writes:
> 
> Dear R forum
> 
> Suppose I have a data.frame 
> 
Say.

[snip]
> How to automatically identify and retain only one column (in this example
> column x) among the identical columns besides other non-identical columns
> (viz. id, y and z).

See 

?unique

Details

This is a generic function with methods for vectors, *data frames* and ...

[emphasis added]

So,

   unique( df, MARGIN=2 )

is what you want.

HTH,

Charles Berry

2013-Mar-28 16:25 UTC


Charles Berry <ccberry <at> ucsd.edu> writes:
[snip]
> So,
>
>    unique( df, MARGIN=2 )
>
> is what you want.
>

My bad. Mea culpa, etc.

There is a data.frame method, but it ignores the MARGIN arg.

Better to stick with what David suggested:

  http://article.gmane.org/gmane.comp.lang.r.general/289881
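
To see the difference, the data.frame and matrix methods of unique() can be compared on the example data. This is a small sketch; by_df and by_matrix are illustrative names.

```r
df <- data.frame(id  = 1:6,
                 x   = c(15, 21, 14, 21, 14, 38),
                 y   = c(36, 38, 55, 11, 5, 18),
                 x.1 = c(15, 21, 14, 21, 14, 38),
                 z   = c("D", "B", "A", "F", "H", "P"))

# data.frame method: removes duplicated rows; MARGIN is not acted on,
# so the duplicated column x.1 is still there.
by_df <- unique(df, MARGIN = 2)
ncol(by_df)

# matrix method: MARGIN = 2 removes duplicated columns,
# at the cost of coercing everything to character.
by_matrix <- unique(as.matrix(df), MARGIN = 2)
colnames(by_matrix)
```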

HTH,
