Hi,

I'm looking for some lines of code that does the following:
I have a dataframe with 160 columns and a number of rows (max 30):

         Col1  Col2  Col3  ...  Col159  Col160
Row1     0     0     LD    ...  0       VD
Row2     HD    0     0          0       MD
Row3     0     HD    HD         0       LD
Row4     LD    HD    HD         0       LD
...      ...
LastRow  HD    HD    LD         0       MD

Now I want a dataframe that looks like this. As you can see, all
duplicates within each column are removed and the remaining values are
moved up. Can this dataframe be constructed in a fast way?

         Col1  Col2  Col3  ...  Col159  Col160
Row1     0     0     LD         0       VD
Row2     HD    HD    0          0       MD
Row3     LD    0     HD         0       LD

Thx for helping me out.
Bert
Hi,

I am not sure I understand how you want to select unique items.

With

    sapply(DF, function(x) !duplicated(x))

you get a logical matrix that is TRUE where an item is the first
occurrence within its column and FALSE otherwise. However, you then
still need to choose which rows to keep or discard, e.g.

    DF[rowSums(sapply(DF, function(x) !duplicated(x))) > 1, ]

selects all rows that contain 2 or more first occurrences.

HTH
Petr

On 5 Jan 2007 at 9:54, Bert Jacobs wrote:
> [...]

Petr Pikal
petr.pikal at precheza.cz
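[Editor's note: a minimal runnable sketch of the approach Petr describes
above, on an invented two-column data frame; the names `DF` and `uniq`
are illustrative, not from the thread.]

```r
# Two columns of the kind shown in the original question.
DF <- data.frame(Col1 = c("0", "HD", "0", "LD"),
                 Col2 = c("0", "0", "HD", "HD"),
                 stringsAsFactors = FALSE)

# Logical matrix: TRUE where a value is the first occurrence in its column.
uniq <- sapply(DF, function(x) !duplicated(x))

# Keep rows holding at least 2 first occurrences; here only Row 1 survives
# (its "0" is the first occurrence in both Col1 and Col2).
DF[rowSums(uniq) > 1, ]
```

Note that this filters whole rows; it does not move the surviving values
up within each column, which is what the question ultimately asks for.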
Hi,

I have no idea what your Test data look like. However, the help pages of
data.frame(), as.data.frame(), str() and maybe a few others can help you
find out how to change objects into data frames.

HTH
Petr

On 16 Jan 2007 at 10:36, Bert Jacobs wrote:

> Hi Petr,
>
> Thx for answering my question below.
> Actually I could use this line of code to get my problem solved:
>
>     Test = apply(X=my_data, MARGIN=2, FUN=unique)
>
> Now I was wondering how to transform 'Test' into a dataframe, given
> that its elements imply different numbers of rows.
>
> Thx,
> Bert
> [...]
Hi,

Working further on this dataframe:

my_data
         Col1  Col2  Col3  ...  Col159  Col160
Row1     0     0     LD    ...  0       VD
Row2     HD    0     0          0       MD
Row3     0     HD    HD         0       LD
Row4     LD    HD    HD         0       0
...      ...
LastRow  HD    HD    LD         0       MD

Running this line of code:

    Test = apply(X=my_data, MARGIN=2, FUN=unique)

I get this list:

    $Col1
    [1] "0"  "HD" "LD"

    $Col2
    [1] "0"  "HD"

    $Col3
    [1] "LD" "0"  "HD"

    ...

    $Col159
    [1] "0"

    $Col160
    [1] "VD" "MD" "LD" "0"

Now I was wondering how I can get this list into a data.frame, because a
simple data.frame() call doesn't work (error: arguments imply differing
number of rows). Can someone help me out on this, so that I get the
following result?

         Col1  Col2  Col3  ...  Col159  Col160
Row1     0     0     LD         0       VD
Row2     HD    HD    0          0       MD
Row3     LD    0     HD         0       LD
Row4     0     0     0          0       0

Thx
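[Editor's note: a small sketch of why the naive data.frame() call fails
here, using an invented two-column data frame. When the columns have
different numbers of unique values, apply() returns a ragged list, and
data.frame() requires all columns to have equal length.]

```r
# Invented miniature of my_data for illustration.
my_data <- data.frame(Col1 = c("0", "HD", "0", "LD"),
                      Col2 = c("0", "0", "HD", "HD"),
                      stringsAsFactors = FALSE)

# unique() per column: Col1 has 3 distinct values, Col2 has 2,
# so apply() cannot simplify and returns a list.
Test <- apply(my_data, 2, unique)
sapply(Test, length)    # Col1: 3, Col2: 2

# data.frame() refuses ragged input:
# "arguments imply differing number of rows: 3, 2"
try(data.frame(Test))
```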
Thx Jim.

The code works perfectly when I run it in a plain way: it gives a
dataframe with 160 columns (as it should). But when I run the code
inside a function, it gives a dataframe with 161 columns and the order
of the columns changes to alphabetical. Do you know why this occurs?

    Selection = function () {
      TestUnique <- apply(Selection, 2, unique)
      MaxLen <- max(sapply(TestUnique, length))
      TestUnique <- lapply(TestUnique, function(x) {
        c(x, rep('0', MaxLen - length(x)))
      })
      Selection.Unique <<- data.frame(TestUnique)
    }

-----Original Message-----
From: jim holtman
Sent: 18 January 2007 02:28
To: Bert Jacobs
Cc: Petr Pikal; R help list
Subject: Re: [R] Fast Removing Duplicates from Every Column

Here is one way of doing it by 'padding' all the elements to the same
length:

> x <- "Col1 Col2 Col3 Col159 Col160
+ Row1 0 0 LD 0 VD
+ Row2 HD 0 0 0 MD
+ Row3 0 HD HD 0 LD
+ Row4 LD HD HD 0 0
+ LastRow HD HD LD 0 MD"
> input <- read.table(textConnection(x), header=TRUE)
> Uniq <- apply(input, 2, unique)
> # find the maximum length of an element
> maxLen <- max(sapply(Uniq, length))
> # pad every element with '0' up to maxLen
> Uniq <- lapply(Uniq, function(x){
+   c(x, rep('0', maxLen - length(x)))
+ })
> as.data.frame(Uniq)
  Col1 Col2 Col3 Col159 Col160
1    0    0   LD      0     VD
2   HD   HD    0      0     MD
3   LD    0   HD      0     LD
4    0    0    0      0      0

On 1/17/07, Bert Jacobs <b.jacobs at pandora.be> wrote:
> [...]
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
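[Editor's note: a hedged sketch of Jim's padding approach wrapped in a
function. The function name `uniquePerColumn` is invented. It takes the
data frame as an argument rather than referring to an outside object
named like the function itself, uses lapply() (which always returns a
list, unlike apply(), which can simplify to a matrix when every column
has the same number of unique values), and passes check.names = FALSE so
data.frame() leaves the column names and their order untouched — one
plausible contributor to the reordering described above.]

```r
# Pad each column's unique values with "0" to a common length,
# then reassemble into a data frame of the same width.
uniquePerColumn <- function(df) {
  uniq <- lapply(df, unique)                 # list of unique values per column
  maxLen <- max(sapply(uniq, length))        # longest column decides the height
  uniq <- lapply(uniq, function(x) c(x, rep("0", maxLen - length(x))))
  data.frame(uniq, stringsAsFactors = FALSE, check.names = FALSE)
}
```

Usage would be e.g. `Selection.Unique <- uniquePerColumn(my_data)`,
returning results through the function value instead of via `<<-`.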