thr3ads.net - R help - [R] Extracting only Unique Rows based on only 1 Column [Jan 2010]

Bryan M Hangartner

2010-Jan-16 22:04 UTC

[R] Extracting only Unique Rows based on only 1 Column

To Whoever Is Interested,

I have spent several days searching the web, help files, the R wiki  
and the archives of this mailing list for a solution to this problem,  
but nonetheless I apologize in advance if I have missed something  
obvious.

The problem is this: I have a 5-column data frame with about 4.2
million rows, and want to create a new (and hopefully much smaller)
data frame that contains only one row for each distinct value in
the first column. In other words, I do not care about the
uniqueness of the values in the other four columns, only the uniqueness
of the entries in the first column. The "unique" command does not seem to
have this option available, at least based on what I've read in the
help file.

A simplified example matrix (designated as "traveltimes"):

ID Time1 Time2
1    3     4
1    4     7
2    3     5
2    5     6
3    4     5
3    2     8

When I use a command such as

matches <- unique(traveltimes, incomparables = FALSE, fromLast = FALSE)

I will end up with a 6-row matrix, exactly what I already have. What I
would like to do is to remove the rows with duplicate values in the column
labeled "ID", along with their associated Time1 and Time2 entries. This will
give me a 3x3 matrix which contains only one instance of each "ID"
value. For the purposes of this particular problem, the uniqueness
of the Time1 and Time2 columns is not relevant.
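
For anyone who wants to reproduce the behaviour described above, here is a minimal sketch. The data frame construction is an assumption based on the table shown; the real data has five columns and far more rows.

```r
# Rebuild the small example as a data frame
# (construction assumed from the table above, not the real data).
traveltimes <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3),
  Time1 = c(3, 4, 3, 5, 4, 2),
  Time2 = c(4, 7, 5, 6, 5, 8)
)

# unique() on a data frame compares entire rows. Every row here
# differs from the others in at least one column, so none is dropped.
matches <- unique(traveltimes, incomparables = FALSE, fromLast = FALSE)
nrow(matches)                    # 6

# The ID column on its own has only three distinct values.
length(unique(traveltimes$ID))   # 3
```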

If this question is not clear enough please let me know. Thank you for  
your time.


-- 
Bryan Hangartner
hangartb at cecs.pdx.edu

Dennis Murphy

2010-Jan-16 22:14 UTC

Hi:

This question arose a few days ago. There are two simple ways to do this:
(i) using ddply in the plyr package and (ii) using the firstobs() function
in the doBy package.

(i)  library(plyr)
> ddply(x, .(ID), head, n = 1)
  ID Time1 Time2
1  1     3     4
2  2     3     5
3  3     4     5

(ii) library(doBy)

 x[firstobs(x[, 1]), ]
  ID Time1 Time2
1  1     3     4
3  2     3     5
5  3     4     5

HTH,
Dennis

Gabor Grothendieck

2010-Jan-16 23:06 UTC

Try this where DF is your data frame:

subset(DF, !duplicated(ID))

or equivalently:

DF[!duplicated(DF$ID), ]
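
To see how this plays out on the small example from the question, here is a minimal sketch. The construction of DF and the names dup, first_rows and last_rows are illustrative assumptions, not part of the original post.

```r
# Example data from the question, rebuilt as a data frame (assumed layout).
DF <- data.frame(
  ID    = c(1, 1, 2, 2, 3, 3),
  Time1 = c(3, 4, 3, 5, 4, 2),
  Time2 = c(4, 7, 5, 6, 5, 8)
)

# duplicated() flags every repeat of a value after its first occurrence.
dup <- duplicated(DF$ID)
dup                      # FALSE TRUE FALSE TRUE FALSE TRUE

# Negating the flags keeps the first row seen for each ID (rows 1, 3, 5).
first_rows <- DF[!dup, ]

# With fromLast = TRUE the scan runs from the end, so the last row
# for each ID is kept instead (rows 2, 4, 6).
last_rows <- DF[!duplicated(DF$ID, fromLast = TRUE), ]
```

Only the ID column is examined when building the logical index, so the other columns play no part in deciding which rows are kept.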
