thr3ads.net - R help - [R] get the rows so that there is no redundant element in a certain column [Oct 2010]

boshao zhang

2010-Oct-28 18:29 UTC

[R] get the rows so that there is no redundant element in a certain column

Dear everyone in the Mailing list:
 
It is easy to get the unique elements of a column. But I would like to get rid
of the rows whose element in this column is redundant. Sometimes it is also
important to have a look at the rows whose element in this column is
redundant. I guess it boils down to throwing out the indices of the redundant
elements.
 
With millions of rows, how can I efficiently perform the task?
 
Thank you in advance.
 
Boshao


      

Dennis Murphy

2010-Oct-28 20:33 UTC

Hi:

Is something like this what you were after?

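x <- data.frame(A = sample(LETTERS[1:5], 1000, replace = TRUE),
                 B = rpois(1000, 50),
                 C = rnorm(1000))
# duplicated() flags every repeat of a value already seen in A, so negating it
# keeps the first row for each distinct letter. Indexing with unique(x$A)
# itself would use the values as row selectors, which is not what is wanted.
x[!duplicated(x$A), ]

x[!duplicated(x$B), ]   is a bit longer so is not included :)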

HTH,
Dennis
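
On the "millions of rows" part of the question, here is a minimal sketch, assuming the key column fits in memory as an ordinary vector. duplicated() returns one logical value per element, so its negation can be used directly as a row index. The data frame `big`, its size, and its column names are made up for illustration, and the timing will differ from machine to machine.

```r
# Hypothetical data: one million rows, key column A with up to 1000 distinct values
set.seed(1)
n <- 1e6
big <- data.frame(A = sample.int(1000, n, replace = TRUE),
                  C = rnorm(n))

# Logical vector: TRUE for every repeat of a value already seen in A
dup <- duplicated(big$A)

# Keep only the first row for each distinct value of A
first_rows <- big[!dup, ]

# Time the duplicate detection itself (the result depends on the machine)
print(system.time(duplicated(big$A)))

# One row per distinct value should remain
print(nrow(first_rows) == length(unique(big$A)))
```

If the whole row rather than a single column should define redundancy, duplicated() also accepts a data frame, though that is likely to be slower than working on one column.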


Jorge Ivan Velez

2010-Oct-28 20:34 UTC

Hi Boshao,

Check ?duplicated.

HTH,
Jorge
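
As a rough illustration of what ?duplicated offers for both halves of the question (dropping the redundant rows, and looking at them), here is a small sketch. The data frame and its column names are invented for the example.

```r
# Small made-up data frame; column A contains repeated values
x <- data.frame(A = c("a", "b", "a", "c", "b", "a"),
                B = 1:6,
                stringsAsFactors = FALSE)

# duplicated() marks the second and later occurrences of each value
dup <- duplicated(x$A)

# Rows with no redundant element in A (the first occurrence of each value is kept)
no_redundant <- x[!dup, ]

# Row indices of the redundant elements
redundant_idx <- which(dup)

# Every row whose value of A occurs more than once, first occurrence included
all_repeated <- x[dup | duplicated(x$A, fromLast = TRUE), ]

print(no_redundant)
print(redundant_idx)
print(all_repeated)
```

Combining the forward and the fromLast = TRUE passes is what picks up the first occurrence as well; duplicated() on its own never flags it.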



Joshua Wiley

2010-Oct-28 20:40 UTC

On Thu, Oct 28, 2010 at 11:29 AM, boshao zhang <zboshao at yahoo.com> wrote:
> Dear everyone in the Mailing list:
>
> It is easy to get the unique elements in a column. But I would like to get
> rid of those rows that the elements of this column are redundant. Or
> sometimes, to have a look at the rows that the elements of this column are
> redundant is also important. I guess it boils down to throw out the index
> of the redundant elements.

This is rather general. Which indices do you want to throw out?  For
instance, suppose that "A" occurs in rows 1, 2, 3, 19, and 50.  Which
four do you throw out and which do you keep?  Do you always want to
keep the first?  The middle?  The last?

I would look at ?unique and ?duplicated for starters.

unique() will keep the first instance, which may be fine for your
purposes (Dennis already gave an example of how to implement this).
If you need something else (e.g., the middle), you'll need something a
bit fancier.

HTH,

Josh
-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
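
The first / middle / last choice raised above can be sketched as follows, reusing the example of "A" occurring in rows 1, 2, 3, 19, and 50. The data frame, the column names `key` and `pos`, and the rule used for "middle" (the ceiling of half the count) are assumptions made for illustration only.

```r
# Made-up data: "A" occurs in rows 1, 2, 3, 19 and 50; every other value is unique
key <- paste0("v", 1:50)
key[c(1, 2, 3, 19, 50)] <- "A"
x <- data.frame(key = key, pos = 1:50, stringsAsFactors = FALSE)

# Keep the first occurrence of each value
first <- x[!duplicated(x$key), ]

# Keep the last occurrence of each value
last <- x[!duplicated(x$key, fromLast = TRUE), ]

# Keep the middle occurrence: for each value, take the middle of its row indices
mid_idx <- tapply(seq_len(nrow(x)), x$key,
                  function(i) i[ceiling(length(i) / 2)])
middle <- x[sort(as.vector(mid_idx)), ]

# Row kept for "A" under each rule
print(first$pos[first$key == "A"])
print(last$pos[last$key == "A"])
print(middle$pos[middle$key == "A"])
```

The two duplicated() calls each need only one pass over the column, while the tapply() version groups the row indices by value and is likely to be slower on very large data.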
