boshao zhang
2010-Oct-28 18:29 UTC
[R] get the rows so that there is no redundant element in a certain column
Dear everyone on the mailing list,

It is easy to get the unique elements of a column. But I would like to drop the rows in which the value of this column is redundant, that is, repeats a value already seen in another row. Sometimes it is also important to look at those redundant rows themselves. I guess it boils down to finding the indices of the redundant elements and throwing them out.

With millions of rows, how can I perform this task efficiently?

Thank you in advance.

Boshao
Dennis Murphy
2010-Oct-28 20:33 UTC
[R] get the rows so that there is no redundant element in a certain column
Hi:

Is something like this what you were after?

    x <- data.frame(A = sample(LETTERS[1:5], 1000, replace = TRUE),
                    B = rpois(1000, 50),
                    C = rnorm(1000))
    x[!duplicated(x$A), ]

This keeps the first row for each distinct value of A, so at most five rows here. x[!duplicated(x$B), ] is a bit longer so is not included :)

HTH,
Dennis

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
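A minimal sketch of the row filtering on the same kind of simulated data, assuming that keeping the first occurrence of each value of A is acceptable. The seed and the names dup, x_first and x_rest are made up for illustration. Note that the row index has to be a logical vector or a vector of positions; passing the unique values themselves (e.g. unique(x$A)) would index by factor code or by row name rather than by first occurrence.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x <- data.frame(A = sample(LETTERS[1:5], 1000, replace = TRUE),
                B = rpois(1000, 50),
                C = rnorm(1000),
                stringsAsFactors = FALSE)

# TRUE for every row whose value of A has already appeared in an earlier row
dup <- duplicated(x$A)

x_first <- x[!dup, ]  # one row per distinct value of A (its first occurrence)
x_rest  <- x[dup, ]   # the rows that were dropped, for inspection

# Sanity check: as many rows kept as there are distinct values of A
nrow(x_first) == length(unique(x$A))
```

Because duplicated() is vectorised and makes a single pass over the column, this approach should remain practical for columns with millions of entries, although that was not timed here.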
Jorge Ivan Velez
2010-Oct-28 20:34 UTC
[R] get the rows so that there is no redundant element in a certain column
Hi Boshao,

Check ?duplicated.

HTH,
Jorge
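As a small sketch of what ?duplicated describes (the vector v and the name repeated are invented for illustration): duplicated() flags every repeat after the first occurrence, and combining it with fromLast = TRUE flags every element whose value occurs more than once, which is one way to look at the redundant entries themselves.

```r
v <- c("a", "b", "a", "c", "b", "a")

# Marks each element that repeats an earlier one
duplicated(v)
# FALSE FALSE  TRUE FALSE  TRUE  TRUE

# Scanning from the end instead: marks each element repeated later on
duplicated(v, fromLast = TRUE)
#  TRUE  TRUE  TRUE FALSE FALSE FALSE

# TRUE for every element whose value occurs more than once
repeated <- duplicated(v) | duplicated(v, fromLast = TRUE)
which(repeated)
# 1 2 3 5 6
```

The same logical vectors can be used as a row index on a data frame, for example df[repeated, ] for a hypothetical data frame df whose column was v.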
Joshua Wiley
2010-Oct-28 20:40 UTC
[R] get the rows so that there is no redundant element in a certain column
This is rather general: which indices do you want to throw out? For instance, suppose that "A" occurs in rows 1, 2, 3, 19, and 50. Which four do you throw out and which do you keep? Do you always want to keep the first? The middle? The last?

I would look at ?unique and ?duplicated for starters. unique() will keep the first instance, which may be fine for your purposes (Dennis already gave an example of how to implement this). If you need something else (e.g., the middle), you'll need something a bit fancier.

HTH,
Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
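To make the first/middle/last choice concrete, here is a sketch built on the example above, with "A" in rows 1, 2, 3, 19 and 50. The data frame df, its columns, and the rule used for "middle" (the occurrence at position ceiling(n / 2) among the n occurrences of a value) are assumptions for illustration, not something specified in the thread.

```r
# 50 rows in which "A" occurs in rows 1, 2, 3, 19 and 50; all other values are unique
a <- paste0("u", 1:50)
a[c(1, 2, 3, 19, 50)] <- "A"
df <- data.frame(a = a, row = 1:50, stringsAsFactors = FALSE)

# Keep the first occurrence of each value
first <- df[!duplicated(df$a), ]

# Keep the last occurrence of each value
last <- df[!duplicated(df$a, fromLast = TRUE), ]

# Keep the "middle" occurrence: group row positions by value,
# then pick the position at ceiling(n / 2) within each group
pos <- split(seq_len(nrow(df)), df$a)
mid_idx <- vapply(pos, function(i) i[ceiling(length(i) / 2)], integer(1))
middle <- df[sort(unname(mid_idx)), ]

first$row[first$a == "A"]    # 1
last$row[last$a == "A"]      # 50
middle$row[middle$a == "A"]  # 3
```

The first and last variants are single vectorised calls. The split() step builds one index vector per distinct value, so with millions of distinct values it may be noticeably slower and is probably worth timing on a sample first.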