Dear r-helpers,

I have a very simple question. Suppose my data is like

id=c(rep(1,2),rep(2,2))
b=c(2,3,4,5)
m=cbind(id,b)

> m
     id b
[1,]  1 2
[2,]  1 3
[3,]  2 4
[4,]  2 5

I wish to select the first observation for each id. That is, I want to quickly select only two rows:

id b
 1 2
 2 4

How should I do this?
Try the duplicated() function, as in

    m[!duplicated(id), ]

-tgs

On Wed, Apr 21, 2010 at 10:17 PM, gallon li <gallon.li@gmail.com> wrote:
> I wish to select the first observation for each id. That is, I want to
> quickly select two rows:
>
> id b
> 1 2
> 2 4
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
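A minimal sketch of the duplicated() suggestion above, assuming the matrix m from the original question. The name first_rows and the second, unsorted vector id2 are illustrative and not part of the thread. It indexes the "id" column of m rather than a separate id vector, and uses drop = FALSE so the result stays a matrix even if only one row is kept.

```r
# Rebuild the example matrix from the original question.
id <- c(rep(1, 2), rep(2, 2))
b  <- c(2, 3, 4, 5)
m  <- cbind(id, b)

# duplicated() flags every value that has already appeared earlier in the
# vector, so negating it keeps the first row seen for each id.
first_rows <- m[!duplicated(m[, "id"]), , drop = FALSE]
print(first_rows)

# The data do not need to be sorted by id: here the ids are interleaved
# (hypothetical example), and the first occurrence of each is still kept.
id2 <- c(1, 2, 1, 2)
keep2 <- !duplicated(id2)
print(keep2)
```

Because duplicated() looks at the whole vector and not only at the neighbouring element, this also works when rows with the same id are not adjacent.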
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of gallon li
> Sent: Wednesday, April 21, 2010 7:18 PM
> To: r-help
> Subject: [R] how to select the first observation only?
>
> I wish to select the first observation for each id. That is, I want to
> quickly select two rows:
>
> id b
> 1 2
> 2 4

The following will quickly select the first row in each run of identical 'id's. If your data is sorted by 'id' then it solves your problem.

    > isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
    > m[ isFirstInRun(m[,"id"]), , drop=FALSE]
         id b
    [1,]  1 2
    [2,]  2 4

If the 'id' column contains NA's then you need to decide how a run of NA's should be handled. E.g., turning it into a factor with an NA in the levels:

    m[ isFirstInRun(factor(m[,"id"], exclude=NULL)), ]

will select the first in a run of NA's and

    isNaOrTrue <- function(x) is.na(x) | x
    m[ isNaOrTrue(isFirstInRun(m[,"id"])), ]

will treat each NA in 'id' as a unique value (a run of length 1).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
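The difference between "first row in each run" and "first row for each id" only shows up when the data are not sorted. The sketch below reuses isFirstInRun and isNaOrTrue as defined above; the vectors id_unsorted and id_na are made-up examples and not from the thread.

```r
isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
isNaOrTrue   <- function(x) is.na(x) | x

# Hypothetical unsorted ids: id 1 appears in two separate runs.
id_unsorted <- c(1, 1, 2, 2, 1)

run_starts <- isFirstInRun(id_unsorted)  # starts of runs, so the last 1 counts again
first_seen <- !duplicated(id_unsorted)   # first occurrence of each distinct id

# Sorting by id first makes the two approaches agree. order() breaks ties
# by original position, so the earliest row of each id stays first.
ord <- order(id_unsorted)
sorted_starts <- isFirstInRun(id_unsorted[ord])

# Hypothetical ids with a run of NA's: comparisons against NA give NA,
# and isNaOrTrue() turns each of those into TRUE.
id_na <- c(1, 1, NA, NA, 2, 2)
na_as_unique <- isNaOrTrue(isFirstInRun(id_na))

print(run_starts)
print(first_seen)
print(ord[sorted_starts])
print(na_as_unique)
```

On unsorted input the run-based version returns three rows for two distinct ids, which is why the reply above makes sorting by 'id' a precondition.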
Gallon, your question looks very homeworky. People on this list are unlikely to help unless you can demonstrate your own effort (even if it failed), and if this is homework, the list is not the place for it. Where exactly are you stuck?

Daniel

-------------------------
cuncta stricte discussurus
-------------------------
On Apr 21, 2010, at 10:17 PM, gallon li wrote:

> I wish to select the first observation for each id. That is, I want to
> quickly select two rows:
>
> id b
> 1 2
> 2 4

    > m[ !duplicated(id), ]
         id b
    [1,]  1 2
    [2,]  2 4

David Winsemius, MD
West Hartford, CT