Bob Green
2008-Apr-01 09:37 UTC
[R] filtering out duplicates & creating a dataframe with unique id
Hello,

I am working on a data frame that contains a number of duplicates (e.g. a person may have more than one court appearance). There are 539 rows. If I run the code:

> length(unique(Feb25$Patient.Id))

this indicates there are 508 unique individuals. I have been unable to work out how to filter out rows where there is a duplicate id, so that the resulting data frame contains only one row per person, and that row is the first one that appears.

I was also interested in creating a data frame that consists of these removed duplicates.

Any assistance with the code to do this is much appreciated,

regards

Bob Green
Dimitris Rizopoulos
2008-Apr-01 09:46 UTC
[R] filtering out duplicates & creating a dataframe with unique id
try the following:

# example data: 10 ids with 5 rows each
dat <- data.frame(
    id = gl(10, 5),
    y = rnorm(50),
    time = rep(1:5, 10),
    sex = gl(2, 25, labels = c("male", "female")),
    age = round(rep(runif(10, 18, 55), each = 5), 1)
)

# option 1: take the first row name within each id
dat[tapply(row.names(dat), dat$id, head, n = 1), ]

# option 2: drop every row whose id has already appeared
dat[!duplicated(dat$id), ]

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm
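
The second part of the question (a data frame holding the removed duplicates) can probably be handled with the same duplicated() call, just without the negation. The following is a minimal sketch, not a tested solution for the real data: the small Feb25 data frame and its appearance column are made up for illustration, and only the Patient.Id column name is taken from the original post.

```r
# Hypothetical stand-in for the real Feb25 data frame;
# only the Patient.Id column name comes from the question.
Feb25 <- data.frame(
    Patient.Id = c(101, 102, 101, 103, 102, 101),
    appearance = 1:6
)

# TRUE for every row whose Patient.Id already occurred in an earlier row
is_dup <- duplicated(Feb25$Patient.Id)

first_rows <- Feb25[!is_dup, ]  # first row for each person
removed    <- Feb25[is_dup, ]   # the later rows that were filtered out

# sanity check: the two pieces together account for every original row
stopifnot(nrow(first_rows) + nrow(removed) == nrow(Feb25))
```

With 539 rows and 508 unique ids, as described in the question, removed would be expected to have 539 - 508 = 31 rows. Note that it holds only the second and later appearances, not the first row of each affected person.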