Therneau, Terry M., Ph.D.
2021-Sep-10 12:13 UTC
[R] how to find "first" or "last" record after sort in R
I prefer the duplicated() function, since the final code will be clear to a future reader. (Particularly when I am that future reader.)

    last <- !duplicated(mydata$ID, fromLast=TRUE)  # TRUE for the last row of each ID
    mydata$date3[last] <- NA

Terry T.

(I read the list once a day in digest form, so am always a late reply.)

On 9/10/21 5:00 AM, r-help-request at r-project.org wrote:
> Hello List,
> Please look at the sample data frame below:
>
> ID    date1         date2         date3
> 1     2015-10-08    2015-12-17    2015-07-23
> 2     2016-01-16    NA            2015-10-08
> 3     2016-08-01    NA            2017-01-10
> 3     2017-01-10    NA            2016-01-16
> 4     2016-01-19    2016-02-24    2016-08-01
> 5     2016-03-01    2016-03-10    2016-01-19
>
> This data frame was sorted by ID and date1. I need to set the column date3 to missing for the "last" record of each ID. In the sample data set, IDs 1, 2, 4 and 5 have one row only, so those rows can be considered both first and last records, and date3 can be set to missing. But ID 3 has 2 rows. Since I sorted the data by ID and date1, only the row with ID=3 and date1=2017-01-10 is the last record. I need to set date3=NA for this row only.
>
> The question is, how can I identify the "last" record and set it to NA in the date3 column?
> Thank you,
> Kai
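For readers who want to try the duplicated() approach on the data from the question, here is a minimal sketch. The data frame is retyped from the quoted table, and the `first` vector is an addition for illustration only, since the subject line also asks about "first" records. It assumes the rows are already sorted by ID and date1, as the question states.

```r
# Sample data retyped from the question, already sorted by ID and date1.
mydata <- data.frame(
  ID    = c(1, 2, 3, 3, 4, 5),
  date1 = as.Date(c("2015-10-08", "2016-01-16", "2016-08-01",
                    "2017-01-10", "2016-01-19", "2016-03-01")),
  date2 = as.Date(c("2015-12-17", NA, NA, NA, "2016-02-24", "2016-03-10")),
  date3 = as.Date(c("2015-07-23", "2015-10-08", "2017-01-10",
                    "2016-01-16", "2016-08-01", "2016-01-19"))
)

# Scanning from the end, the first occurrence of an ID is its last row
# in the original order, so negating duplicated() marks the last rows.
last <- !duplicated(mydata$ID, fromLast = TRUE)

# Illustration only: the same idea scanning from the start marks first rows.
first <- !duplicated(mydata$ID)

# Blank out date3 on the last row of every ID.
mydata$date3[last] <- NA
print(mydata)
```

Only the first of the two ID 3 rows keeps its date3 value; every other row is the last (or only) row for its ID. Note that the result depends on the sort order, so the data should be sorted before `last` is computed.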
Excellent function to use, Terry.

I note that when I used it on a vector (in this case the first column of a data.frame), it accepted last=TRUE as well as fromLast=TRUE, which I did not see documented. Used on a data.frame, that change fails, as the function duplicated.data.frame only passes along the fromLast keyword value.

When given a problem, we sometimes use a hammer when existing functions are already there to help.
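One way to check whether an alternative argument name such as last= really changes the result is to compare it against the default and against fromLast=TRUE. This is a sketch, not a statement about every R version: R matches argument names exactly or by prefix, so a name that is not a prefix of fromLast may simply be absorbed by the method's `...` without any error, and being accepted does not show that it had an effect. The vector `ids` is made up for illustration from the IDs in the question.

```r
# IDs from the question, in sorted order.
ids <- c(1, 2, 3, 3, 4, 5)

by_default  <- duplicated(ids)                   # scan from the start
by_fromLast <- duplicated(ids, fromLast = TRUE)  # documented argument
by_last     <- duplicated(ids, last = TRUE)      # undocumented spelling

# If last= behaved like fromLast=, the first comparison would be TRUE.
# If it was silently ignored, the second comparison would be TRUE.
print(identical(by_last, by_fromLast))
print(identical(by_last, by_default))
```

Spelling out fromLast= in full, with the capital L, avoids depending on either outcome.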