Hello List, Please look at the sample data frame below: ID? ? ? ? ?date1? ? ? ? ? ? ? date2? ? ? ? ? ? ?date3 1? ? 2015-10-08? ? 2015-12-17? ? 2015-07-23 2? ? 2016-01-16? ? NA? ? ? ? ? ? ? ? ?2015-10-08 3? ? 2016-08-01? ? NA? ? ? ? ? ? ? ? ?2017-01-10 3? ? 2017-01-10? ? NA? ? ? ? ? ? ? ? ?2016-01-16 4? ? 2016-01-19? ? 2016-02-24? ?2016-08-01 5? ? 2016-03-01? ? 2016-03-10? ?2016-01-19 This data frame was sorted by ID and date1. I need to set the column date3 as missing for the "last" record for each ID. In the sample data set, the ID 1, 2, 4 and 5?has one row only, so they can be consider as first and last records. the data3 can be set as missing. But the ID 3 has 2 rows. Since I sorted the data by ID and date1, the ID=3 and date1=2017-01-10 should be the last record only. I need to set date3=NA for this row only. the question is, how can I identify the "last" record and set it as NA in date3 column. Thank you, Kai [[alternative HTML version deleted]]
Bert Gunter
2021-Sep-09 19:20 UTC
[R] how to find "first" or "last" record after sort in R
Many ways to do this, of course, but if I understand correctly ?rle may be the simplest, because you already have the data sorted by ID. The following little example should give you the idea. It gets the index of the last row in each id,, which you can then use to assign NA's or whatever:> id <- c(1,2,2,2,3,4,5,5) > last.index <- cumsum(rle(test)$lengths) > last.index[1] 1 4 5 6 8 Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Thu, Sep 9, 2021 at 12:00 PM Kai Yang via R-help <r-help at r-project.org> wrote:> > Hello List, > Please look at the sample data frame below: > > ID date1 date2 date3 > 1 2015-10-08 2015-12-17 2015-07-23 > > 2 2016-01-16 NA 2015-10-08 > 3 2016-08-01 NA 2017-01-10 > 3 2017-01-10 NA 2016-01-16 > 4 2016-01-19 2016-02-24 2016-08-01 > 5 2016-03-01 2016-03-10 2016-01-19 > This data frame was sorted by ID and date1. I need to set the column date3 as missing for the "last" record for each ID. In the sample data set, the ID 1, 2, 4 and 5 has one row only, so they can be consider as first and last records. the data3 can be set as missing. But the ID 3 has 2 rows. Since I sorted the data by ID and date1, the ID=3 and date1=2017-01-10 should be the last record only. I need to set date3=NA for this row only. > > the question is, how can I identify the "last" record and set it as NA in date3 column. > Thank you, > Kai > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
I am sure there are many good ways to do the task including taking the data.frame out into a list of data.frames and making the change to each by taking the nth row that matches nrow(it) and changing it and then recombining. What follows are several attempts leading up to one at the end I find is probably the best choice. I did the following sample demo using the dplyr package in the tidyverse but want to explain. My data was three small groups of 1 then 2 then 3. The second column in each had the same number as the group and it was unique for that group. If the last item can be a duplicate of another item, this method changes too much: library(dplyr) mydf <- tribble( ~grouper, ~val, 1, 1, 2, 1, 2, 2, 3, 1, 3, 2, 3, 3, ) mydf %>% group_by(grouper) %>% mutate(val2 = last(val), val=ifelse(val==val2,0,val)) The result is this:> mydf %>% group_by(grouper) %>% mutate(val2 = last(val), val=ifelse(val==val2,0,val))# A tibble: 6 x 3 # Groups: grouper [3] grouper val val2 <dbl> <dbl> <dbl> 1 1 0 1 2 2 1 2 3 2 0 2 4 3 1 3 5 3 2 3 6 3 0 3 Now obviously this introduced an extra temporary row called val2, which is easily removed by many methods like piping to select(-val2) ... But that is not needed as a shorter and more direct method is this: mydf %>% group_by(grouper) %>% mutate(val = ifelse(val==last(val), 0, val)) But some more research shows the helper functions that make this trivial. Recall you wanted the last row in each group altered, I think to have an NA in column. I used 0 above but can use NA just as easily or any constant. The functions are: n() gives the number of rows in the group. row_number() gives the number of the current row as the functionality is being applied, within that group. The condition being offered is that n() == row_number() so this version surgically changes just the last rows no matter what other rows contain. mydf %>% group_by(grouper) %>% mutate(val = ifelse(row_number() == n(), 0, val)) If you have no interest in using a package like this, someone else will likely point you to a way. I suspect using something like split() to make a list of data.frames then applying some functionality to each smaller data.frame to get the result then recombining it back. -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Kai Yang via R-help Sent: Thursday, September 9, 2021 3:00 PM To: R-help Mailing List <r-help at r-project.org> Subject: [R] how to find "first" or "last" record after sort in R Hello List, Please look at the sample data frame below: ID date1 date2 date3 1 2015-10-08 2015-12-17 2015-07-23 2 2016-01-16 NA 2015-10-08 3 2016-08-01 NA 2017-01-10 3 2017-01-10 NA 2016-01-16 4 2016-01-19 2016-02-24 2016-08-01 5 2016-03-01 2016-03-10 2016-01-19 This data frame was sorted by ID and date1. I need to set the column date3 as missing for the "last" record for each ID. In the sample data set, the ID 1, 2, 4 and 5 has one row only, so they can be consider as first and last records. the data3 can be set as missing. But the ID 3 has 2 rows. Since I sorted the data by ID and date1, the ID=3 and date1=2017-01-10 should be the last record only. I need to set date3=NA for this row only. the question is, how can I identify the "last" record and set it as NA in date3 column. Thank you, Kai [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Richard O'Keefe
2021-Sep-11 01:20 UTC
[R] how to find "first" or "last" record after sort in R
Let's simplify this to consider a single vector, such as x <- c(1,1,1,2,2,3,3,3,3,4,5,5,5) in which equal elements are in contiguous blocks.> diff(x)[1] 0 0 1 0 1 0 0 0 1 1 0 0 Of course, there could be gaps, or the sequence might be descending instead of ascending. So> diff(x) != 0We are nearly there, but there is a problem. The last element of the vector is always the last element of a group, but it will never be reported, because there is no following element to compare it with. So> c(diff(x) != 0, TRUE)That gives us a logical vector which we can use in indexing.> w <- c(diff(x) != 0, TRUE) > x[w] <- NA > x[1] 1 1 NA 2 NA 3 3 3 NA NA 5 5 NA On Fri, 10 Sept 2021 at 07:00, Kai Yang via R-help <r-help at r-project.org> wrote:> > Hello List, > Please look at the sample data frame below: > > ID date1 date2 date3 > 1 2015-10-08 2015-12-17 2015-07-23 > > 2 2016-01-16 NA 2015-10-08 > 3 2016-08-01 NA 2017-01-10 > 3 2017-01-10 NA 2016-01-16 > 4 2016-01-19 2016-02-24 2016-08-01 > 5 2016-03-01 2016-03-10 2016-01-19 > This data frame was sorted by ID and date1. I need to set the column date3 as missing for the "last" record for each ID. In the sample data set, the ID 1, 2, 4 and 5 has one row only, so they can be consider as first and last records. the data3 can be set as missing. But the ID 3 has 2 rows. Since I sorted the data by ID and date1, the ID=3 and date1=2017-01-10 should be the last record only. I need to set date3=NA for this row only. > > the question is, how can I identify the "last" record and set it as NA in date3 column. > Thank you, > Kai > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.