Hi All, I want to flag a record based on the following condition. The variables in the sample data are State, name, day, text, ddate Sort the data by State, name, day ddate, Within State, name, day assign consecutive number for each row find the date difference between consecutive rows, if the difference is less than 50 days and the text string in previous and current rows are the same then flag the record as X, otherwise Y. Here is sample data and my attempt, DF<-read.table(text="State name day text ddate CA A 1 xch 2014/09/16 CA A 2 xck 2015/5/29 CA A 2 xck 2015/6/18 CA A 2 xcm 2015/8/3 CA A 2 xcj 2015/8/26 FL B 3 xcu 2017/7/23 FL B 3 xcl 2017/7/03 FL B 3 xmc 2017/7/26 FL B 3 xca 2017/3/17 FL B 3 xcb 2017/4/8 FL B 4 xhh 2017/3/17 FL B 4 xhh 2017/1/29",header=TRUE) DF$ddate <- as.Date (as.Date(DF$ddate), format="%Y/%m/%d" ) DF3 <- DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ] DF3$C <- with(DF3, ave(State, name, day, FUN = seq_along)) DF3$diff <- with(DF3, ave(as.integer(ddate), State, name, day, FUN = function(x) x - x[1])) I stopped here, how do I evaluate the previous and the current rows text string and date difference? Desired result, State name day text ddate C diff flag 1 CA A 1 xch 2014-09-16 1 0 y 2 CA A 2 xck 2015-05-29 1 0 y 3 CA A 2 xck 2015-06-18 2 20 x 4 CA A 2 xcm 2015-08-03 3 66 y 5 CA A 2 xcj 2015-08-26 4 89 y 9 FL B 3 xca 2017-03-17 1 0 y 10 FL B 3 xcb 2017-04-08 2 22 y 7 FL B 3 xcl 2017-07-03 3 108 y 6 FL B 3 xcu 2017-07-23 4 128 y 8 FL B 3 xmc 2017-07-26 5 131 y 12 FL B 4 xhh 2017-01-29 1 0 y 11 FL B 4 xhh 2017-03-17 2 47 x Thank you,
Sounds like homework. We try not to do homework here. See the posting guide linked below for details. (If not homework, let us know). Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Tue, Apr 26, 2022 at 3:17 PM Val <valkremk at gmail.com> wrote:> > Hi All, > > I want to flag a record based on the following condition. > The variables in the sample data are > State, name, day, text, ddate > > Sort the data by State, name, day ddate, > > Within State, name, day > assign consecutive number for each row > find the date difference between consecutive rows, > if the difference is less than 50 days and the text string in > previous and current rows are the same then flag the record as X, > otherwise Y. > > Here is sample data and my attempt, > > DF<-read.table(text="State name day text ddate > CA A 1 xch 2014/09/16 > CA A 2 xck 2015/5/29 > CA A 2 xck 2015/6/18 > CA A 2 xcm 2015/8/3 > CA A 2 xcj 2015/8/26 > FL B 3 xcu 2017/7/23 > FL B 3 xcl 2017/7/03 > FL B 3 xmc 2017/7/26 > FL B 3 xca 2017/3/17 > FL B 3 xcb 2017/4/8 > FL B 4 xhh 2017/3/17 > FL B 4 xhh 2017/1/29",header=TRUE) > > DF$ddate <- as.Date (as.Date(DF$ddate), format="%Y/%m/%d" ) > DF3 <- DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ] > DF3$C <- with(DF3, ave(State, name, day, FUN = seq_along)) > DF3$diff <- with(DF3, ave(as.integer(ddate), State, name, day, > FUN = function(x) x - x[1])) > > I stopped here, how do I evaluate the previous and the current rows > text string and date difference? > > Desired result, > > > State name day text ddate C diff flag > 1 CA A 1 xch 2014-09-16 1 0 y > 2 CA A 2 xck 2015-05-29 1 0 y > 3 CA A 2 xck 2015-06-18 2 20 x > 4 CA A 2 xcm 2015-08-03 3 66 y > 5 CA A 2 xcj 2015-08-26 4 89 y > 9 FL B 3 xca 2017-03-17 1 0 y > 10 FL B 3 xcb 2017-04-08 2 22 y > 7 FL B 3 xcl 2017-07-03 3 108 y > 6 FL B 3 xcu 2017-07-23 4 128 y > 8 FL B 3 xmc 2017-07-26 5 131 y > 12 FL B 4 xhh 2017-01-29 1 0 y > 11 FL B 4 xhh 2017-03-17 2 47 x > > > > Thank you, > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Lots of people here are touchy about homework and I can understand that. But sharing code and saying where you are stuck is a plus for me as?compared to asking someone to do it all. So, without commenting on whether your shown code is right, assuming youhave a data.frame object sorted as described so that adjacent rows MIGHT be?what you want, then look into a function called lag() which may be in the stats?package or dplyr. OR, consider what it means to loop though an index from 2 to the number of rows?and how that lets you look at adjacent entries and even modify them. -----Original Message----- From: Val <valkremk at gmail.com> To: r-help at R-project.org (r-help at r-project.org) <r-help at r-project.org> Sent: Tue, Apr 26, 2022 6:17 pm Subject: [R] flag records Hi All, I want to flag a record based on the following condition. The variables? in the sample data are State, name, day, text, ddate Sort the data by State, name, day ddate, Within? State, name, day ? ? assign consecutive number for each row ? ? find the date difference between consecutive rows, ? ? if the difference is less than 50 days and the text string in previous and current rows? are the same then flag the record as X, otherwise Y. Here is? sample data and my attempt, DF<-read.table(text="State name day text ddate ? CA A 1 xch 2014/09/16 ? CA A 2 xck 2015/5/29 ? CA A 2 xck 2015/6/18 ? CA A 2 xcm 2015/8/3 ? CA A 2 xcj 2015/8/26 ? FL B 3 xcu? 2017/7/23 ? FL B 3 xcl? 2017/7/03 ? FL B 3 xmc? 2017/7/26 ? FL B 3 xca? 2017/3/17 ? FL B 3 xcb? 2017/4/8 ? FL B 4 xhh? 2017/3/17 ? FL B 4 xhh? 2017/1/29",header=TRUE) ? DF$ddate? <- as.Date (as.Date(DF$ddate),? format="%Y/%m/%d" ) ? DF3? ? ? ? <- DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ] ? DF3$C? ? ? <- with(DF3, ave(State, name, day, FUN = seq_along)) ? DF3$diff? ? <- with(DF3, ave(as.integer(ddate), State, name, day, FUN = function(x) x - x[1])) I stopped here, how do I evaluate the previous and the current rows text string and date difference? Desired result, ? ? State name day text? ? ? ddate C diff flag 1? ? CA? ? A? 1? xch 2014-09-16 1? ? 0? ? y 2? ? CA? ? A? 2? xck 2015-05-29 1? ? 0? ? ? y 3? ? CA? ? A? 2? xck 2015-06-18 2? 20? ? x 4? ? CA? ? A? 2? xcm 2015-08-03 3? 66? ? y 5? ? CA? ? A? 2? xcj 2015-08-26 4? 89? ? ? y 9? ? FL? ? B? 3? xca 2017-03-17 1? ? 0? ? ? y 10? ? FL? ? B? 3? xcb 2017-04-08 2? 22? ? y 7? ? FL? ? B? 3? xcl 2017-07-03 3? 108? ? y 6? ? FL? ? B? 3? xcu 2017-07-23 4? 128? ? y 8? ? FL? ? B? 3? xmc 2017-07-26 5? 131? y 12? ? FL? ? B? 4? xhh 2017-01-29 1? ? 0? ? y 11? ? FL? ? B? 4? xhh 2017-03-17 2? 47? ? x Thank you, ______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]]
Hello, Maybe something like the following will do it. In the ave function, don't forget that diff returns a vector of a different length, one less element. So combine with an initial zero. Then 1 + FALSE/TRUE equals 1/2 and subset the target vector c("Y", "X") with these indices. i_ddiff <- with(DF3, ave(as.numeric(ddate), State, name, day, FUN = \(x) c(0L, diff(x))) < 50) DF3$ddiff <- c("Y", "X")[1L + i_ddiff] An alternative is to assign a default "Y" to the new column and then assign "X" where the condition is TRUE. This is easier to read. DF3$ddiff <- "Y" DF3$ddiff[i_ddiff] <- "X" Hope this helps, Rui Barradas ?s 23:17 de 26/04/2022, Val escreveu:> Hi All, > > I want to flag a record based on the following condition. > The variables in the sample data are > State, name, day, text, ddate > > Sort the data by State, name, day ddate, > > Within State, name, day > assign consecutive number for each row > find the date difference between consecutive rows, > if the difference is less than 50 days and the text string in > previous and current rows are the same then flag the record as X, > otherwise Y. > > Here is sample data and my attempt, > > DF<-read.table(text="State name day text ddate > CA A 1 xch 2014/09/16 > CA A 2 xck 2015/5/29 > CA A 2 xck 2015/6/18 > CA A 2 xcm 2015/8/3 > CA A 2 xcj 2015/8/26 > FL B 3 xcu 2017/7/23 > FL B 3 xcl 2017/7/03 > FL B 3 xmc 2017/7/26 > FL B 3 xca 2017/3/17 > FL B 3 xcb 2017/4/8 > FL B 4 xhh 2017/3/17 > FL B 4 xhh 2017/1/29",header=TRUE) > > DF$ddate <- as.Date (as.Date(DF$ddate), format="%Y/%m/%d" ) > DF3 <- DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ] > DF3$C <- with(DF3, ave(State, name, day, FUN = seq_along)) > DF3$diff <- with(DF3, ave(as.integer(ddate), State, name, day, > FUN = function(x) x - x[1])) > > I stopped here, how do I evaluate the previous and the current rows > text string and date difference? > > Desired result, > > > State name day text ddate C diff flag > 1 CA A 1 xch 2014-09-16 1 0 y > 2 CA A 2 xck 2015-05-29 1 0 y > 3 CA A 2 xck 2015-06-18 2 20 x > 4 CA A 2 xcm 2015-08-03 3 66 y > 5 CA A 2 xcj 2015-08-26 4 89 y > 9 FL B 3 xca 2017-03-17 1 0 y > 10 FL B 3 xcb 2017-04-08 2 22 y > 7 FL B 3 xcl 2017-07-03 3 108 y > 6 FL B 3 xcu 2017-07-23 4 128 y > 8 FL B 3 xmc 2017-07-26 5 131 y > 12 FL B 4 xhh 2017-01-29 1 0 y > 11 FL B 4 xhh 2017-03-17 2 47 x > > > > Thank you, > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.