thr3ads.net - R help - [R] flag records [Apr 2022]

Bert Gunter

2022-Apr-27 18:06 UTC

[R] flag records

OK. I may completely misunderstand. If you are happy with what Rui and/or
others have given you, **read no further**, as it will just be noise.

Otherwise, I don't think tapply()/ave() etc. will do quite what you want
when splitting by a list of separate factors, or at least not easily. I
think it's simplest to start by creating a single factor with just the
combinations of levels you actually have. It turns out here that, because
you are ordering lexicographically (which is what the factor() function
does by default), it is easy to get back your results in exactly the form
you want. It could be a bit of a hassle if this were not the case.
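To illustrate the caveat in the last sentence, here is a minimal sketch with a hypothetical day value of 10 (not in Val's data). The pasted labels are sorted as text, so "CA.A.10" comes before "CA.A.2" and the levels no longer follow the sorted rows. One possible workaround, assuming the rows are already sorted, is to pass levels = unique(...) so the levels follow the order of first appearance.

```r
# Hypothetical group keys for rows already sorted by State, name, day
key <- paste("CA", "A", c(2, 2, 10), sep = ".")

# Default factor(): levels are sorted as character strings
fac_default <- factor(key)
levels(fac_default)   # "CA.A.10" "CA.A.2"

# Levels in order of first appearance, i.e. the row order of the sorted data
fac_rows <- factor(key, levels = unique(key))
levels(fac_rows)      # "CA.A.2" "CA.A.10"
```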

So first, starting from your already sorted DF3:

> fac <- with(DF3, factor(paste(State, name, day, sep = '.')))
## which gives:
> fac
 [1] CA.A.1 CA.A.2 CA.A.2 CA.A.2 CA.A.2 FL.B.3 FL.B.3 FL.B.3 FL.B.3 FL.B.3
[11] FL.B.4 FL.B.4
Levels: CA.A.1 CA.A.2 FL.B.3 FL.B.4  ## ordering the same as in DF3
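A swapped-in alternative (not what this post uses) for building the same single factor is base R's interaction() with drop = TRUE and lex.order = TRUE. It keeps each variable's own level order, so a numeric day sorts numerically rather than as text. The vectors below are typed in by hand to mirror the grouping columns of the sorted DF3; treat this as a sketch.

```r
# Grouping columns of the sorted DF3, typed in by hand
State <- rep(c("CA", "FL"), times = c(5, 7))
name  <- rep(c("A", "B"), times = c(5, 7))
day   <- rep(c(1, 2, 3, 4), times = c(1, 4, 5, 2))

# drop = TRUE removes unused combinations; lex.order = TRUE makes the
# first variable vary slowest, matching the sort order State, name, day
fac <- interaction(State, name, day, drop = TRUE, lex.order = TRUE)
levels(fac)   # "CA.A.1" "CA.A.2" "FL.B.3" "FL.B.4"

# day is turned into a factor with numerically sorted levels, so 2 precedes 10
lev_10 <- levels(interaction(c("CA", "CA"), c("A", "A"), c(2, 10),
                             drop = TRUE, lex.order = TRUE))
lev_10        # "CA.A.2" "CA.A.10"
```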

Next I wrote a little function that I think applies the logic you
specified, yielding TRUE when you want to raise the flag and FALSE if
not:

yourfun <- function(text, ddate){
   len <- length(text)
   if(len == 1) FALSE ## only 1 record in the group
   else c(FALSE, (diff(ddate) < 50 & text[-1] == text[-len]))
   ## first record gets FALSE as there is no previous record;
   ## diff() of a Date vector gives the gaps in days
}
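One way to check yourfun() before applying it groupwise is to call it by hand on a single group. This sketch uses the four CA.A.2 records from the sample data, already in date order, and assumes the second argument is the Date column ddate: day is constant within a group, so diff(day) would always be 0 and only the text test would matter.

```r
# Same logic as yourfun() above, repeated so this block runs on its own
yourfun <- function(text, ddate) {
  len <- length(text)
  if (len == 1) FALSE          # only 1 record in the group
  else c(FALSE, (diff(ddate) < 50 & text[-1] == text[-len]))
}

# The CA.A.2 group from the sample data, sorted by date
text  <- c("xck", "xck", "xcm", "xcj")
ddate <- as.Date(c("2015-05-29", "2015-06-18", "2015-08-03", "2015-08-26"))

gaps <- as.numeric(diff(ddate))
gaps                             # 20 46 23 days between consecutive records
res <- yourfun(text, ddate)
res                              # FALSE TRUE FALSE FALSE

# A one-record group, such as CA.A.1, gets a single FALSE
single <- yourfun("xch", as.Date("2014-09-16"))
single
```

Only the second record is flagged: it is 20 days after the first and repeats its text, while the later records are also within 50 days but have new text.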

Then I use the by() function to apply this groupwise as you specified:
> flag <- with(DF3,
+       by(DF3, fac, function(x) yourfun(x$text, x$ddate))
+ )
## Here's what you get
> flag
fac: CA.A.1
[1] FALSE
-------------------------------------------------------
fac: CA.A.2
[1] FALSE  TRUE FALSE FALSE
-------------------------------------------------------
fac: FL.B.3
[1] FALSE FALSE FALSE FALSE FALSE
-------------------------------------------------------
fac: FL.B.4
[1] FALSE  TRUE

This is class "by", essentially a list. So one can use do.call() and an
implicit cast to numeric to get the x, y flags that you specified:
> flag <- c("y", "x")[do.call(c, flag) + 1]
## yielding
> flag
 [1] "y" "y" "x" "y" "y" "y" "y" "y" "y" "y" "y" "x"

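The do.call() step works because arithmetic coerces FALSE/TRUE to 0/1, so adding 1 gives the indices 1/2 into c("y", "x"). The sketch below uses a plain named list as a hypothetical stand-in for the "by" object, holding the same per-group results printed above. unlist(..., use.names = FALSE) is shown as an equivalent way to flatten it.

```r
# Hypothetical stand-in for the "by" object: the per-group results shown above
grp_flags <- list(
  CA.A.1 = FALSE,
  CA.A.2 = c(FALSE, TRUE, FALSE, FALSE),
  FL.B.3 = c(FALSE, FALSE, FALSE, FALSE, FALSE),
  FL.B.4 = c(FALSE, TRUE)
)

idx  <- do.call(c, grp_flags) + 1   # FALSE + 1 = 1, TRUE + 1 = 2
flag <- c("y", "x")[idx]            # index 1 picks "y", index 2 picks "x"
flag

# Same result without do.call(); use.names = FALSE drops the group labels
flag2 <- c("y", "x")[unlist(grp_flags, use.names = FALSE) + 1]
```

The concatenation follows the order of the list elements, which is why the groups need to come out in the same order as the rows of DF3.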
(you could also use the within() function to do this within DF3 and return
the modified DF)
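A minimal, self-contained sketch of the within() idea follows, offered as one possible way to do it rather than a definitive version. It rebuilds DF3 from the question's sample data, assumes yourfun() takes the Date column ddate as its second argument, and uses the hypothetical helper names grp and flag.

```r
# Sample data from the original question
DF <- read.table(text = "State name day text ddate
CA A 1 xch 2014/09/16
CA A 2 xck 2015/5/29
CA A 2 xck 2015/6/18
CA A 2 xcm 2015/8/3
CA A 2 xcj 2015/8/26
FL B 3 xcu 2017/7/23
FL B 3 xcl 2017/7/03
FL B 3 xmc 2017/7/26
FL B 3 xca 2017/3/17
FL B 3 xcb 2017/4/8
FL B 4 xhh 2017/3/17
FL B 4 xhh 2017/1/29", header = TRUE, stringsAsFactors = FALSE)
DF$ddate <- as.Date(DF$ddate, format = "%Y/%m/%d")
DF3 <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]

# TRUE when a record repeats the previous text less than 50 days later
yourfun <- function(text, ddate) {
  len <- length(text)
  if (len == 1) FALSE
  else c(FALSE, (diff(ddate) < 50 & text[-1] == text[-len]))
}

# within() evaluates the block with DF3's columns visible as variables and
# returns DF3 with any new variables added as columns
DF3 <- within(DF3, {
  fac  <- factor(paste(State, name, day, sep = "."))
  grp  <- by(data.frame(text, ddate), fac,
             function(x) yourfun(x$text, x$ddate))
  flag <- c("y", "x")[do.call(c, grp) + 1]
  rm(fac, grp)   # keep the helper objects out of the returned data frame
})
DF3
```

The rm() call inside the block means only flag is added as a new column; fac and grp are discarded.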

HTH,

Bert




On Tue, Apr 26, 2022 at 10:53 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Maybe something like the following will do it.
> In the ave function, don't forget that diff returns a vector of a
> different length, one less element. So combine with an initial zero.
> Then 1 + FALSE/TRUE equals 1/2 and subset the target vector c("Y", "X")
> with these indices.
>
>
> i_ddiff <- with(DF3, ave(as.numeric(ddate), State, name, day, FUN = \(x)
> c(0L, diff(x))) < 50)
> DF3$ddiff <- c("Y", "X")[1L + i_ddiff]
>
>
> An alternative is to assign a default "Y" to the new column and then
> assign "X" where the condition is TRUE. This is easier to read.
>
>
> DF3$ddiff <- "Y"
> DF3$ddiff[i_ddiff] <- "X"
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 26/04/2022 at 23:17, Val wrote:
> > Hi All,
> >
> > I want to flag a record based on the following condition.
> > The variables in the sample data are
> > State, name, day, text, ddate
> >
> > Sort the data by State, name, day, ddate.
> >
> > Within State, name, day:
> >      assign a consecutive number to each row,
> >      find the date difference between consecutive rows,
> >      if the difference is less than 50 days and the text strings in the
> > previous and current rows are the same, flag the record as X,
> > otherwise Y.
> >
> > Here is sample data and my attempt:
> >
> > DF<-read.table(text="State name day text ddate
> >    CA A 1 xch 2014/09/16
> >    CA A 2 xck 2015/5/29
> >    CA A 2 xck 2015/6/18
> >    CA A 2 xcm 2015/8/3
> >    CA A 2 xcj 2015/8/26
> >    FL B 3 xcu  2017/7/23
> >    FL B 3 xcl  2017/7/03
> >    FL B 3 xmc  2017/7/26
> >    FL B 3 xca  2017/3/17
> >    FL B 3 xcb  2017/4/8
> >    FL B 4 xhh  2017/3/17
> >    FL B 4 xhh  2017/1/29",header=TRUE)
> >
> >    DF$ddate   <- as.Date(DF$ddate, format = "%Y/%m/%d")
> >    DF3         <- DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ]
> >    DF3$C       <- with(DF3, ave(seq_along(State), State, name, day, FUN = seq_along))
> >    DF3$diff    <- with(DF3, ave(as.integer(ddate), State, name, day,
> > FUN = function(x) x - x[1]))
> >
> > I stopped here. How do I compare the text strings of the previous and
> > current rows and evaluate their date difference?
> >
> > Desired result:
> >
> >    State name day text      ddate C diff flag
> > 1     CA    A   1  xch 2014-09-16 1    0    y
> > 2     CA    A   2  xck 2015-05-29 1    0    y
> > 3     CA    A   2  xck 2015-06-18 2   20    x
> > 4     CA    A   2  xcm 2015-08-03 3   66    y
> > 5     CA    A   2  xcj 2015-08-26 4   89    y
> > 9     FL    B   3  xca 2017-03-17 1    0    y
> > 10    FL    B   3  xcb 2017-04-08 2   22    y
> > 7     FL    B   3  xcl 2017-07-03 3  108    y
> > 6     FL    B   3  xcu 2017-07-23 4  128    y
> > 8     FL    B   3  xmc 2017-07-26 5  131    y
> > 12    FL    B   4  xhh 2017-01-29 1    0    y
> > 11    FL    B   4  xhh 2017-03-17 2   47    x
> >
> >
> >
> > Thank you,
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
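Rui's ave() code above checks only the date gap. Because each group starts with a gap of 0 (and 0 < 50), the first record of every group also comes out as "X", and rows whose text differs from the previous row are not excluded. A minimal sketch of one way to extend that idea, assuming DF3 is sorted as in the question: start each group with Inf instead of 0, and additionally require the text to equal the previous row's text. The text test can run over the whole sorted frame, because a group's first row already fails the gap test.

```r
# Sample data from the original question, sorted as in the question
DF <- read.table(text = "State name day text ddate
CA A 1 xch 2014/09/16
CA A 2 xck 2015/5/29
CA A 2 xck 2015/6/18
CA A 2 xcm 2015/8/3
CA A 2 xcj 2015/8/26
FL B 3 xcu 2017/7/23
FL B 3 xcl 2017/7/03
FL B 3 xmc 2017/7/26
FL B 3 xca 2017/3/17
FL B 3 xcb 2017/4/8
FL B 4 xhh 2017/3/17
FL B 4 xhh 2017/1/29", header = TRUE, stringsAsFactors = FALSE)
DF$ddate <- as.Date(DF$ddate, format = "%Y/%m/%d")
DF3 <- DF[order(DF$State, DF$name, DF$day, DF$ddate), ]

# Days since the previous record in the same group; Inf for a group's first row
gap <- with(DF3, ave(as.numeric(ddate), State, name, day,
                     FUN = function(x) c(Inf, diff(x))))

# Text equal to the previous row's text (row 1 has no previous row)
n <- nrow(DF3)
same_text <- c(FALSE, DF3$text[-1] == DF3$text[-n])

# Default "Y", then "X" where both conditions hold
i_ddiff <- gap < 50 & same_text
DF3$ddiff <- "Y"
DF3$ddiff[i_ddiff] <- "X"
DF3
```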

Bert Gunter

2022-Apr-27 18:12 UTC

[R] flag records

... and also, the with() is unnecessary:
flag <- by(DF3, fac, function(x) yourfun(x$text, x$ddate))
## will do.

Bert

On Wed, Apr 27, 2022 at 11:06 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> OK. I may completely misunderstand. If you are happy with what Rui and/or
> others have given you, **read no further**, as it will just be noise.
>
> Otherwise, I don't think tapply()/ave() etc. will do quite what you
want
> splitting by the list of separate factors -- or at least not easily. I
> think it's simplest just to start by creating a single factor with just
the
> combinations of levels you have. It turns out here that because you are
> ordering lexicographically -- which is what the factor() function will do
> by default -- this will make it easy to get back your results in exactly
> the form you want. It could be a bit of a hassle if this were not the case.
>
> So first, starting from your already sorted DF3
> > fac <- with(DF3, factor(paste(State, name, day, sep =
'.')))
>
> ## which gives:
> > fac
>  [1] CA.A.1 CA.A.2 CA.A.2 CA.A.2 CA.A.2 FL.B.3 FL.B.3 FL.B.3 FL.B.3 FL.B.3
> [11] FL.B.4 FL.B.4
> Levels: CA.A.1 CA.A.2 FL.B.3 FL.B.4  ## ordering the same as in DF3
>
> Next I wrote a little function that I think applies the logic you
> specified, yielding TRUE for when you want to raise the flag and FALSE if
> not:
>
> yourfun <- function(text, day){
>    len <- length(text)
>    if(len == 1) FALSE ## only 1 record in the group
>    else c(FALSE, (diff(day) < 50 & text[-1] == text[-len]))
>    ## first record gets FALSE as there are none previous
> }
>
> Then I use the by() function to apply this groupwise as you specified:
>
> > flag <-with(DF3,
> +       by(DF3, fac, function(x)foo(x$text,x$day))
> + )
> ## Here's what you get
> > flag
> fac: CA.A.1
> [1] FALSE
> -------------------------------------------------------
> fac: CA.A.2
> [1] FALSE  TRUE FALSE FALSE
> -------------------------------------------------------
> fac: FL.B.3
> [1] FALSE FALSE FALSE FALSE FALSE
> -------------------------------------------------------
> fac: FL.B.4
> [1] FALSE  TRUE
>
> This is class "by", essentially a list. So one can use do.call()
and an
> implicit cast to numeric to get the x,y  flags that you specified:
>
> > flag <- c("y", "x")[do.call(c, flag) + 1]
>
> ## yielding
> > flag
>  [1] "y" "y" "x" "y" "y"
"y" "y" "y" "y" "y"
"y" "x"
>
> (you could also use the within() function to do this within DF3 and return
> the modified DF)
>
> HTH,
>
> Bert
>
>
>
>
> On Tue, Apr 26, 2022 at 10:53 PM Rui Barradas <ruipbarradas at
sapo.pt>
> wrote:
>
>> Hello,
>>
>> Maybe something like the following will do it.
>> In the ave function, don't forget that diff returns a vector of a
>> different length, one less element. So combine with an initial zero.
>> Then 1 + FALSE/TRUE equals 1/2 and subset the target vector
c("Y", "X")
>> with these indices.
>>
>>
>> i_ddiff <- with(DF3, ave(as.numeric(ddate), State, name, day, FUN =
\(x)
>> c(0L, diff(x))) < 50)
>> DF3$ddiff <- c("Y", "X")[1L + i_ddiff]
>>
>>
>> An alternative is to assign a default "Y" to the new column
and then
>> assign "X" where the condition is TRUE. This is easier to
read.
>>
>>
>> DF3$ddiff <- "Y"
>> DF3$ddiff[i_ddiff] <- "X"
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> ?s 23:17 de 26/04/2022, Val escreveu:
>> > Hi All,
>> >
>> > I want to flag a record based on the following condition.
>> > The variables  in the sample data are
>> > State, name, day, text, ddate
>> >
>> > Sort the data by State, name, day ddate,
>> >
>> > Within  State, name, day
>> >      assign consecutive number for each row
>> >      find the date difference between consecutive rows,
>> >      if the difference is less than 50 days and the text string in
>> > previous and current rows  are the same then flag the record as X,
>> > otherwise Y.
>> >
>> > Here is  sample data and my attempt,
>> >
>> > DF<-read.table(text="State name day text ddate
>> >    CA A 1 xch 2014/09/16
>> >    CA A 2 xck 2015/5/29
>> >    CA A 2 xck 2015/6/18
>> >    CA A 2 xcm 2015/8/3
>> >    CA A 2 xcj 2015/8/26
>> >    FL B 3 xcu  2017/7/23
>> >    FL B 3 xcl  2017/7/03
>> >    FL B 3 xmc  2017/7/26
>> >    FL B 3 xca  2017/3/17
>> >    FL B 3 xcb  2017/4/8
>> >    FL B 4 xhh  2017/3/17
>> >    FL B 4 xhh  2017/1/29",header=TRUE)
>> >
>> >    DF$ddate   <- as.Date (as.Date(DF$ddate), 
format="%Y/%m/%d" )
>> >    DF3         <-
DF[order(DF$State,DF$name,DF$day,xtfrm(DF$ddate)), ]
>> >    DF3$C       <- with(DF3, ave(State, name, day, FUN =
seq_along))
>> >    DF3$diff    <- with(DF3, ave(as.integer(ddate), State, name,
day,
>> > FUN = function(x) x - x[1]))
>> >
>> > I stopped here, how do I evaluate the previous and the current
rows
>> > text string and date difference?
>> >
>> > Desired result,
>> >
>> >
>> >       State name day text      ddate C diff flag
>> > 1     CA    A   1  xch 2014-09-16 1    0     y
>> > 2     CA    A   2  xck 2015-05-29 1    0      y
>> > 3     CA    A   2  xck 2015-06-18 2   20     x
>> > 4     CA    A   2  xcm 2015-08-03 3   66    y
>> > 5     CA    A   2  xcj 2015-08-26 4   89      y
>> > 9     FL    B   3  xca 2017-03-17 1    0      y
>> > 10    FL    B   3  xcb 2017-04-08 2   22    y
>> > 7     FL    B   3  xcl 2017-07-03 3   108     y
>> > 6     FL    B   3  xcu 2017-07-23 4  128    y
>> > 8     FL    B   3  xmc 2017-07-26 5  131   y
>> > 12    FL    B   4  xhh 2017-01-29 1    0     y
>> > 11    FL    B   4  xhh 2017-03-17 2   47    x
>> >
>> >
>> >
>> > Thank you,
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
	[[alternative HTML version deleted]]
