thr3ads.net - R help - [R] Countvariable for id by date [Aug 2007]


David Gyllenberg

2007-Aug-09 09:21 UTC

[R] Countvariable for id by date

Best R-users,

Here's a newbie question. I have tried to find an answer to this via the help pages and the ave(x, factor(), FUN = function(y) rank(z, tie = 'first')) approach, but without success.

I have a data frame (~8000 observations, register data) with four columns of interest: id, dg1, dg2 and date (YYYY-MM-DD):

id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;
...

I would like to have two additional columns:
1. "countF20": a count variable giving the order (by date) of the row within its id, among the rows that satisfy dg1 = F20* OR dg2 = F20*,
   where * means any suffix, including none (F20, F201, F202, ..., F2001, F2002, ..., F20001, F20002, ...).
2. "countF2129": another count variable giving the order (by date) of the row within its id, among the rows that satisfy dg1 = F21*-F29* OR dg2 = F21*-F29*,
   where F21*-F29* means F21*, F22*, ..., F29*, and * again means any suffix (F211, F212, ..., F2101, F2102, ..., F21001, F21002, ...).

... so the data frame would look like this, where 1 marks the first qualifying observation for the id, 2 the second, and so on:

id;dg1;dg2;date;countF20;countF2129;
1;F28;;1997-11-04;;1;
1;F20;F702;1998-11-09;2;;
1;F20;;1997-12-03;1;;
1;F208;;2001-03-18;3;;
2;F32;;1999-03-07;;;
2;F29;F32;2000-01-06;;1;
2;F32;;2003-07-05;;;
2;F323;F2800;2000-02-05;;2;
...

Do you know a convenient way to create these kinds of count variables? Thank you in advance!

/ David (david.gyllenberg at yahoo.com)
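As a minimal sketch, assuming the codes are stored as plain strings (an empty string when a field is missing), each of the two conditions can be written as an anchored regular expression tested against both code columns with grepl(). The helper name has_code and the example code "F120" are hypothetical; the point is that the ^ anchor restricts "F20*" to codes that begin with F20, instead of any code that merely contains it.

```r
# Hypothetical helper: TRUE where dg1 or dg2 begins with the pattern.
# "^" anchors the match at the start of the code, so "^F20" accepts
# F20, F208, F2001, ... but rejects a code such as "F120".
has_code <- function(dg1, dg2, pattern) {
  grepl(pattern, dg1) | grepl(pattern, dg2)
}

# dg1/dg2 from the sample data; empty fields are ""
dg1 <- c("F28", "F20", "F20", "F208", "F32", "F29", "F32", "F323")
dg2 <- c("", "F702", "", "", "", "F32", "", "F2800")

has_code(dg1, dg2, "^F20")      # condition behind countF20
has_code(dg1, dg2, "^F2[1-9]")  # condition behind countF2129
```

Ranking the dates of the TRUE rows within each id then gives the requested count variables.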
    
       

jim holtman

2007-Aug-09 11:53 UTC


[R] Countvariable for id by date

This should do what you want:
> x <- read.table(textConnection("id;dg1;dg2;date;
+      1;F28;;1997-11-04;
+      1;F20;F702;1998-11-09;
+      1;F20;;1997-12-03;
+      1;F208;;2001-03-18;
+      2;F32;;1999-03-07;
+      2;F29;F32;2000-01-06;
+      2;F32;;2003-07-05;
+      2;F323;F2800;2000-02-05;"), header=TRUE, sep=";",
+ as.is=TRUE)
> # convert dates
> x$dateP <- unclass(as.POSIXct(x$date))
> # matches for F20
> F20 <- grep("F20", paste(x$dg1, x$dg2))
> # matches for F21 - F29
> F21 <- grep("F2[1-9]", paste(x$dg1, x$dg2))
> # grouping
> x$F20 <- x$F21 <- NA
> x$F20[F20] <- rank(x$dateP[F20])
> x$F21[F21] <- rank(x$dateP[F21])
> x
  id  dg1   dg2       date  X      dateP F21 F20
1  1  F28       1997-11-04 NA  878601600   1  NA
2  1  F20  F702 1998-11-09 NA  910569600  NA   2
3  1  F20       1997-12-03 NA  881107200  NA   1
4  1 F208       2001-03-18 NA  984873600  NA   3
5  2  F32       1999-03-07 NA  920764800  NA  NA
6  2  F29   F32 2000-01-06 NA  947116800   2  NA
7  2  F32       2003-07-05 NA 1057363200  NA  NA
8  2 F323 F2800 2000-02-05 NA  949708800   3  NA
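In the output above, rank() runs over the whole data frame rather than within each id: the F21 column numbers rows 1, 6 and 8 as 1, 2, 3, whereas the requested countF2129 is 1 for id 1 and 1, 2 for id 2. The unanchored pattern "F20" would also match codes that merely contain F20. A minimal sketch of a per-id variant of the same steps, assuming ave() for the within-id ranking and ^-anchored grepl() patterns:

```r
# Sample data from the question; the trailing ";" adds an empty column X
x <- read.table(text = "id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;", header = TRUE, sep = ";", as.is = TRUE)

# Dates as day counts (as.Date does not depend on the local time zone)
x$dateP <- as.numeric(as.Date(x$date))

# Row indices where dg1 or dg2 starts with the code (anchored with ^)
F20 <- which(grepl("^F20", x$dg1) | grepl("^F20", x$dg2))
F21 <- which(grepl("^F2[1-9]", x$dg1) | grepl("^F2[1-9]", x$dg2))

# Rank the matching dates separately within each id
x$F20 <- x$F21 <- NA
x$F20[F20] <- ave(x$dateP[F20], x$id[F20], FUN = rank)
x$F21[F21] <- ave(x$dateP[F21], x$id[F21], FUN = rank)
x
```

With the sample data this yields F20 = NA, 2, 1, 3, NA, NA, NA, NA and F21 = 1, NA, NA, NA, NA, 1, NA, 2, which matches the requested countF20 and countF2129.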


______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Gabor Grothendieck

2007-Aug-09 11:59 UTC


[R] Countvariable for id by date

Try this:

Lines <- "id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;
"

# replace textConnection(Lines) with actual file name
DF <- read.csv2(textConnection(Lines), as.is = TRUE,
  colClasses = c("numeric", "character", "character", "Date", "NULL"))

rk <- function(x, pat) {
  z <- regexpr(pat, x$dg1) > 0 | regexpr(pat, x$dg2) > 0
  rank(ifelse(z, x$date, NA), na.last = "keep")
}

DF$countF20 <- unlist(by(DF, DF$id, rk, pat = "^F20"))
DF$countF2129 <- unlist(by(DF, DF$id, rk, pat = "^F2[1-9]"))
DF
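The by()/unlist() step returns the results in id order, so it lines up with DF only when the rows are already grouped by id in sorted order, as they are in the sample. A minimal sketch of an order-independent variant, assuming ave() (which writes each group's result back to the original row positions) and ties.method = "first", as in the tie='first' mentioned in the question. The name rank_by_id is hypothetical, and the row shuffle exists only to show that alignment holds:

```r
# Sample data from the thread (trailing ";" gives an empty 5th column, skipped via "NULL")
Lines <- "id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;
"
DF <- read.csv2(text = Lines,
                colClasses = c("numeric", "character", "character", "Date", "NULL"))

# Hypothetical helper: position by date of each matching row within its id.
# ave() puts results back in the original row positions, so DF need not be
# sorted; ties.method = "first" numbers same-date matches in row order.
rank_by_id <- function(df, pat) {
  hit <- grepl(pat, df$dg1) | grepl(pat, df$dg2)
  d <- ifelse(hit, as.numeric(df$date), NA)
  ave(d, df$id, FUN = function(v) rank(v, na.last = "keep", ties.method = "first"))
}

DF <- DF[c(8, 3, 5, 1, 7, 2, 6, 4), ]   # deliberately unsorted rows
DF$countF20   <- rank_by_id(DF, "^F20")
DF$countF2129 <- rank_by_id(DF, "^F2[1-9]")
DF[order(as.numeric(rownames(DF))), ]   # back in the original row order
```

Printed in the original row order, countF20 and countF2129 match the columns requested in the question. With the default ties.method = "average" used by rk() above, two qualifying rows for the same id on the same date would each get 1.5 instead.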




