Dear R-users,

Here's a newbie question. I have tried to find an answer to this via help and the "ave(x,factor(),FUN=function(y) rank (z,tie='first')"-function, but without success.

I have a dataframe (~8000 observations, registerdata) with four columns of interest: id, dg1, dg2 and date (YYYY-MM-DD):

id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;
...

I would like to have two additional columns:
1. "countF20": a "count variable" that shows the order (by date), within each id, of the observations that fulfil the following logical expression: dg1 = F20* OR dg2 = F20*,
   where * means F201,F202... F2001,F2002...F20001,F20002...
2. "countF2129": another "count variable" that shows the order (by date), within each id, of the observations that fulfil the following logical expression: dg1 = F21*-F29* OR dg2 = F21*-F29*,
   where F21*-F29* means F21*, F22*...F29* and
   where * means F211,F212... F2101,F2102...F21001,F21002...

... so the dataframe would look like this, where 1 is the first observation for the id that fulfils the condition, 2 is the second etc.:

id;dg1;dg2;date;countF20;countF2129;
1;F28;;1997-11-04;;1;
1;F20;F702;1998-11-09;2;;
1;F20;;1997-12-03;1;;
1;F208;;2001-03-18;3;;
2;F32;;1999-03-07;;;
2;F29;F32;2000-01-06;;1;
2;F32;;2003-07-05;;;
2;F323;F2800;2000-02-05;;2;
...

Do you know a convenient way to create these kinds of "count variables"? Thank you in advance!

/ David (david.gyllenberg at yahoo.com)
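The two conditions in the question amount to "the code starts with F20" and "the code starts with F2 followed by a digit from 1 to 9". A minimal sketch, assuming dg1 and dg2 are character vectors with an empty string where dg2 is missing, expresses them as regular expressions anchored with ^ and applies grepl() to both columns:

```r
# The eight example rows from the question (assumption: a missing dg2
# is stored as an empty string "").
dg1 <- c("F28", "F20", "F20", "F208", "F32", "F29", "F32", "F323")
dg2 <- c("",    "F702", "",   "",     "",    "F32", "",    "F2800")

# "F20*": the code starts with F20 (F20, F208, F2001, ...)
isF20 <- grepl("^F20", dg1) | grepl("^F20", dg2)

# "F21*-F29*": the code starts with F2 followed by a digit 1-9
isF2129 <- grepl("^F2[1-9]", dg1) | grepl("^F2[1-9]", dg2)

which(isF20)    # rows 2, 3, 4
which(isF2129)  # rows 1, 6, 8
```

Note that F208 matches "^F20" but not "^F2[1-9]", because the character after F2 is 0.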
This should do what you want:

x <- read.table(textConnection("id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;"), header=TRUE, sep=";", as.is=TRUE)
# convert dates
x$dateP <- unclass(as.POSIXct(x$date))
# matches for F20
F20 <- grep("F20", paste(x$dg1, x$dg2))
# matches for F21 - F29
F21 <- grep("F2[1-9]", paste(x$dg1, x$dg2))
# grouping
x$F20 <- x$F21 <- NA
x$F20[F20] <- rank(x$dateP[F20])
x$F21[F21] <- rank(x$dateP[F21])
x
##   id  dg1   dg2       date  X      dateP F21 F20
## 1  1  F28       1997-11-04 NA  878601600   1  NA
## 2  1  F20  F702 1998-11-09 NA  910569600  NA   2
## 3  1  F20       1997-12-03 NA  881107200  NA   1
## 4  1 F208       2001-03-18 NA  984873600  NA   3
## 5  2  F32       1999-03-07 NA  920764800  NA  NA
## 6  2  F29   F32 2000-01-06 NA  947116800   2  NA
## 7  2  F32       2003-07-05 NA 1057363200  NA  NA
## 8  2 F323 F2800 2000-02-05 NA  949708800   3  NA

On 8/9/07, David Gyllenberg <david.gyllenberg at yahoo.com> wrote:
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
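In this reply rank() is applied to all matching rows at once rather than within each id, so in the printed output rows 6 and 8 (both id 2) get 2 and 3 in the F21 column, where the question asks for 1 and 2. A per-id variant can be sketched with ave(), which the question mentions, by ranking only the matching rows grouped by id. The helper name count_by_id is hypothetical, the dates are converted with as.Date() instead of as.POSIXct(), and ties.method = "first" (the tie='first' from the question) is an assumption about how matches on the same date should be numbered:

```r
# Example rows from the question (trailing ';' dropped so there is no extra column)
x <- read.table(text = "id;dg1;dg2;date
1;F28;;1997-11-04
1;F20;F702;1998-11-09
1;F20;;1997-12-03
1;F208;;2001-03-18
2;F32;;1999-03-07
2;F29;F32;2000-01-06
2;F32;;2003-07-05
2;F323;F2800;2000-02-05", header = TRUE, sep = ";", as.is = TRUE)
x$date <- as.Date(x$date)

# Hypothetical helper: per-id, date-ordered counter for rows whose dg1 or
# dg2 matches `pat`; rows that do not match get NA.
count_by_id <- function(d, pat) {
  m <- grepl(pat, d$dg1) | grepl(pat, d$dg2)
  out <- rep(NA_real_, nrow(d))
  # ave() splits the matching dates by id and ranks each group separately
  out[m] <- ave(as.numeric(d$date[m]), d$id[m],
                FUN = function(v) rank(v, ties.method = "first"))
  out
}

x$countF20   <- count_by_id(x, "^F20")
x$countF2129 <- count_by_id(x, "^F2[1-9]")
x
```

On the example rows this gives countF20 = 2, 1, 3 for rows 2 to 4 (id 1), and countF2129 = 1 for row 1 (id 1) and 1, 2 for rows 6 and 8 (id 2), as in the table in the question. Because ave() puts the results back in place, the rows do not need to be sorted by id.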
Try this:

Lines <- "id;dg1;dg2;date;
1;F28;;1997-11-04;
1;F20;F702;1998-11-09;
1;F20;;1997-12-03;
1;F208;;2001-03-18;
2;F32;;1999-03-07;
2;F29;F32;2000-01-06;
2;F32;;2003-07-05;
2;F323;F2800;2000-02-05;
"
# replace textConnection(Lines) with actual file name
# ("NULL" in colClasses skips the empty 5th column created by the trailing ';')
DF <- read.csv2(textConnection(Lines), as.is = TRUE,
  colClasses = c("numeric", "character", "character", "Date", "NULL"))
rk <- function(x, pat) {
  # TRUE where pat matches dg1 or dg2
  z <- regexpr(pat, x$dg1) > 0 | regexpr(pat, x$dg2) > 0
  # rank the matching dates; non-matching rows stay NA
  rank(ifelse(z, x$date, NA), na.last = "keep")
}
DF$countF20 <- unlist(by(DF, DF$id, rk, pat = "^F20"))
DF$countF2129 <- unlist(by(DF, DF$id, rk, pat = "^F2[1-9]"))
DF
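The by()/unlist() step relies on DF being sorted by id: by() returns one result per id, in the order of the id levels, and unlist() simply concatenates them, so for unsorted register data the counts would be assigned to the wrong rows. A sketch that keeps the rk() helper as written but uses split() and unsplit() to put each id's ranks back into their original row positions is shown below; the eight example rows are deliberately shuffled here, for illustration only:

```r
# The question's eight rows, shuffled so that the ids are not contiguous
DF <- data.frame(
  id   = c(2, 1, 2, 1, 2, 1, 2, 1),
  dg1  = c("F323", "F28", "F32", "F20", "F29", "F20", "F32", "F208"),
  dg2  = c("F2800", "", "", "F702", "F32", "", "", ""),
  date = as.Date(c("2000-02-05", "1997-11-04", "1999-03-07", "1998-11-09",
                   "2000-01-06", "1997-12-03", "2003-07-05", "2001-03-18")),
  stringsAsFactors = FALSE)

# rk() as in the reply: rank the dates of matching rows, NA elsewhere
rk <- function(x, pat) {
  z <- regexpr(pat, x$dg1) > 0 | regexpr(pat, x$dg2) > 0
  rank(ifelse(z, x$date, NA), na.last = "keep")
}

# split() groups the rows by id; unsplit() writes each group's ranks back
# into the positions those rows came from, whatever the row order.
DF$countF20   <- unsplit(lapply(split(DF, DF$id), rk, pat = "^F20"), DF$id)
DF$countF2129 <- unsplit(lapply(split(DF, DF$id), rk, pat = "^F2[1-9]"), DF$id)
DF
```

Sorting first with DF <- DF[order(DF$id), ] before the by() calls would also keep the counts aligned, at the cost of changing the row order.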