Dear List,

I ran into some problems with time-series data.

Imagine a data structure where observations (x) of test attendants (i) are made four times (q) a year (y). The data is ordered the following way:

I y q x
1 2006 1 1
1 2006 3 1
1 2006 4 1
1 2007 1 1
1 2007 2 1
1 2007 3 1
1 2007 4 1
2 2006 1 1
3 2007 1 1
3 2007 2 1

I am looking for a way to count the attendants that have attended at least once a year. In this case 2 persons, because i=2 has no observation in 2007.

I thought about creating a subset with the duplicated() function, but I can't find a way to control (i) and (y).

subset(data, !duplicated(i[y]))

Thanks so much

Andreas Kunzler
____________________________
Bundeszahnärztekammer (BZÄK)
Chausseestraße 13
10115 Berlin

Tel.: 030 40005-113
Fax: 030 40005-119

E-Mail: a.kunzler at bzaek.de
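On the duplicated() question: duplicated() also accepts a data frame, in which case it compares whole rows, so passing only the columns I and y should control both at once. A minimal sketch, assuming the data frame is called DF (a hypothetical name) and holds the values from the table above:

```r
# Hypothetical data frame name DF; values copied from the table in the question
DF <- read.table(text = "
I y q x
1 2006 1 1
1 2006 3 1
1 2006 4 1
1 2007 1 1
1 2007 2 1
1 2007 3 1
1 2007 4 1
2 2006 1 1
3 2007 1 1
3 2007 2 1
", header = TRUE)

# duplicated() on a data frame compares whole rows, so giving it just the
# two key columns flags every repeat of an (I, y) pair after its first occurrence
first_per_year <- DF[!duplicated(DF[c("I", "y")]), ]
first_per_year
```

Each attendant-year pair is kept exactly once, so counting rows per attendant or per year afterwards becomes a simple tabulation.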
On Thu, 11 Sep 2008, Kunzler, Andreas wrote:

> I am looking for a way to count the attendants that have attended
> at least once a year. In this case 2 persons, because i=2 has no
> observation in 2007.

You might want to turn your data into an actual time series with one series per attendant and then aggregate. I've written a few short transformations based on the data above, using the "zoo" package. It's somewhat lengthy but might give you a few useful pointers.

hth,
Z

## read data
x <- read.table(textConnection("I y q x
1 2006 1 1
1 2006 3 1
1 2006 4 1
1 2007 1 1
1 2007 2 1
1 2007 3 1
1 2007 4 1
2 2006 1 1
3 2007 1 1
3 2007 2 1"), header = TRUE)

## store year/qtr as "yearqtr" object
library("zoo")
x$yq <- as.yearqtr(x$y + (x$q-1)/4)
x <- x[,-(2:3)]

## reshape data into wide format (one series per individual)
x <- reshape(x, timevar = "I", idvar = "yq", direction = "wide")

## turn data into zoo series with zeros in quarters without observation
z <- zoo(as.matrix(x[,-1]), x[,1])
z <- merge(zoo(,seq(from = start(z), to = end(z), by = 0.25)), z)
z[is.na(z)] <- 0

## aggregate from quarterly to annual observations
zy <- aggregate(z, function(x) as.numeric(floor(x)), sum)

## aggregate over individuals
rollapply(zy, 1, function(x) sum(x > 0), by.column = FALSE)

> I thought about creating a subset with the duplicated() function, but I
> can't find a way to control (i) and (y).
>
> subset(data, !duplicated(i[y]))
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
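If zoo is not at hand, the per-year count of distinct attendants that the final aggregation step above is after can probably also be obtained in base R with tapply(). This is a swapped-in technique, not part of the zoo answer, and DF is an assumed name for the data frame:

```r
# Hypothetical data frame name DF; values copied from the table in the question
DF <- read.table(text = "
I y q x
1 2006 1 1
1 2006 3 1
1 2006 4 1
1 2007 1 1
1 2007 2 1
1 2007 3 1
1 2007 4 1
2 2006 1 1
3 2007 1 1
3 2007 2 1
", header = TRUE)

# For each year, count how many distinct attendants have at least one observation
per_year <- tapply(DF$I, DF$y, function(i) length(unique(i)))
per_year
```

Unlike the zoo version, this skips the step of filling unobserved quarters with zeros, because only the presence of an attendant in a year matters for the count.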
On Thu, Sep 11, 2008 at 3:37 AM, Kunzler, Andreas <a.kunzler at bzaek.de> wrote:

> I am looking for a way to count the attendants that have attended at least once a year. In this case 2 persons, because i=2 has no observation in 2007.

Don't you mean 1 person, not 2 persons, since

- attendant 1 appears in both years, but
- attendant 2 appears only in 2006
- attendant 3 appears only in 2007

so only attendant 1 appears in both years, i.e. 1 person.

Assuming DF is your data frame:

# unique (I, y) pairs, i.e. one row per attendant and year
u <- unique(DF[1:2])
# count attendants whose number of years equals the total number of distinct years
with(u, sum(tapply(y, I, length) == length(unique(y)))) # 1
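The same count under Gabor's reading (attendants observed in every year) can also be sketched with a cross-tabulation instead of unique() and tapply(). This is an alternative technique, not Gabor's code, and DF is again the assumed data frame name:

```r
# Hypothetical data frame name DF; values copied from the table in the question
DF <- read.table(text = "
I y q x
1 2006 1 1
1 2006 3 1
1 2006 4 1
1 2007 1 1
1 2007 2 1
1 2007 3 1
1 2007 4 1
2 2006 1 1
3 2007 1 1
3 2007 2 1
", header = TRUE)

# Cross-tabulate attendants (rows) by year (columns); cells hold observation counts
counts <- table(DF$I, DF$y)

# TRUE for attendants with at least one observation in every year present in the data
in_every_year <- rowSums(counts > 0) == ncol(counts)
sum(in_every_year)
```

Only years that occur somewhere in the data become columns of the table, so a year in which nobody was observed at all would not be required of any attendant.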