Ricardo Pietrobon
2008-Jun-09 16:45 UTC
[R] converting a data set to a format for time series analysis
I currently have a data set describing human subjects enrolled in an international clinical trial: the name of the hospital enrolling each subject, the date when the subject was enrolled, and a set of variables representing characteristics of the site (e.g., number of beds in the hospital). My data set looks like this:

subject  hospital   date_enrollment  hospital_beds
1        hospitalA  1/3/2002         300
2        hospitalA  1/6/2002         300
3        hospitalB  2/4/2002         150
4        hospitalC  3/2/2002         200

To perform a time series analysis, I am now trying to get to a format with the following variables:

month  year  site  number_enrolled_subjects  hospital_beds

The data would be laid out at one-month intervals, with the number of enrolled subjects counted per site.

Any help would be greatly appreciated.

Thanks,
Ricardo
jim holtman
2008-Jun-09 17:04 UTC
[R] converting a data set to a format for time series analysis
Will something like this work for you?

> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
+ 1 hospitalA 1/3/2002 300
+ 2 hospitalA 1/6/2002 300
+ 3 hospitalB 2/4/2002 150
+ 4 hospitalC 3/2/2002 200"), header=TRUE)
> closeAllConnections()
> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
> cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
  subject  hospital date_enrollment hospital_beds year month
1       1 hospitalA        1/3/2002           300 2002    01
2       2 hospitalA        1/6/2002           300 2002    01
3       3 hospitalB        2/4/2002           150 2002    02
4       4 hospitalC        3/2/2002           200 2002    03

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
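For readers following along, here is a small standalone sketch of the same date handling, assuming (as the "%m/%d/%Y" format string above does) that the dates are month/day/year. It also shows two base-R alternatives that are not part of the reply: a single "%Y-%m" key, which sorts chronologically, and cut() on a Date vector, which bins each date to the first day of its month. The variable names month_key and month_start are illustrative only.

```r
# Sketch: derive month-level keys from m/d/Y enrolment dates.
# Assumes month/day/year order, matching the "%m/%d/%Y" format used above.
date_enrollment <- c("1/3/2002", "1/6/2002", "2/4/2002", "3/2/2002")
y <- as.Date(date_enrollment, "%m/%d/%Y")

year  <- format(y, "%Y")   # character year, e.g. "2002"
month <- format(y, "%m")   # character month, zero-padded, e.g. "01"

# One sortable key per calendar month (hypothetical helper column)
month_key <- format(y, "%Y-%m")

# cut() on Dates with breaks = "month" bins each date to the first day
# of its month; the result is a factor labelled like "2002-01-01"
month_start <- cut(y, "month")

print(data.frame(date_enrollment, year, month, month_key, month_start))
```

A month-start date (rather than separate year and month strings) is convenient later because it can be compared, sorted, and passed to seq(..., by = "month").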
jim holtman
2008-Jun-10 00:54 UTC
[R] converting a data set to a format for time series analysis
Here is one way of doing it:

> x <- read.table(textConnection("subject hospital date_enrollment hospital_beds
+ 1 hospitalA 1/3/2002 300
+ 2 hospitalA 1/6/2002 300
+ 3 hospitalB 2/4/2002 150
+ 4 hospitalC 3/2/2002 200"), header=TRUE)
> closeAllConnections()
> y <- as.Date(x$date_enrollment, "%m/%d/%Y")
> z <- cbind(x, year=format(y, "%Y"), month=format(y, "%m"))
> # partition the data by year, month and hospital
> z.s <- split(z, list(z$year, z$month, z$hospital), drop=TRUE)
> # now aggregate: one row per group, counting the subjects in it
> do.call(rbind, lapply(z.s, function(a) data.frame(hospital=a$hospital[1], cases=nrow(a),
+     year=a$year[1], month=a$month[1], beds=a$hospital_beds[1])))
                   hospital cases year month beds
2002.01.hospitalA hospitalA     2 2002    01  300
2002.02.hospitalB hospitalB     1 2002    02  150
2002.03.hospitalC hospitalC     1 2002    03  200
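One thing the split/aggregate approach does not produce is a regular monthly series: a hospital/month combination with no enrolments simply has no row. Below is a minimal base-R sketch (using aggregate(), expand.grid() and merge() instead of split/lapply) that fills those gaps with zero counts. It assumes the example data above, that every site should be observed from the first to the last enrolment month of the whole trial, and that hospital_beds is constant per hospital. The names counts, grid, panel and month_start are hypothetical, and the site column is kept as hospital.

```r
# Sketch: one row per hospital per calendar month, zero-filled.
# Assumes the column names and m/d/Y dates of the example data.
x <- data.frame(
  subject = 1:4,
  hospital = c("hospitalA", "hospitalA", "hospitalB", "hospitalC"),
  date_enrollment = c("1/3/2002", "1/6/2002", "2/4/2002", "3/2/2002"),
  hospital_beds = c(300, 300, 150, 200),
  stringsAsFactors = FALSE
)

# Parse the dates and truncate each to the first day of its month
x$month_start <- as.Date(cut(as.Date(x$date_enrollment, "%m/%d/%Y"), "month"))

# Count subjects per hospital and month (only non-empty groups appear)
counts <- aggregate(subject ~ hospital + month_start, data = x, FUN = length)
names(counts)[names(counts) == "subject"] <- "number_enrolled_subjects"

# Every hospital crossed with every month from first to last enrolment
months <- seq(min(x$month_start), max(x$month_start), by = "month")
grid <- expand.grid(hospital = unique(x$hospital), month_start = months,
                    stringsAsFactors = FALSE)

# Left join the counts onto the grid; missing combinations become 0
panel <- merge(grid, counts, all.x = TRUE)
panel$number_enrolled_subjects[is.na(panel$number_enrolled_subjects)] <- 0L

# Attach the site characteristic (assumed constant within a hospital)
beds <- unique(x[, c("hospital", "hospital_beds")])
panel <- merge(panel, beds)

# Split the month key into the requested month/year columns and sort
panel$year  <- as.integer(format(panel$month_start, "%Y"))
panel$month <- as.integer(format(panel$month_start, "%m"))
panel <- panel[order(panel$hospital, panel$month_start),
               c("month", "year", "hospital",
                 "number_enrolled_subjects", "hospital_beds")]
print(panel, row.names = FALSE)
```

With the four example subjects this gives nine rows (three hospitals by three months), most of them zero. Whether zero-filling over the whole trial period is appropriate, rather than starting each site at its first enrolment, depends on the analysis.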