I have a population of subjects, each with a variable which has been captured at a baseline date. Then for many subjects (but not all) an intervention has occurred and the variable has changed at one or more time points after the baseline date. So my dataset consists of a subject ID (x), which may appear several times or just once, a measure (y), and a date of observation (z). I would like to have some sort of animated plot with a slider representing time, so I can show how the distribution of the variable has altered (say in a histogram or a box plot) from baseline up to the end of a period. I need each subject to be counted only once in this distribution, using the latest measure recorded up to and including the current date on the slider.

I have created a synthetic data set using the code below, which roughly replicates the problem for just a few data points over a couple of months. My real data set has about 30,000 subjects with multiple measures captured over 10 years. What I need for each date point is a summary chart such as a histogram, which shows me the distribution of my variable (in this case y) with just one observation per subject, that observation being the most up to date at the point at which the slider is set.

I have tried to use the manipulate package, which I've used successfully for other simple applications, but hit two problems. Firstly, it doesn't like dates as a slider variable. I can work around this by making them numeric, but would like to work with dates if possible. Secondly, I don't know how to restrict observations to the date on the slider. E.g. subject 100 has a baseline of 45.26, but on or after 26th April it becomes 56.96. So where the slider is set on or after this date I would want the earlier value for this subject to be excluded from the distribution. I'm not sure that manipulate was really made for this problem and perhaps I should be looking elsewhere.
I guess a solution involves using the aggregate command to get the unique values at various time points and then using something like the TeachingDemos package? Not sure I'm on the right path with this though, and as an R beginner I can't get to first base with this. I can't figure out how to use aggregate to give me the value of y corresponding to the latest date z. I know this is basic stuff but I really can't see how to do it.

My synthetic data can be generated like this (clunky I know, but I can't do this any slicker)

#make my baseline observations for 100 subjects on 1st April 2013
set.seed(1)
a<-data.frame(x=seq(1:100),y=rnorm(100,mean=50,sd=10),z=as.Date("2013-04-01"))
#simulate 50 subsequent observations in the next 2 months resulting in
#some subjects having different future measurements of y
Start <- as.Date("2013-04-02")
End <- as.Date("2013-06-30")
dates <- seq(from = Start, to = End, by = 1)
set.seed(1)
b<-data.frame(x=sample(0:100,50,replace=TRUE),y=rnorm(50,mean=50,sd=10),z=sample(dates,50,replace=FALSE))
#make one table of observations
c<-merge(a,b,all=TRUE)

Any suggestions much appreciated.

Gavin.
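The step asked about above (one row per subject, taking the latest observation on or before a cut-off date) can be sketched in base R without aggregate, using order() and duplicated() instead. This is only an illustration under assumptions: the small data frame obs and the helper name latest_up_to are made up here and are not part of the original post or of any package.

```r
# Toy data in the same shape as the thread's example:
# x = subject ID, y = measure, z = observation date.
obs <- data.frame(
  x = c(1, 2, 3, 2, 3, 3),
  y = c(50, 40, 60, 45, 65, 70),
  z = as.Date(c("2013-04-01", "2013-04-01", "2013-04-01",
                "2013-04-20", "2013-04-10", "2013-05-15"))
)

# latest_up_to() is a hypothetical helper, not from any package.
latest_up_to <- function(dat, cutoff) {
  # keep only records observed on or before the cut-off date
  sub <- dat[dat$z <= cutoff, ]
  # sort by subject, then by date with the newest record first
  sub <- sub[order(sub$x, -as.numeric(sub$z)), ]
  # keep the first (newest) row for each subject
  sub[!duplicated(sub$x), ]
}

snap <- latest_up_to(obs, as.Date("2013-04-30"))
print(snap)
```

With a cut-off of 30 April, subject 3's reading from 15 May is ignored and the 10 April reading replaces the baseline, so each subject contributes exactly one value to whatever summary plot is drawn from snap$y.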
Gavin,

This works for a numeric version of the dates. I changed the name of your data frame from "c" to "df".

Jean

df <- c
rm(a, Start, End, dates, b, c)
library(rpanel)

df$znum <- as.numeric(df$z)
stopznum <- median(df$znum)

if (interactive()) {
     hist.draw <- function(panel) {
          # use the data carried by the panel passed to this function
          mydat <- data.frame(x=panel$x, y=panel$y, znum=panel$znum)
          # select all records up to and including stopznum
          sub <- mydat[mydat$znum <= panel$stopznum, ]
          # select only the latest observation for each ID (x)
          # sort the data by ID (x) and date (znum) in reverse order
          sub2 <- sub[rev(order(sub$x, sub$znum)), ]
          # then choose the first observation matching each ID (x)
          sub3 <- sub2[match(unique(sub2$x), sub2$x), ]
          # draw a histogram of the selected measurements
          hist(sub3$y, xlab="Measure", ylab="Frequency",
               main=paste("Last observed up to and including", panel$stopznum))
          panel
     }
     mypanel <- rp.control(title="Date Slider", x=df$x, y=df$y, znum=df$znum)
     rp.slider(panel=mypanel, variable=stopznum, from=min(df$znum),
          to=max(df$znum), action=hist.draw)
}

On Wed, May 15, 2013 at 1:21 PM, Gavin Rudge <g.rudge@bham.ac.uk> wrote:
> [...]
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
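On the wish to work with dates rather than numbers: R stores a Date as the number of days since 1970-01-01, so a numeric slider value can be turned back into a Date just for display. The sketch below shows that conversion on its own, outside any slider. The helper name slider_label is hypothetical, and whether a given slider returns fractional values is an assumption; floor() is used so that a fractional value still maps to a whole day.

```r
# A Date is stored as days since 1970-01-01.
znum <- as.numeric(as.Date("2013-04-26"))  # what a numeric slider would hold
slider_value <- znum + 0.4                 # assumed: a slider may return fractions

# slider_label() is a hypothetical helper, not part of rpanel.
slider_label <- function(value) {
  # drop any fractional part, then convert back to a Date
  d <- as.Date(floor(value), origin = "1970-01-01")
  paste("Last observed up to and including", format(d, "%Y-%m-%d"))
}

label <- slider_label(slider_value)
print(label)
```

A string built this way could be passed as the main= argument of hist() in place of the raw number, while the comparison against znum continues to use the numeric value.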
Good! Glad to hear it. Post questions for further guidance to R-help.

Jean

On Thu, May 16, 2013 at 8:49 AM, Gavin Rudge <g.rudge@bham.ac.uk> wrote:
> Hi Jean,
>
> Many thanks for your help. I've only been playing with your solution for
> a few minutes and I've learned some really useful things. It's kind of
> working with a few errors. I'll try to fix it myself, but I may come back
> to you for a bit of further guidance.
>
> I'm still on the vertical part of my R learning curve, so every day's a
> school day right now!
>
> Kind regards,
>
> Gavin.
>
> Gavin Rudge,
> Research Fellow,
> Public Health, Epidemiology and Biostatistics Unit,
> The College of Medical and Dental Sciences,
> The University of Birmingham,
> 90 Vincent Drive,
> Edgbaston,
> Birmingham,
> B15 2SP,
> United Kingdom.
> 0044 (0)121 414 7852
>
> From: Adams, Jean [mailto:jvadams@usgs.gov]
> Sent: 16 May 2013 14:18
> To: Gavin Rudge
> Cc: r-help@r-project.org
> Subject: Re: [R] animating plots over time with a slider
>
> Gavin,
>
> This works for numeric version of dates.