On 5/16/08, Jim Price <price_ja at hotmail.com>
wrote:>
> I have stumbled across something in the Lattice package that is vexing me.
> Consider the code below:
> __________________________________________________________
>
> library(lattice)
>
>
> myData <- expand.grid(sub = factor(1:16), time = 1:10)
>
> myData$observed <- rnorm(nrow(myData))
> myData$fitted <- with(myData, ave(observed, sub, FUN = mean))
> myData$event.time <- with(myData, ave(observed, sub, FUN = function(.x)
10 *
> runif(1)))
>
> myData <- myData[order(myData$sub, myData$time),]
>
>
>
> # This version works...
> xyplot(
> fitted + observed ~ time | sub,
> data = myData,
> subscripts = TRUE,
> panel = function(..., groups = groups, subscripts = subscripts)
> {
> panel.xyplot(..., groups = groups, subscripts = subscripts)
>
> event.time <- unique(myData$event.time[subscripts])
> panel.abline(v = event.time, lty = 2, col =
'green')
> },
> type = c('l','p'),
> distribute.type = TRUE,
> as.table = TRUE
> )
>
>
>
> # ...but when you add the subset parameter it produces multiple index
lines
> per subject
> xyplot(
> fitted + observed ~ time | sub,
> data = myData,
> subset = sub %in% sample(unique(sub), 9),
> subscripts = TRUE,
> panel = function(..., groups = groups, subscripts = subscripts)
> {
> panel.xyplot(..., groups = groups, subscripts = subscripts)
>
> event.time <- unique(myData$event.time[subscripts])
> # print(event.time)
> panel.abline(v = event.time, lty = 2, col =
'green')
> },
> type = c('l','p'),
> distribute.type = TRUE,
> as.table = TRUE
> )
>
> ___________________________________________________________________
>
>
> The (commented out) print statement I think indicates that there is a data
> reordering going on for the second example, that is causing the multiple
> index lines issue. Is there a neat solution to get the correct index lines
> per subject, or do I need a workaround? Or am I missing something
> fundamental in the code above that is causing issues?
The short answer is that the value of 'subscripts' in the panel
function has an unusual interpretation when you use the extended
formula notation (as in 'fitted + observed ~ time')
You would get better insight if you also printed out the 'subscripts'
variable in each panel function. Basically, when you use the extended
notation, lattice creates an artificial 'groups' variable that is
longer than the original data frame, and consequently, the
'subscripts' variable must also have indices that exceed the number of
rows of the original data frame. This is not a problem in your first
call, because event.time[subscripts] returns NA for out-of-bound
indices.
When using 'subset', things get complicated further because this
artificial 'groups' variable is computed after the subsetting, and
'subscripts' no longer even partially refer to the original data
frame. There is no good solution to this; it is the price you pay for
the usefulness of the extended formula notation.
Your reasonable options are
(1) to subset your data beforehand; e.g.,
myDataSub <- subset(myData, sub %in% sample(unique(sub), 9))
and then use myDataSub in the call.
(2) not use the extended formula (this is conceptually cleaner, I think); e.g.,
xyplot(observed ~ time | sub, data = myData,
subset = sub %in% sample(unique(sub), 9),
fitted = myData$fitted,
event.time = myData$event.time,
panel = function(x, y, fitted, event.time, subscripts, ...)
{
panel.lines(x, fitted[subscripts], col = "black")
panel.xyplot(x, y, ...)
event.time <- unique(event.time[subscripts])
panel.abline(v = event.time, lty = 2, col = 'green')
},
as.table = TRUE)
-Deepayan