Jessica M Pearce
2006-May-31 15:40 UTC
[R] Nesting in Cox proportional hazards survivorship analysis
Hello,

My advisor and I have been working on some survivorship analyses in R and we are hoping to get some feedback on a particular issue involving nesting.

We are interested in patterns of food discovery by ant species. Our observations consist of time to discovery by an ant for three different food types, each of two different sizes. These data were collected at 6 plots located in each of two states. Every plot is divided into 25 stations, at which the observations were made. In a repeated measures style design, all stations received all levels of food type and size over the course of 6 sampling periods. So multiple measurements are drawn from each station and site; however, each individual bait item is only discovered once. We also have vapor pressure deficit measurements (a measure that combines temperature and relative humidity) for each discovery time. Each state is being analyzed separately and we are using the Cox proportional hazards approach.

It is clear from preliminary analysis that there is a strong influence of spatial heterogeneity, as evidenced by significant contributions of stations and plots to discovery. However, we are not necessarily interested in the details of this heterogeneity and simply wish to control for it in examining the other factors of the model. Thus we employed what we think to be the appropriate nesting syntax in the model we are running (as gleaned from Venables and Ripley 1999, 3rd edition), with stations being nested within sites. To provide an example of the syntax, the full model with which we began is:

    TXa <- coxph(Surv(dt, status)~site/station+foodtype*foodsize+vpd, data=TXbait)

This obviously generates a large number of terms, even as we work down to the reduced model. Is this syntax testing what we think it is testing, i.e. are we controlling for station effects in our results? Are there potential problems with our approach to the analysis of which we should be aware in our interpretation?

We have looked at many sources of survivorship analysis literature and haven't seen this nesting issue discussed, besides briefly in Venables and Ripley. We recognize that this is an unusual use of survivorship analysis and would appreciate any insight provided.

Jessica Pearce
Biology Department
University of Utah
Salt Lake City, UT
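A small sketch of what the nesting operator in the formula above expands to may help when reading the large number of terms. The station layout below is hypothetical (6 sites, each reusing station labels 1 to 25, as the design description suggests); it is not the TXbait data, and `term_labels`, `layout` and `X` are names made up for this illustration.

```r
# The formula from the post; terms() only parses it, so no data are needed.
f <- Surv(dt, status) ~ site/station + foodtype*foodsize + vpd
term_labels <- attr(terms(f), "term.labels")
print(term_labels)  # contains "site" and "site:station", but no main effect of station

# Hypothetical layout (an assumption): 6 sites, each with stations labeled 1..25.
layout <- expand.grid(site = factor(1:6), station = factor(1:25))

# Design matrix for the nested part only.
X <- model.matrix(~ site/station, data = layout)
ncol(X)  # 1 intercept + 5 site columns + 6 * 24 station-within-site columns = 150
```

Under these assumptions, site/station is shorthand for site + site:station, so every station gets its own fixed-effect column. That is where the long list of terms comes from.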
Greg Snow
2006-May-31 22:31 UTC
[R] Nesting in Cox proportional hazards survivorship analysis
The site/station syntax is mainly useful for situations where the same station IDs are reused within different sites (e.g. there is a station #1 in site #1 and also a different station, still labeled #1, within site #2). It is unclear whether you really need the /station part or not (it generally does not hurt to include it even when it is not needed).

To take the heterogeneity into account, you can add +cluster(site) or +cluster(station) to the model to tell it that there is correlation within sites or stations (see the help on cluster and possibly on frailty).

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
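As a rough illustration of the cluster and frailty suggestions, here is a minimal sketch on simulated data. Everything about the data is an assumption: the column names follow the original post, but the values, the `station_id` helper column, the effect sizes and the censoring time are invented for the example, and it should not be read as the recommended analysis for the real data set.

```r
library(survival)

set.seed(1)

# Made-up data in the shape described in the thread:
# 6 sites x 25 stations, each station receiving 3 food types x 2 sizes.
d <- expand.grid(site = factor(1:6), station = 1:25,
                 foodtype = factor(c("a", "b", "c")),
                 foodsize = factor(c("small", "large")))

# Station labels repeat across sites, so build one unique id per station.
d$station_id <- interaction(d$site, d$station, drop = TRUE)
d$vpd <- runif(nrow(d), 0.5, 3)

# Simulated discovery times with a station-level effect (hypothetical values).
station_effect <- rnorm(nlevels(d$station_id), sd = 0.5)
rate <- exp(station_effect[as.integer(d$station_id)] +
            0.3 * (d$foodsize == "large"))
t_disc <- rexp(nrow(d), rate = rate)
d$status <- as.integer(t_disc <= 2)  # baits not found by time 2 are censored
d$dt <- pmin(t_disc, 2)

# Robust (sandwich) standard errors that allow correlation within stations.
fit_cluster <- coxph(Surv(dt, status) ~ foodtype * foodsize + vpd +
                       cluster(station_id), data = d)

# Alternative: a random (frailty) effect per station.
fit_frailty <- coxph(Surv(dt, status) ~ foodtype * foodsize + vpd +
                       frailty(station_id), data = d)

print(fit_cluster)
print(fit_frailty)
```

Either form keeps the coefficient table down to the food type, food size and vpd terms, instead of one coefficient per station. cluster() leaves the point estimates as in the unclustered fit and only adjusts the standard errors, while frailty() changes the model itself by adding a station-level random effect.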