There are two ways to view weights.

One is to treat them as case weights: a weight of 3 means that there were actually three identical observations in the primary data, which were collapsed to a single observation in the data frame to save space. This is the assumption of survfit. (Most readers of this list will be too young to remember when computer memory was so small that we had to use tricks like this.)

The second is to treat them as sampling weights, in which case a Horvitz-Thompson-like estimator should be used for the variance. This is the assumption of the svykm function in the survey package.

It appears you want the second behavior.

Terry Therneau

On 03/26/2013 06:00 AM, r-help-request at r-project.org wrote:
> As part of a research paper, I would like to draw both weighted and
> unweighted Kaplan-Meier estimates, the weight being the "importance" of
> each project to the mass of projects whose survival I'm trying to estimate.
>
> I know that the function survfit in the package survival accepts weights and
> produces confidence intervals. However, I suspect that the confidence
> intervals may not be correct. The reason why I suspect this is that
> depending on how I define the weights, I get very different confidence
> intervals, e.g.
>
> require(survival)
> s <- Surv(c(50, 100), c(1, 1))
> sf <- survfit(s ~ 1, weights = c(1, 2))
> plot(sf)
>
> vs.
>
> require(survival)
> s <- Surv(c(50, 100), c(1, 1))
> sf <- survfit(s ~ 1, weights = c(100, 200))
> plot(sf)
>
> Any suggestions would be more than welcome!
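The case-weight behavior described above can be checked directly. The following is only a minimal sketch using the two-observation example from the quoted question; the data frame d and the column names w_small and w_large are illustrative and do not come from the original posts. Under the case-weight reading, scaling the weights leaves the survival estimate unchanged but makes survfit behave as if there were far more observations, so the standard error shrinks.

```r
library(survival)

# Toy data from the quoted question: two events, at t = 50 and t = 100.
# The column names are illustrative assumptions.
d <- data.frame(time = c(50, 100), status = c(1, 1),
                w_small = c(1, 2), w_large = c(100, 200))

# survfit treats weights as case weights (replicated observations).
sf_small <- survfit(Surv(time, status) ~ 1, data = d, weights = w_small)
sf_large <- survfit(Surv(time, status) ~ 1, data = d, weights = w_large)

# The point estimates agree, because only the relative weights matter ...
print(cbind(time = sf_small$time,
            surv_small = sf_small$surv,
            surv_large = sf_large$surv))

# ... but the standard error at the first event time depends on the
# absolute size of the weights (the implied number of observations).
print(c(se_small = sf_small$std.err[1], se_large = sf_large$std.err[1]))
```

If the weights are really sampling weights, this dependence on their absolute scale is the reason to prefer a design-based estimator such as svykm.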
Thanks to all of you for your very helpful comments!

As Terry suggested, svykm is what I was looking for. While checking that the survey package gives the same results as the survival package, I ran into the question of how to draw survival curves. Apparently the implementations in the two packages differ, as I show below. I would very much welcome your views, since the tail of the survival curve has a major impact on the interpretation of my results.

In my data, the last "death" occurs at 2094 days, while the last censoring time is 3297 days. If possible, I would like to say something about the survival probability between 2100 days and 3300 days.

So my question is: after the last observed death (at 883 days in the very simple example below), how should one draw the survival curve? The graph produced by svykm (package survey) ends at 883 days, whereas survfit (package survival) continues the graph all the way to the last censoring time, which is at 1022 days.

Please run the code below to see my point. There are no weights here, but I face the same issue with weights.

require(survival)
lungSubSet <- lung
S <- Surv(lungSubSet$time, lungSubSet$status)
# Kaplan-Meier fit from the survival package
sKm <- survfit(S ~ 1)
##
require(survey)
# Equal weights, so both estimators should agree
svyDesignObject <- svydesign(id = ~1, weights = ~1, data = lungSubSet)
svyKm <- svykm(S ~ 1, design = svyDesignObject, se = TRUE)
##
# svykm curve first, then the survfit curve overlaid in green
plot(svyKm, xlim = c(0, 1200))
lines(sKm, conf.int = TRUE, mark.time = FALSE, col = 'green')
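To see where the two time points in the question come from, the survfit object can be inspected directly. This is a minimal sketch using only the survival package; the names last_death, last_followup and tail_surv are illustrative. It relies on the fact that the Kaplan-Meier estimate changes only at event times, so between the last death and the last censoring time the survfit curve is a flat continuation of the last estimated value and carries no additional information about that interval.

```r
library(survival)

# Unweighted Kaplan-Meier fit on the lung data used in the post.
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Last time at which an event (death) was observed.
last_death <- max(fit$time[fit$n.event > 0])

# Last follow-up time overall; survfit draws the curve out to here.
last_followup <- max(fit$time)

# Survival estimates from the last death onward: only censorings
# occur in this stretch, so the estimate does not change.
tail_surv <- fit$surv[fit$time >= last_death]

print(c(last_death = last_death, last_followup = last_followup))
print(unique(tail_surv))
```

Whether to draw that flat segment is a presentation choice; either way, few subjects remain at risk there, so the curve beyond the last death should be interpreted with caution.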