Dear All,

I hope to run some simple survival analyses using Cox proportional hazards models in R. My command will look like this:

    cox <- summary( coxph( Surv( mortality , TIME ) ~ Independent variables ) )

My query is about specifying a range of independent variables in R, such that each independent variable is included as the main explanatory variable in its own model, independently of the other variables in the list. I have around 10,000 independent variables, or groups, by which I hope to study differences in mortality rates over a period of time. All 10,000 variables have one thing in common: their names start with the same letters, rs, followed by a unique 6-8 digit number.

Regards,
Sajjad
You could loop through the list using lapply() or the like. One way would be to subset the data and use the . notation, which expands to all variables in the dataset not used elsewhere in the model (i.e., not the outcomes).

Cheers,
Josh

On Mar 1, 2012, at 14:29, sajjad R <sajjad_r at hotmail.com> wrote:
> My query is about specifying a range of independent variables in R,
> such that each independent variable is included as the main defining variable independently of other variables in the variable list.
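
A minimal sketch of the lapply() and . notation idea, for illustration only. The data are simulated and every column name and value is an assumption, not taken from the original data set. The sketch also assumes that mortality is a 0/1 event indicator and TIME the follow-up time; Surv() expects the time first and the event indicator second.

```r
library(survival)

# Simulated stand-in for the real data (all names and values are hypothetical).
set.seed(1)
n <- 200
dat <- data.frame(
  TIME       = rexp(n, rate = 0.1),   # follow-up time
  mortality  = rbinom(n, 1, 0.7),     # assumed coding: 1 = died, 0 = censored
  rs123456   = rbinom(n, 2, 0.3),     # genotype coded as 0/1/2 copies
  rs2345678  = rbinom(n, 2, 0.2),
  rs34567890 = rbinom(n, 2, 0.4)
)

# Pick out the SNP columns by their common "rs" + digits naming pattern.
snps <- grep("^rs[0-9]+$", names(dat), value = TRUE)

# One model per SNP: subset to the outcome columns plus that SNP, so that
# "." expands to the single remaining variable.
fits <- lapply(snps, function(v) {
  coxph(Surv(TIME, mortality) ~ ., data = dat[, c("TIME", "mortality", v)])
})
names(fits) <- snps

# Collect the log hazard ratio, its standard error and p-value for each SNP.
results <- t(sapply(fits, function(f) {
  summary(f)$coefficients[1, c("coef", "se(coef)", "Pr(>|z|)")]
}))
print(results)
```

Subsetting inside the loop keeps each call simple, but it goes through the full formula machinery every time, which may be slow for around 10,000 variables.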
On Fri, Mar 2, 2012 at 11:29 AM, sajjad R <sajjad_r at hotmail.com> wrote:
> I have around 10,000 independent variables or groups by which I hope to study differences in mortality rates over a period of time.
> All the 10,000 variables have one thing in common, i.e. their names start with the same alphabets rs followed by unique 6-8 digit numbers.

Ah yes. SNP data.

Ideally, you want to use coxph.fit() rather than coxph(). It is significantly faster and takes a model matrix rather than a formula, so you can write a loop with index, say, i and construct the model matrix as

    X <- cbind(adjustmentvariables, snp[ , i])

It will also help to provide starting values for the coefficients of the adjustment variables. And if you initially specify just one iteration of the model, you can filter out nearly all the SNPs and then go back and refit the model properly for the few that might be important.

If you need to use coxph() and the formula interface, the simplest approach is probably to paste together the formula as a character string and then use as.formula() to convert it to a formula.

-thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
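
A rough sketch of the two-pass coxph.fit() loop described above, followed by the as.formula() alternative for a single SNP. Everything here is illustrative: the data are simulated, the names adj, snp, fit_snp and the "keep the top two" screening rule are hypothetical, and the coxph.fit() argument list is the one in recent versions of the survival package, so check args(coxph.fit) against your installed version.

```r
library(survival)

# Simulated stand-in data; every name and value here is hypothetical.
set.seed(42)
n <- 300
time   <- rexp(n, rate = 0.1)
status <- rbinom(n, 1, 0.7)                       # 1 = event, 0 = censored
adj <- cbind(age = rnorm(n, 60, 10), sex = rbinom(n, 1, 0.5))
snp <- matrix(rbinom(n * 5, 2, 0.3), nrow = n,
              dimnames = list(NULL, paste0("rs", 100001:100005)))
y <- Surv(time, status)

# Fit the adjustment-only model once; its coefficients become starting values.
base <- coxph(y ~ adj)
init <- c(coef(base), 0)                          # the SNP coefficient starts at 0

fit_snp <- function(i, iter.max = 20) {
  X <- cbind(adj, snp = snp[, i])                 # model matrix for SNP i
  fit <- coxph.fit(X, y, strata = NULL, offset = NULL, init = init,
                   control = coxph.control(iter.max = iter.max),
                   weights = NULL, method = "efron", rownames = NULL)
  k <- ncol(X)
  beta <- unname(fit$coefficients[k])
  se <- sqrt(fit$var[k, k])
  c(coef = beta, se = se, z = beta / se)
}

# Pass 1: a single iteration per SNP as a cheap screen.
screen <- t(sapply(seq_len(ncol(snp)), fit_snp, iter.max = 1))
rownames(screen) <- colnames(snp)

# Pass 2: refit to convergence only for the SNPs that look interesting.
# Keeping the two largest |z| is an arbitrary rule chosen for illustration.
keep <- order(abs(screen[, "z"]), decreasing = TRUE)[1:2]
full <- t(sapply(keep, fit_snp))
rownames(full) <- colnames(snp)[keep]
print(full)

# Formula-interface alternative for one SNP: paste the formula together as a
# character string, then convert it with as.formula().
dat <- data.frame(time = time, status = status, adj, snp)
top <- colnames(snp)[keep[1]]
f <- as.formula(paste("Surv(time, status) ~ age + sex +", top))
check <- coxph(f, data = dat)
print(coef(check))
```

The converged coxph.fit() estimate for a SNP should agree with the coxph() estimate from the pasted formula, which gives a quick sanity check on the loop before running it over all the SNPs.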