Sarah Bonnin
2012-Aug-23 15:41 UTC
[R] party package: ctree - survival data - extracting statistics/predictors
Dear R users,

I am trying to apply the analysis described in a paper to the data I am working with. The data are: 80 patients for whom I have survival data (time in days, and a binary event indicator), and microarray expression data for 200 genes (continuous predictor variables). My data matrix "data.test" has 202 columns and 80 rows.

What I want to do is:
- run recursive partitioning on this data to get groups of patients that are homogeneous in terms of survival/prognosis.
- extract the "correlation" of single-gene expression (each of the 200 genes) with recurrence-free survival (time and event): I want to know which variables best predict a poor/good prognosis based on the survival data.

I am using the function "ctree" from the "party" package. I came up with this command:

test <- ctree(Surv(time, event) ~ ., data = data.test,
              controls = ctree_control(teststat = "max",
                                       testtype = "Bonferroni",
                                       mincriterion = 0.95,
                                       savesplitstats = TRUE),
              ytrafo = function(data) trafo(data, numeric_trafo = rank),
              xtrafo = function(data) trafo(data, surv_trafo = logrank_trafo(data, ties.method = "logrank")))

which runs without error, but as I am not a statistician I find it quite confusing and I might not be using it properly.

My technical problem is that I would like to extract the statistics from my "test" object (BinaryTree class), i.e. the P-value of each of the 200 comparisons (survival data versus each gene): I would like to know which genes are really associated with survival at each node of the tree.

I tried:

test@tree$criterion$statistic

but the maximum value of this is 16, so I assume it is not a p-value as such: what is it?

and:

test@tree$criterion$criterion

where the maximum value is 0.96 and the minimum value is 0; only one value is > 0.95.

str(test) gives quite a lot of information, but at the moment it confuses me more than it helps.

I want to know:
- whether my "ctree" command makes sense to people who have more experience than me with this kind of data...
- which elements of "test" represent which statistics and how to interpret them: as I understand it, setting "mincriterion" to 0.95 is equivalent to setting a P-value threshold of 0.05 (ctree help: "when 'mincriterion = 0.95', the p-value must be smaller than 0.05 in order to split this node.")

I hope my explanation is clear. I might be completely mistaken: any tips or guidance are more than welcome...

Thanks!
Sarah

sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
 [1] stats4    grid      splines   stats     graphics  grDevices utils     datasets  methods
[10] base

other attached packages:
 [1] biomaRt_2.10.0    party_1.0-2       vcd_1.2-13        colorspace_1.1-1  MASS_7.3-20
 [6] strucchange_1.4-7 sandwich_2.2-9    zoo_1.7-7         coin_1.0-21       mvtnorm_0.9-9992
[11] modeltools_0.2-19 survival_2.36-14

loaded via a namespace (and not attached):
[1] lattice_0.20-6 RCurl_1.91-1.1 tools_2.14.2   XML_3.9-4.1

------------------
Sarah Bonnin
Bioinformatician
Genomics Unit - Office 439.01
Centre for Genomic Regulation
C/ Dr. Aiguader, 88
08003 Barcelona, Spain
Tel. +34 93-316-0373
www.crg.eu
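On the question of how the "criterion" values relate to P-values: going by the ctree help text quoted above, the criterion appears to be on the "1 - p-value" scale, so a rough way to read it is to convert it back and rank the genes. The sketch below is only an illustration under that assumption. The vector "criterion" and the gene names geneA/geneB/geneC are hypothetical stand-ins for test@tree$criterion$criterion at the root node, not real output; and since testtype = "Bonferroni" was requested, the resulting values would be adjusted rather than raw p-values. The "statistic" slot, by contrast, holds test statistics rather than p-values, which would explain values as large as 16.

```r
# Hypothetical stand-in for test@tree$criterion$criterion (root node):
# one value per predictor, assumed to be on the "1 - p-value" scale.
criterion <- c(geneA = 0.96, geneB = 0.40, geneC = 0.00)

# Convert back to the p-value scale (adjusted p-values if a multiplicity
# correction such as testtype = "Bonferroni" was used).
p_adjusted <- 1 - criterion

# Same threshold as in ctree_control(mincriterion = 0.95).
mincriterion <- 0.95

summary_table <- data.frame(
  gene       = names(criterion),
  criterion  = unname(criterion),
  p_adjusted = unname(p_adjusted),
  can_split  = unname(criterion > mincriterion),
  stringsAsFactors = FALSE
)

# Most strongly associated predictors first.
summary_table <- summary_table[order(summary_table$p_adjusted), ]
print(summary_table)
```

With the real tree one would replace the hypothetical vector with test@tree$criterion$criterion; for nodes below the root, the nodes() method in party (e.g. nodes(test, 2)[[1]]$criterion) should give the corresponding list for that node, though this has not been checked against party 1.0-2.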