Polwart Calum (County Durham and Darlington NHS Foundation Trust)
2009-Jul-28 18:28 UTC
[R] Summarising Data for Forrest Plots
I tried to post this a few times last week and it seems to have got stuck somehow, so I'm trying from a different email in the hope that works. If somehow this has appeared on the list 20 times and I never saw any of them, I apologize ;-)

I'm basically an R-newbie. But I am VERY computer literate. But this has me stumped...

All the examples for using the rmeta package to create a forest plot or similar seem to use the catheter data:

      Name   n.trt n.ctrl col.trt col.ctrl inf.trt inf.ctrl
    1 Ciresi   124    127      15       21      13       14
    2 George    44     35      10       25       1        3
    3 Hannan    68     60      22       22       5        7
    4 Heard    151    157      60       82       5        6
    ...

As I see it, that's a summary of data from several published trials.

What I want to do is a forest plot for subgroups within my single dataset as a test of heterogeneity. I have a dataset of subjects who received either full dose (FD) or reduced dose (RD) treatment, and a number of characteristics about those subjects: age, sex, renal function, weight, toxicity. And I have survival data (censored). They are in standard columnar data.

Is there an *easy* way to transform them into something like this:

       SubGroup       n.FD n.RD surv.FD surv.RD
    1  Age >65
    2  Age <= 65
    3  Male
    ...
    9  Grade 0-2 Tox
    10 Grade 3/4 Tox

which rmeta will then let me use to create a forest plot? This is a reasonably standard approach in biomedical studies these days, so it seems odd that I can't find any "How-To" that tells me how to shortcut it. Otherwise I have to manually calculate each of the parameters :-( which is a real pain, as we are awaiting more mature data which would need the same process re-run.

Thanks in advance

C

********************************************************************************************************************
This message may contain confidential information. If yo...{{dropped:21}}
Are n.FD and n.RD the number of people who received the full/reduced dose and surv.FD and surv.RD the number of people that survived? And are the people who received the full dose different from the people who received the reduced dose?

And what exactly is it that you want to plot in the forest plot? From the way you have arranged the table, it seems as if you want some kind of effect size measure that contrasts the survival rate of the full versus reduced dose in the various subgroups. Is that correct?

And are you just trying to figure out how to draw the forest plot once you have the data in the table form as shown in your post or are you also trying to figure out how to create that table to begin with?

--
Wolfgang Viechtbauer
Department of Methodology and Statistics
University of Maastricht, The Netherlands
http://www.wvbauer.com/
Polwart Calum (County Durham and Darlington NHS Foundation Trust)
2009-Jul-29 16:44 UTC
[R] Summarising Data for Forrest Plots
> Are n.FD and n.RD the number of people who received the full/reduced dose

Yes - but I don't have the data structured like that YET - that's what I want to get to, because that's what the forest plot seems to want.

> and surv.FD and surv.RD the number of people that survived?

Mmm... I was thinking more of something like median survival? Although the brain hasn't kicked into gear yet tonight and it might actually be meant to be a hazard ratio?

> And are the people who received the full dose different from the people who received the reduced dose?

Yes

> And what exactly is it that you want to plot in the forest plot?

Subgroups - see below

> From the way you have arranged the table, it seems as if you want some kind of effect size measure that contrasts the survival rate of the full versus reduced dose in the various subgroups. Is that correct?

Yip, that sounds right

> And are you just trying to figure out how to draw the forest plot once you have the data in the table form as shown in your post or are you also trying to figure out how to create that table to begin with?

I *think* I can draw the plot once I have the data structured right. But at the moment my data is structured like this:

    PatientID  FullDose  Survival  Censored  Age  Sex  Normal Renal Func  Grade of Toxicity
    001        Y         125       N         75   F    Y                  1
    002        N         55        Y         55   M    N                  4
    003        N         65        Y         78   F    Y                  2

I want to eventually get to a forest plot that looks a bit like this:

Age:
  < 65                |-------------#---------------|----|
  >= 65            |-------------#----------------| |
                                                    |
Sex:                                                |
  M                                        |-----#--|---|
  F         |---------------#---------------------| |
                                                    |
Renal Func:                                         |
  Normal              |---------------#-------------|
  Abnormal          |---------------#-------------| |
                                                    |
Grade of Toxicity:                                  |
  0-1                                               | |-------#-------|
  2                                  |-----#------| |
  3-4                     |----------#------------| |
                                                    |
Overall:                                         <> |

Which I believe I can achieve using the metaplot or forest plot functions, replacing the studies with the relevant subgroups. But my challenge has been converting the patient data above down to a list of subgroups, other than by running a survival analysis on each individual subgroup, recording the results and building up a table.

Calum
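The per-subgroup survival analysis described above can be scripted rather than run by hand. What follows is a minimal sketch, not a definitive method: it assumes the survival package (normally shipped with R) and a Cox model comparing full versus reduced dose within each subgroup. The simulated data, the column names and the subgroup_hr() helper are all made up for illustration and do not come from the thread.

```r
library(survival)

# Simulated stand-in for the patient-level data (hypothetical values).
set.seed(1)
n <- 200
pts <- data.frame(
  FullDose = sample(c("Y", "N"), n, replace = TRUE),
  Survival = round(rexp(n, rate = 1 / 100)) + 1,
  Censored = sample(c("Y", "N"), n, replace = TRUE, prob = c(0.3, 0.7)),
  Age      = round(runif(n, 40, 90)),
  Sex      = sample(c("F", "M"), n, replace = TRUE)
)
pts$fd <- as.integer(pts$FullDose == "Y")   # 1 = full dose, 0 = reduced dose

# Hypothetical helper: fit one Cox model on a subset and return one table row.
subgroup_hr <- function(data, label) {
  fit <- coxph(Surv(Survival, Censored == "N") ~ fd, data = data)
  est <- summary(fit)
  data.frame(
    SubGroup = label,
    n.FD  = sum(data$fd == 1),
    n.RD  = sum(data$fd == 0),
    logHR = est$coefficients[1, "coef"],
    se    = est$coefficients[1, "se(coef)"],
    HR    = est$conf.int[1, "exp(coef)"],
    lower = est$conf.int[1, "lower .95"],
    upper = est$conf.int[1, "upper .95"]
  )
}

# Each subgroup is a logical vector selecting rows of pts.
groups <- list(
  "Age < 65"  = pts$Age < 65,
  "Age >= 65" = pts$Age >= 65,
  "Sex: M"    = pts$Sex == "M",
  "Sex: F"    = pts$Sex == "F",
  "Overall"   = rep(TRUE, n)
)

result <- do.call(rbind, lapply(names(groups), function(g) {
  subgroup_hr(pts[groups[[g]], ], g)
}))
rownames(result) <- NULL
print(result)
```

Because the table is rebuilt from the raw data each time, re-running the script on more mature data regenerates every row. The logHR and se columns are the kind of per-row estimate and standard error that a forest plot routine usually expects; whether rmeta's plotting functions accept them in exactly this form should be checked against the package documentation.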
Polwart Calum (County Durham and Darlington NHS Foundation Trust)
2009-Jul-29 17:06 UTC
[R] Summarising Data for Forrest Plots
>> What I want to do is a forest plot for subgroups within my single dataset as a test of heterogeneity. I have a dataset of subjects who received either full dose (FD) or reduced dose (RD) treatment, and a number of characteristics about those subjects: age, sex, renal function, weight, toxicity. And I have survival data (censored). They are in standard columnar data.
>>
>> Is there an *easy* way to transform them into something like this:
>>
>>    SubGroup       n.FD n.RD surv.FD surv.RD
>> 1  Age >65
>> 2  Age <= 65
>> 3  Male
>> ...
>> 9  Grade 0-2 Tox
>> 10 Grade 3/4 Tox
>>
> Hi Calum,
> Have you tried subsetting the dataset like this:
>
> meta.DSL(...,data=mydataset[mydataset$age <= 65,],...)
>
> Jim

Hi Jim,

I'm not sure that I understand! But my understanding was that meta.DSL wants 4 bits of information: the number treated (full dose in my case), the number in control (reduced dose in my case), and the number of events in each of the two groups... which is what I was trying to describe above - although possibly not very well. Then it will do the work for me.

My challenge is taking a load of data in columns and getting it summarised by the subgroups, so that it takes Age > 65, counts how many had full dose and how many had reduced dose, populates the fields, and then does the same for Age < 65 etc. etc. (I may be back with questions about the survival value - but even knowing how to get it to summarise like I describe would be a start.) I guess it's a bit like a pivot table in Excel?

But perhaps it's something to do with the mydataset[mydataset$age <=65,] bit? That seems to give me a data table with only the 65 and unders, which makes sense. But then how do I get it to populate a table with the numbers in the two groups?
Polwart Calum (County Durham and Darlington NHS Foundation Trust)
2009-Jul-30 20:43 UTC
[R] Summarising Data for Forrest Plots
> Ah, I think I see what you want. Try this on each pair of exclusive sets:
>
> n_total<-dim(mydataset)[1]
> under65<-mydataset$age <= 65
> n_under65<-sum(under65)
> under65row<-c(sum(mydataset$dose[under65] == "FD"),
>  sum(mydataset$dose[under65] == "RD"),
>  sum(mydataset$vitalstatus[under65] == "dead" & mydataset$dose[under65] == "FD"),
>  sum(mydataset$vitalstatus[under65] == "dead" & mydataset$dose[under65] == "RD"))
> over65row<-c(sum(mydataset$dose[!under65] == "FD"),
>  sum(mydataset$dose[!under65] == "RD"),
>  sum(mydataset$vitalstatus[!under65] == "dead" & mydataset$dose[!under65] == "FD"),
>  sum(mydataset$vitalstatus[!under65] == "dead" & mydataset$dose[!under65] == "RD"))
>
> Then under65row and over65row should be the first two rows of your result.
> Can't test this at the moment, but I don't think it's too far wrong.

Thanks Jim. Yes, it looks like that code should do the job. I was really hoping for a function like

    SummariseForSubsetAnalysis(mydataset, by=mydataset$dose, subsets=c(age, renal, sex, toxicity), event=survival)

which would magically do it for me ;-) I guess if this is something I start having to do lots I might have to write one. Surprised one doesn't seem to exist - perhaps the number of variations in what people want would be too complex.

Calum
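A helper along the lines wished for above can be sketched in base R. This is only an illustration under stated assumptions: summarise_subgroups() is a hypothetical name (it is not part of rmeta or base R), the example data are invented, the dose column is assumed to be coded "FD"/"RD", and the event column is assumed to be a logical flag such as "died". It generalises the quoted per-row sums by building one row per level of each subgroup variable.

```r
# Hypothetical helper: counts per subgroup level of people and events by dose.
summarise_subgroups <- function(data, dose, event, subsets) {
  d  <- factor(data[[dose]], levels = c("FD", "RD"))  # fixed levels keep zero counts
  ev <- data[[event]]                                 # logical: TRUE = event occurred
  rows <- lapply(subsets, function(v) {
    grp      <- factor(data[[v]])
    n_tab    <- table(grp, d)            # people per subgroup level and dose
    ev_tab   <- table(grp[ev], d[ev])    # same table restricted to events
    data.frame(
      SubGroup = paste(v, rownames(n_tab)),
      n.FD  = as.vector(n_tab[, "FD"]),
      n.RD  = as.vector(n_tab[, "RD"]),
      ev.FD = as.vector(ev_tab[, "FD"]),
      ev.RD = as.vector(ev_tab[, "RD"])
    )
  })
  do.call(rbind, rows)
}

# Invented example data using the column names from the quoted code.
mydataset <- data.frame(
  dose        = c("FD", "RD", "RD", "FD", "FD", "RD", "FD", "RD"),
  age         = c(75, 55, 78, 60, 66, 70, 50, 64),
  sex         = c("F", "M", "F", "M", "M", "F", "F", "M"),
  vitalstatus = c("dead", "alive", "dead", "dead", "alive", "dead", "alive", "dead")
)
mydataset$ageband <- cut(mydataset$age, breaks = c(0, 65, Inf),
                         labels = c("<= 65", "> 65"))
mydataset$died <- mydataset$vitalstatus == "dead"

res <- summarise_subgroups(mydataset, dose = "dose", event = "died",
                           subsets = c("ageband", "sex"))
print(res)
```

Continuous variables such as age have to be banded first (here with cut()), since the helper simply tabulates whatever levels each subgroup column contains. The resulting four count columns have the same shape as the treated/control totals and event counts discussed earlier in the thread for meta.DSL.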
Polwart Calum (County Durham and Darlington NHS Foundation Trust)
2009-Jul-30 22:35 UTC
[R] Summarising Data for Forrest Plots
> Ah, I think I see what you want. Try this on each pair of exclusive sets:
<snip>
> Then under65row and over65row should be the first two rows of your result.
> Can't test this at the moment, but I don't think it's too far wrong.

I knew this shouldn't need so much work ;-)

Not cracked it yet - because as I see it I need a 2 x 4 table and at the moment I've only cracked a 2 x 2 table. (Or really I need something like a 10 x 4 - but the 4 is the bit that I haven't cracked.)

First option is something like this:

    with(mydataset, table(Sex, Dose))

I can get:

       Dose
    Sex FD RD
      F  6 15
      M 16 23

For non-categorical data it's slightly trickier... but quite achievable in two lines (for the 2 x 2 table):

    factor(cut(mydataset$Age, breaks = c(0,65,100))) -> AgeBands
    table(AgeBands, mydataset$Dose)

Which gives:

    AgeBands   FD RD
      (0,65]   15  6
      (65,100] 13 26

Although - I'm not yet sure if I can actually call that data back by column names, i.e.

    x <- table(AgeBands, mydataset$Dose)
    x$FD

produces an error. :-( But getting there.
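On the x$FD error: a table is an array rather than a list, so the $ operator does not apply to it, but matrix-style indexing by name does. The sketch below uses invented data (the Dead column in particular is an assumption, standing in for whatever event flag the real dataset has) to show name-based access and one way to widen the 2 x 2 table to 2 x 4 by binding an events-only table alongside the counts.

```r
# Invented example data; Dose is a factor so empty cells still appear as 0.
mydataset <- data.frame(
  Dose = factor(c("FD", "RD", "RD", "FD", "FD", "RD", "FD", "RD")),
  Age  = c(75, 55, 78, 60, 66, 70, 50, 64),
  Dead = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

AgeBands <- cut(mydataset$Age, breaks = c(0, 65, 100))
x <- table(AgeBands, mydataset$Dose)

# Index by name with [row, column] instead of $.
x[, "FD"]            # full-dose counts for every age band
x["(0,65]", "RD"]    # a single cell

# Converting to a data frame makes $ work as well.
xdf <- as.data.frame.matrix(x)
xdf$FD

# 2 x 4 table: counts of people, then counts of events, side by side.
events <- table(AgeBands[mydataset$Dead], mydataset$Dose[mydataset$Dead])
wide <- cbind(x, events)
colnames(wide) <- c("n.FD", "n.RD", "ev.FD", "ev.RD")
print(wide)
```

Stacking such blocks for each subgroup variable with rbind() would give the 10 x 4 layout mentioned above.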