Paul Miller
2010-Dec-22 21:26 UTC
[R] Seeking feedback on my first attempt at R programming
Hello Everyone, Below is my first attempt at R programming. The code replicates example 5.1 from Common Statistical Methods for Clinical Research with SAS Examples. I was hoping that people more experienced than myself would be willing to take a look and let me know what I did well and what could have been done better. I'd be particularly grateful if anyone could tell me why the two user defined functions commented out at the bottom of the code don't work whereas the one I ultimately went with did. Thanks, Paul ###################################### #### Chapter 5: The Two-Sample T-Test #### ###################################### #### Example 5.1: Two-Sample t-Test using FEV1 Changes Data #### connection <- textConnection(" 101 A 1.35 <NA> 103 A 3.22 3.55 106 A 2.78 3.15 108 A 2.45 2.30 109 A 1.84 2.37 110 A 2.81 3.20 113 A 1.90 2.65 116 A 3.00 3.96 118 A 2.25 2.97 120 A 2.86 2.28 121 A 1.56 2.67 124 A 2.66 3.76 102 P 3.01 3.90 104 P 2.24 3.01 105 P 2.25 2.47 107 P 1.65 1.99 111 P 1.95 <NA> 112 P 3.05 3.26 114 P 2.75 2.55 115 P 1.60 2.20 117 P 2.77 2.56 119 P 2.06 2.90 122 P 1.71 <NA> 123 P 3.54 2.92 ") fev <- data.frame(scan(connection, list(patno = 0, trtgrp = "", fev0 = 0, fev6 = 0), na.strings="<NA>")) fev <- within(fev,trtgrp <- factor(trtgrp, labels = c("ABC-123","Placebo"))) fev <- within(fev,chg <- fev6 - fev0) summaryfn <- function(x) data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), t.value=mean(x)/(sd(x)/sqrt(sum(complete.cases(x)))), p.value=2*pt(-abs(mean(x)/(sd(x)/sqrt(sum(complete.cases(x))))),df=sum(complete.cases(x))-1)) cfev <- na.omit(fev) by(cfev[3:5],cfev[2],summaryfn) fligner.test(chg ~ trtgrp, data=fev) t.test(chg ~ trtgrp, data=fev, var.equal=TRUE) #summaryfn <- function(x) # data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), # t.value=t.test(x)$statistic, # p.value=t.test(x)$p.value) #cfev <- na.omit(fev) #by(cfev[3:5],cfev[2],summaryfn) #summaryfn <- function(x) # data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), # t.value=as.numeric(t.test(x)[1]), # p.value=as.numeric(t.test(x)[3])) #cfev <- na.omit(fev) #by(cfev[3:5],cfev[2],summaryfn) [[alternative HTML version deleted]]
Gabor Grothendieck
2010-Dec-23 05:53 UTC
[R] Seeking feedback on my first attempt at R programming
On Wed, Dec 22, 2010 at 4:26 PM, Paul Miller <pjmiller_57 at yahoo.com> wrote:> Hello Everyone, > > Below is my first attempt at R programming. The code replicates example 5.1 from Common Statistical Methods for Clinical Research with SAS Examples. I was hoping that people more experienced than myself would be willing to take a look and let me know what I did well and what could have been done better. I'd be particularly grateful if anyone could tell me why the two user defined functions commented out at the bottom of the code don't work whereas the one I ultimately went with did. > > Thanks, > > Paul > > ###################################### > #### Chapter 5: The Two-Sample T-Test #### > ###################################### > > #### Example 5.1: Two-Sample t-Test using FEV1 Changes Data #### > > connection <- textConnection(" > 101 A 1.35 <NA>?? 103 A 3.22 3.55?? 106 A 2.78 3.15 > 108 A 2.45 2.30?? 109 A 1.84 2.37?? 110 A 2.81 3.20 > 113 A 1.90 2.65?? 116 A 3.00 3.96?? 118 A 2.25 2.97 > 120 A 2.86 2.28?? 121 A 1.56 2.67?? 124 A 2.66 3.76 > 102 P 3.01 3.90?? 104 P 2.24 3.01?? 105 P 2.25 2.47 > 107 P 1.65 1.99?? 111 P 1.95 <NA>?? 112 P 3.05 3.26 > 114 P 2.75 2.55?? 115 P 1.60 2.20?? 117 P 2.77 2.56 > 119 P 2.06 2.90?? 122 P 1.71 <NA>?? 123 P 3.54 2.92 > ") > > fev <- data.frame(scan(connection, > list(patno = 0, trtgrp = "", fev0 = 0, fev6 = 0),?na.strings="<NA>")) > > fev <- within(fev,trtgrp <- factor(trtgrp, labels = c("ABC-123","Placebo"))) > fev <- within(fev,chg <- fev6 - fev0) > > summaryfn <- function(x) > data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), > t.value=mean(x)/(sd(x)/sqrt(sum(complete.cases(x)))), > p.value=2*pt(-abs(mean(x)/(sd(x)/sqrt(sum(complete.cases(x))))),df=sum(complete.cases(x))-1)) > > cfev <- na.omit(fev) > by(cfev[3:5],cfev[2],summaryfn) > > fligner.test(chg ~ trtgrp, data=fev) > t.test(chg ~ trtgrp, data=fev, var.equal=TRUE) > > #summaryfn <- function(x) > #?? data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), > #????t.value=t.test(x)$statistic, > #????p.value=t.test(x)$p.value) > #cfev <- na.omit(fev) > #by(cfev[3:5],cfev[2],summaryfn) > > #summaryfn <- function(x) > #?? data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), > #????t.value=as.numeric(t.test(x)[1]), > #????p.value=as.numeric(t.test(x)[3])) > #cfev <- na.omit(fev) > #by(cfev[3:5],cfev[2],summaryfn) >1. Better to stay away from within unless you really need it. Use with or transform: fev <- transform(fev, trtgrp = factor(trtgrp, labels = c("ABC-123","Placebo")), chg = fev6 - fev0 ) 2. mean and sd act column-wise but t.test does not. A minimal change to make it work would be to add an apply to the t.value and p.value: summaryfn2a <- function(x) data.frame(n=sum(complete.cases(x)),mean=mean(x),std.dev=sd(x), t.value=apply(x, 2, function(x) (t.test)(x)$statistic), p.value=apply(x, 2, function(x) (t.test)(x)$p.value)) cfev <- na.omit(fev) by(cfev[3:5],cfev[2],summaryfn2a) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com