David Hajage
2008-Jul-25 08:54 UTC
[R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd
Un texte encapsul? et encod? dans un jeu de caract?res inconnu a ?t? nettoy?... Nom : non disponible URL : <https://stat.ethz.ch/pipermail/r-help/attachments/20080725/bcafe250/attachment.pl>
Frank E Harrell Jr
2008-Jul-25 12:25 UTC
[R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd
David Hajage wrote:> Hello R users, > > I have several problems with the functions summary.formula and latex in > package Hmisc. Here an example : > > #### > library(Hmisc) > sex <- factor(sample(c("m","f", "?"), 50, rep=TRUE)) > age <- rnorm(50, 50, 5) > treatment <- factor(sample(c("Drug","Placebo"), 50, rep=TRUE)) > symp <- c('Headache','Stomach Ache','Hangnail', > 'Muscle Ache','Depressed') > symptom1 <- sample(symp, 50,TRUE) > symptom2 <- sample(symp, 50,TRUE) > symptom3 <- sample(symp, 50,TRUE) > Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms') > > f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE) > print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F) > print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = T) > latex(f, long = T, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F, file = "") > ### > > Here the problems : > - The first print(f, ...) doesn't replace all p-value <0.5 by "P<0.5" : > > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | |Drug > |Placebo | Test | > | |(N=31) > |(N=19) |Statistic | > +----------------------------+-------------------------+-------------------------+-----------------------------+ > |age | 45.926/48.750/54.019| > 47.344/50.728/53.696| F=0.9 d.f.=1,48 P<0.5 | > +----------------------------+-------------------------+-------------------------+-----------------------------+ > |sex : ? | 25.806% ( 8) | 26.316% ( > 5) | Chi-square=0.3 d.f.=2 P=0.86| > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | f | 38.710% (12) | 31.579% ( > 6) | | > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | m | 35.484% (11) | 42.105% ( > 8) | | > +----------------------------+-------------------------+-------------------------+-----------------------------+ > |Primary Symptoms : Depressed| 41.935% (13) | 63.158% > (12) | Chi-square=2.12 d.f.=1 P<0.5| > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | Hangnail | 48.387% (15) | 42.105% ( > 8) |Chi-square=0.19 d.f.=1 P=0.67| > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | Stomach Ache | 45.161% (14) | 68.421% > (13) | Chi-square=2.57 d.f.=1 P<0.5| > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | Muscle Ache | 54.839% (17) | 26.316% ( > 5) | Chi-square=3.89 d.f.=1 P<0.5| > +----------------------------+-------------------------+-------------------------+-----------------------------+ > | Headache | 51.613% (16) | 36.842% ( > 7) | Chi-square=1.03 d.f.=1 P<0.5| > +----------------------------+-------------------------+-------------------------+-----------------------------+I did not see an instance of P < 0.5 that was not replaced by "P<0.5". This is an unusual cutoff, and you are printing many more digits of precision than offered by the data.> > - The second print(f, ..., prmsd = T) has the same problem for p-value. > There is also 4 decimals in the first line instead of 3 :Try modifying the digits argument.> > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | |Drug > |Placebo | > Test | > | |(N=31) > |(N=19) > |Statistic | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > |age |45.9263/48.7501/54.0194 49.1817+/- > 5.2751|47.3436/50.7276/53.6963 51.1313+/- 5.5868| F=0.9 d.f.=1,48 > P<0.5 | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > |sex : ? | 25.806% ( 8) > | 26.316% ( 5) | Chi-square=0.3 d.f.=2 > P=0.86 | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | f | 38.710% (12) > | 31.579% ( 6) > | | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | m | 35.484% (11) > | 42.105% ( 8) > | | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > |Primary Symptoms : Depressed| 41.935% (13) > | 63.158% (12) | Chi-square=2.12 d.f.=1 > P<0.5 | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | Hangnail | 48.387% (15) > | 42.105% ( 8) | Chi-square=0.19 d.f.=1 > P=0.67| > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | Stomach Ache | 45.161% (14) > | 68.421% (13) | Chi-square=2.57 d.f.=1 > P<0.5 | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | Muscle Ache | 54.839% (17) > | 26.316% ( 5) | Chi-square=3.89 d.f.=1 > P<0.5 | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > | Headache | 51.613% (16) > | 36.842% ( 7) | Chi-square=1.03 d.f.=1 > P<0.5 | > +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+ > > - In the latex(f, ...), only the first-line p-value is replaced by "< 0.5", > and the quantiles for age have too much decimal :This is a bug. An override source is below, to use until the next release.> > % latex.default(cstats, title = title, caption = caption, rowlabel > rowlabel, col.just = col.just, numeric.dollar = FALSE, insert.bottom > legend, rowname = lab, dcolumn = dcolumn, extracolheads > extracolheads, extracolsize = Nsize, ...) > % > \begin{table}[!tbp] > \caption{Descriptive Statistics by treatment\label{f}} > \begin{center} > \begin{tabular}{lccc}\hline\hline > \multicolumn{1}{l}{}& > \multicolumn{1}{c}{Drug}& > \multicolumn{1}{c}{Placebo}& > \multicolumn{1}{c}{Test Statistic} > \\ &\multicolumn{1}{c}{{\scriptsize > $N=31$}}&\multicolumn{1}{c}{{\scriptsize $N=19$}}&\\ \hline > age&{\scriptsize 45.92627049~}{48.75014585 }{\scriptsize 54.01942542} > &{\scriptsize 47.34358486~}{50.72757132 }{\scriptsize 53.69634575} &$ > F_{1,48}=0.9 ,~ P<0.5 ^{1} $\\ > sex&&&$ \chi^{2}_{2}=0.3 ,~ P=0.859 ^{2} $\\ > ~~~~?&25.806\%~{\scriptsize~(~8)}&26.316\%~{\scriptsize~(~5)}&\\ > ~~~~f&38.710\%~{\scriptsize~(12)}&31.579\%~{\scriptsize~(~6)}&\\ > ~~~~m&35.484\%~{\scriptsize~(11)}&42.105\%~{\scriptsize~(~8)}&\\ > Primary~Symptoms&&&\\ > ~~~~Depressed&41.935\%~{\scriptsize~(13)}&63.158\%~{\scriptsize~(12)}&$ > \chi^{2}_{1}=2.12 ,~ P=0.145 ^{2} $\\ > ~~~~Hangnail&48.387\%~{\scriptsize~(15)}&42.105\%~{\scriptsize~(~8)}&$ > \chi^{2}_{1}=0.19 ,~ P=0.665 ^{2} $\\ > ~~~~Stomach~Ache&45.161\%~{\scriptsize~(14)}&68.421\%~{\scriptsize~(13)}&$ > \chi^{2}_{1}=2.57 ,~ P=0.109 ^{2} $\\ > ~~~~Muscle~Ache&54.839\%~{\scriptsize~(17)}&26.316\%~{\scriptsize~(~5)}&$ > \chi^{2}_{1}=3.89 ,~ P=0.049 ^{2} $\\ > ~~~~Headache&51.613\%~{\scriptsize~(16)}&36.842\%~{\scriptsize~(~7)}&$ > \chi^{2}_{1}=1.03 ,~ P=0.309 ^{2} $\\ > \hline > \end{tabular} > > \end{center} > > \noindent {\scriptsize $a$\ }{$b$\ }{\scriptsize $c$\ } represent the lower > quartile $a$, the median $b$, and the upper quartile $c$\ for continuous > variables.\\ > Numbers after percents are frequencies.\\ > \indent Tests used: $^{1}$Wilcoxon test; $^{2}$Pearson test > \end{table} > > I would like to understand : > - why does option 'pctdig' affect both percentages and quantiles in the > 'print' command (pctdig : 'number of digits to the right of the decimal > place for *printing percentages*. The default is zero, so percents will be > rounded to the nearest percent'); > - how can I change the number of decimal for continuous variables in the > 'latex' command ?I hope to address these 2 questions soon. Frank> - why doesn't option 'eps' affect all p-value ? > > I've just discover these wonderfull functions. Thank you to Frank Harrell > for this package. > > DH >latex.summary.formula.reverse <- function(object, title=first.word(deparse(substitute(object))), digits, prn = any(n!=N), pctdig=0, npct=c('numerator','both','denominator','none'), npct.size='scriptsize', Nsize='scriptsize', exclude1=TRUE, vnames=c("labels","names"), prUnits=TRUE, middle.bold=FALSE, outer.size="scriptsize", caption, rowlabel="", insert.bottom=TRUE, dcolumn=FALSE, formatArgs=NULL, round=NULL, prtest=c('P','stat','df','name'), prmsd=FALSE, msdsize=NULL, long=dotchart, pdig=3, eps=.001, auxCol=NULL, dotchart=FALSE, ...) { x <- object npct <- match.arg(npct) vnames <- match.arg(vnames) if(is.logical(prtest) && !prtest) prtest <- 'none' stats <- x$stats nv <- length(stats) cstats <- lab <- character(0) nn <- integer(0) type <- x$type n <- x$n N <- x$N nams <- names(stats) labels <- x$labels Units <- x$units nw <- if(lg <- length(x$group.freq)) lg else 1 #23Nov98 gnames <- names(x$group.freq) test <- x$testresults if(!length(test)) prtest <- 'none' gt1.test <- if(all(prtest=='none')) FALSE else length(unique(sapply(test,function(a)a$testname))) > 1 if(!missing(digits)) { #.Options$digits <- digits 6Aug00 oldopt <- options(digits=digits) on.exit(options(oldopt)) } if(missing(caption)) caption <- paste("Descriptive Statistics", if(length(x$group.label)) paste(" by",x$group.label) else paste(" $(N=",x$N,")$",sep=""), sep="") bld <- if(middle.bold) '\\bf ' else '' cstats <- NULL testUsed <- auxc <- character(0) for(i in 1:nv) { if(length(auxCol)) auxc <- c(auxc, auxCol[[1]][i]) nn <- c(nn, n[i]) ## 12aug02 nam <- if(vnames=="names") nams[i] else labels[i] if(prUnits && nchar(Units[i]) > 0) nam <- paste(nam, '~\\hfill\\tiny{',translate(Units[i],'*',' '),'}',sep='') tr <- if(length(test) && all(prtest!='none')) test[[nams[i]]] else NULL if(length(test) && all(prtest!='none')) testUsed <- unique(c(testUsed, tr$testname)) if(type[i]==1 || type[i]==3) { cs <- formatCats(stats[[i]], nam, tr, type[i], if(length(x$group.freq)) x$group.freq else x$n[i], npct, pctdig, exclude1, long, prtest, latex=TRUE, testUsed=testUsed, npct.size=npct.size, pdig=pdig, eps=eps, footnoteTest=gt1.test, dotchart=dotchart) nn <- c(nn, rep(NA, nrow(cs)-1)) } else cs <- formatCons(stats[[i]], nam, tr, x$group.freq, prmsd, prtest=prtest, formatArgs=formatArgs, round=round, latex=TRUE, testUsed=testUsed, middle.bold=middle.bold, outer.size=outer.size, msdsize=msdsize, pdig=pdig, eps=eps, footnoteTest=gt1.test) cstats <- rbind(cstats, cs) if(length(auxc) && nrow(cstats) > 1) auxc <- c(auxc, rep(NA, nrow(cs)-1)) } lab <- dimnames(cstats)[[1]] gl <- names(x$group.freq) ##gl <- if(length(gl)) paste(gl, " $(N=",x$group.freq,")$",sep="") else " " ## Thanks: Eran Bellin <ebellin at montefiore.org> 3Aug01 if(!length(gl)) gl <- " " lab <- sedit(lab,c(" ","&"),c("~","\\&")) #was format(lab) 21Jan99 lab <- latexTranslate(lab, greek=.R.) gl <- latexTranslate(gl, greek=.R.) ## if(any(gl != " ")) gl <- paste(gl, " $(N=",x$group.freq,")$",sep="") # 3Aug01 ## Added any( ) 26Mar02 21jan03 extracolheads <- if(any(gl != " ")) c(if(prn)'', paste('$N=',x$group.freq,'$',sep='')) else NULL # 21jan03 if(length(test) && !all(prtest=='none')) { gl <- c(gl, if(length(prtest)==1 && prtest!='stat') if(prtest=='P') 'P-value' else prtest else 'Test Statistic') if(length(extracolheads)) extracolheads <- c(extracolheads,'') # 21jan03 } dimnames(cstats) <- list(NULL,gl) ## was dimnames(cstats) <- list(lab, gl) 12aug02 cstats <- data.frame(cstats, check.names=FALSE) ## Added row.names=lab below 10jul02 - S+ was dropping dimnames[[1]] ##attr(cstats,'row.names') <- lab 12aug02 col.just <- rep("c",length(gl)) if(dcolumn && all(prtest!='none') && gl[length(gl)] %in% c('P-value','Test Statistic')) col.just[length(col.just)] <- '.' if(prn) { cstats <- data.frame(N=nn, cstats, check.names=FALSE) col.just <- c("r",col.just) } if(!insert.bottom) legend <- NULL else { legend <- paste(if(any(type==2)) { paste("\\noindent {\\",outer.size," $a$\\ }{",bld,"$b$\\ }{\\", outer.size," $c$\\ } represent the lower quartile $a$, the median $b$, and the upper quartile $c$\\ for continuous variables.", if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm 1$ SD.' else '', '\\\\', sep="") } else NULL, if(prn) '$N$\\ is the number of non--missing values.\\\\', if(any(type==1) && npct=='numerator') 'Numbers after percents are frequencies.\\\\', sep="\n") legend <- NULL if(any(type==2)) { legend <- paste("\\noindent {\\", outer.size, " $a$\\ }{", bld, "$b$\\ }{\\", outer.size, " $c$\\ } represent the lower quartile $a$, the median $b$, and the upper quartile $c$\\ for continuous variables.", if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm 1$ SD.' else '', '\\\\\n', sep="") } if(prn) { legend <- paste(legend, '$N$\\ is the number of non--missing values.\\\\\n', sep='') } if(any(type==1) && npct=='numerator') { legend <- paste(legend, 'Numbers after percents are frequencies.\\\\\n', sep='') } if(length(testUsed)) legend <-paste(legend, if(length(testUsed)==1)'\\noindent Test used:' else '\\indent Tests used:', if(length(testUsed)==1) paste(testUsed,'test') else paste(paste('$^{',1:length(testUsed),'}$',testUsed, ' test',sep=''),collapse='; ')) ## added rowname=lab 12aug02 added '\n\n' 4mar03 for ctable=T } if(length(auxc)) { if(length(auxc) != nrow(cstats)) stop(paste('length of auxCol (',length(auxCol[[1]]), ') is not equal to number or variables in table (', nv,').', sep='')) auxcc <- format(auxc) auxcc[is.na(auxc)] <- '' cstats <- cbind(auxcc, cstats) nax <- names(auxCol) heads <- get2rowHeads(nax) names(cstats)[1] <- heads[[1]] if(length(col.just)) col.just <- c('r', col.just) if(length(extracolheads)) extracolheads <- c(heads[2], extracolheads) } resp <- latex.default(cstats, title=title, caption=caption, rowlabel=rowlabel, col.just=col.just, numeric.dollar=FALSE, insert.bottom=legend, rowname=lab, dcolumn=dcolumn, extracolheads=extracolheads, extracolsize=Nsize, ...) if(dotchart) resp$style <- unique(c(resp$style, 'calc', 'epic', 'color')) resp } -- Frank E Harrell Jr Professor and Chair School of Medicine Department of Biostatistics Vanderbilt University
David Hajage
2008-Jul-25 12:53 UTC
[R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd
Un texte encapsul? et encod? dans un jeu de caract?res inconnu a ?t? nettoy?... Nom : non disponible URL : <https://stat.ethz.ch/pipermail/r-help/attachments/20080725/f5e7525a/attachment.pl>