Hi useRs - I was wondering if anyone out there can tell me where to find R-code to do mixes of tables and graphics. I am thinking of something similar to this: http://yost.com/information-design/powerpoint-corrupts/ or like the excel routines people are demonstrating: http://infosthetics.com/archives/2006/08/excel_in_cell_graphing.html My aim is to provide small graphics to illustrate numbers directly beside or behind their position in the table. Maybe there is a way to do it with lattice? Thanks for any help you may be able to provide. Sam Ferguson
Sam Ferguson wrote:
> Hi useRs -
>
> I was wondering if anyone out there can tell me where to find R-code
> to do mixes of tables and graphics. I am thinking of something
> similar to this:
> http://yost.com/information-design/powerpoint-corrupts/
> or like the excel routines people are demonstrating:
> http://infosthetics.com/archives/2006/08/excel_in_cell_graphing.html
>
> My aim is to provide small graphics to illustrate numbers directly
> beside or behind their position in the table. Maybe there is a way to
> do it with lattice?
>
> Thanks for any help you may be able to provide.
> Sam Ferguson

The mixtures of tables and graphics we've produced are a bit different from the examples you gave but demonstrate the value of combining R and LaTeX. R can produce LaTeX code containing LaTeX picture environments, for example. That's how we put tiny high-resolution histograms inside tabular output showing descriptive statistics in the describe function and its latex method latex.describe in the Hmisc package. Charles Thomas Dupont is working on a more impressive graphic inside a table by adding tiny dot charts showing proportions and confidence limits for differences in probabilities to the output produced by the latex method for Hmisc's summary.formula function. An example of the first type may be found in http://biostat.mc.vanderbilt.edu/StatGraphCourse under "Mixing Text and Graphics" and we'll add an example of the second type soon. LaTeX offers another approach: tables (matrices) of graphics in the tabular environment.

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
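As a rough illustration of the idea that R can write LaTeX picture environments, here is a minimal sketch. It is not the code used by Hmisc's latex.describe; the function name `latex_bar` and its default sizes are assumptions made only for this example. It returns a LaTeX fragment that draws a filled bar whose length is proportional to a percentage, which could then be pasted into a tabular cell.

```r
# Hypothetical helper (not Hmisc code): build a LaTeX picture environment
# containing a filled rule whose length is proportional to `percent`.
latex_bar <- function(percent, width_pt = 36, height_pt = 6) {
  stopifnot(percent >= 0, percent <= 100)
  bar_pt <- round(width_pt * percent / 100, 1)  # bar length in points
  sprintf(
    "{\\setlength{\\unitlength}{1pt}\\begin{picture}(%d,%d)\\put(0,0){\\rule{%.1fpt}{%dpt}}\\end{picture}}",
    as.integer(width_pt), as.integer(height_pt), bar_pt, as.integer(height_pt)
  )
}

# One table cell for a 98.8% survival rate.
cell <- latex_bar(98.8)
cat(cell, "\n")
```

The surrounding braces keep the change to \unitlength local to the cell.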
On 31-Aug-06 Sam Ferguson wrote:
> Hi useRs -
>
> I was wondering if anyone out there can tell me where to find
> R-code to do mixes of tables and graphics. I am thinking of
> something similar to this:
> http://yost.com/information-design/powerpoint-corrupts/
> or like the excel routines people are demonstrating:
> http://infosthetics.com/archives/2006/08/excel_in_cell_graphing.html
>
> My aim is to provide small graphics to illustrate numbers directly
> beside or behind their position in the table. Maybe there is a way
> to do it with lattice?
>
> Thanks for any help you may be able to provide.
> Sam Ferguson

I dare say there may be a way to do that kind of thing directly within R, and if so then the graphics experts will no doubt tell us how! But your examples are just one kind of combined tabular/graphic layout (and somewhat similar to each other). In a more general context of combining tables of numerical results with graphic displays, it is perhaps better to think in terms of using R to produce the numerical results in the first instance, and then handing these over to software designed for general-purpose graphical/textual layout. You then have complete control, and full flexibility of design. Indeed, in your second (Excel) example, the method of production is just a nasty kludge -- and it was a happy coincidence that the "REPT" function was available in Excel at all!

As Frank Harrell has just posted (just as I was completing this one!), you can do this sort of thing in LaTeX (his example shows little histograms of the data, above each different tabular section). LaTeX is an example of software which allows you to create precisely formatted graphics within precisely formatted text. However, I'm no expert on LaTeX, preferring what I've been used to for too many years, namely Unix 'troff' and its more recent GNU implementation 'groff'.
As a preliminary, you will need to get R to output a suitable data file, or a suitably composed data file with 'groff' formatting tags interspersed. The latter should not be difficult, though my own approach would be to simply take a data file of the form (for your first example as taken from your URL):

"% survival / standard error"
"5 year" "10 year" "15 year" "20 year"
"Prostate" 98.8 0.4 95.2 0.9 87.1 1.7 81.3 3.0
"Thyroid" 96.0 0.8 95.8 1.2 94.0 1.6 95.4 2.1
"Testis" 94.7 1.1 94.0 1.3 91.1 1.8 88.2 2.3
[...]

(which would be very straightforward in R) and then use say 'awk' to compute 'groff' data with embedded tags (see below). The file which I would then submit to 'groff' would look like

.ds RED "\X'ps: exec 1 0 0 setrgbcolor'
.ds GREY "\X'ps: exec 0.5 0.5 0.5 setrgbcolor'
.ds BLACK "\X'ps: exec 0 0 0 setrgbcolor'
.ds bx \x'-0.2m'\x'-0.2m'\v'0.2m'\Z'\
\*[RED]\D'P \\$1p 0 0 -1m -\\$1p 0 0 1m'\
'\
\Z'\
\h'\\$1p'\
\*[GREY]\D'P 0.5i-\\$1p 0 0 -1m \\$1p-0.5i 0 0 1m'\
'\h'0.5i'\
\v'-0.2m'\*[BLACK]
.LP
.TS
box tab(#);
c3 s1 s1w(0.5i) s s1 s1w(0.5i) s s1 s1w(0.5i) s s1 s1w(0.5i) s.
\f[BMB]\s[15]Estimated survival rates by cancer site\s0\fP
.T&
l c s s s s s s s s s s s.
#\fB\s[12]% survival / standard error\s0\fP
#\_
.T&
l c s s c s s c s s c s s.
#5 year#10 year#15 year#20 year
#\_#\_#\_#\_
.T&
l n l n n c n n c n n c n.
Prostate#98.8#\*[bx 35.6]#0.4#95.2#\*[bx 34.3]#0.9#87.1#\
\*[bx 31.4]#1.7#81.3#\*[bx 29.3]#3.0
Thyroid#96.0#\*[bx 34.6]#0.8#95.8#\*[bx 34.5]#1.2#94.0#\
\*[bx 33.8]#1.6#95.4#\*[bx 34.3]#2.1
Testis#94.7#\*[bx 34.1]#1.1#94.0#\*[bx 33.8]#1.3#91.1#\
\*[bx 32.8]#1.8#88.2#\*[bx 31.8]#2.3
[...]
Pancreas#4.0#\*[bx 1.4]#0.5#3.0#\*[bx 1.1]#1.5#2.7#\
\*[bx 1.0]#0.6#2.7#\*[bx 1.0]#0.8
.TE

The key here is to define a "parametrised string" which will be invoked as "\*[bx <number>]". This is the main "embedded tag".
Each box is 0.5 inch wide (36 points), and consists of a lefthand section in Red whose width is 36*percent/100 points, with a righthand section in Grey whose width is 36*(1 - percent/100) points. The height of the box is 1 em (which, in points, is the point-size of the current font), and the box has been shifted downwards slightly (0.2 em) to align it nicely with the text. The parameter "<number>" in "\*[bx <number>]" is the value of 36*percent/100. So this can, for instance, be easily computed in an 'awk' run.

The block of "code"

.ds bx \x'-0.2m'\x'-0.2m'\v'0.2m'\Z'\
\*[RED]\D'P \\$1p 0 0 -1m -\\$1p 0 0 1m'\
'\
\Z'\
\h'\\$1p'\
\*[GREY]\D'P 0.5i-\\$1p 0 0 -1m \\$1p-0.5i 0 0 1m'\
'\h'0.5i'\
\v'-0.2m'\*[BLACK]

defines the tag "\*[bx ...]", which is responsible for drawing the graphical item in the table wherever it is invoked. Initially it is padded above and below with a bit of extra space ("\x...") and moved down slightly ("\v'0.2m'"), then the colour changes to Red and a filled Red polygon is drawn; then the drawing point is shifted and a filled Grey polygon is drawn. Finally the colour is changed back to Black for the text part of the Table. The value of "<number>" is substituted for "\\$1" wherever this occurs in the definition of "bx".

The line ".TS" leads in to a Table definition, which ends with ".TE". The next few lines specify the table layout (types, spacings and widths of columns, cell separator "#", etc.); and then come the data for each line of the table, in which the box tag "\*[bx ...]" occurs where needed.

As indicated above, the full table data could probably be easily computed in R and can certainly be easily done in 'awk' or 'perl'. After all that, the result is quite pleasing -- and, when I compare it with the graph shown on Sam's URL, it seems to me to represent the numbers much more accurately, as well as being visually slightly more expressive.
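The remark that the table data "could probably be easily computed in R" can be sketched as follows. This is only an illustration: the helper names `bx_width` and `groff_row` are made up for this example, and the row is written on a single line rather than split with a trailing backslash as in the groff source shown in this message.

```r
# Hypothetical helper: width in points of the Red part of a 36 pt (0.5 inch) box.
bx_width <- function(percent, box_pt = 36) round(box_pt * percent / 100, 1)

# Hypothetical helper: one tbl data row with "#" as the cell separator.
# Each period contributes three cells: percent, box tag, standard error.
groff_row <- function(site, pct, se) {
  cells <- sprintf("%.1f#\\*[bx %.1f]#%.1f", pct, bx_width(pct), se)
  paste(c(site, cells), collapse = "#")
}

row <- groff_row("Prostate",
                 pct = c(98.8, 95.2, 87.1, 81.3),
                 se  = c(0.4, 0.9, 1.7, 3.0))
cat(row, "\n")
```

Writing such rows for every cancer site with writeLines(), between the fixed header and the ".TE" line, would give a complete input file for 'groff'.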
It would also be quite feasible to "complicate" the graphics with indications of SE etc., by adding more to the definition of \*[bx ...].

I have looked at the "LaTeX file produced by latex.describe" for Frank Harrell's example. Granting that it has no doubt been automatically produced, it is enormous and, for practical purposes, uneditable if you want to tweak features of the display. It would be interesting to see what had to be done further back up the line to produce it; this might be, of course, much easier to tweak. On the other hand, my 'groff source' file above is compact and easily changed.

If anyone would like to look at the output I have produced by the above method (PDF file), and the full groff source file, drop me a line (I'll send them privately to Sam anyway).

Best wishes to all,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 01-Sep-06 Time: 15:56:46
------------------------------ XFMail ------------------------------
The LaTeX or other solutions suggested are probably best, but here is a way to do it using only R base graphics (the code below is to get you started; some graphical parameters need to be set to get the spacing to look better):

# Survival data: p<k> = % survival at k years, s<k> = its standard error
tmp <- structure(list(
  Cancer = structure(as.integer(c(19, 23, 22, 13, 2, 7, 5, 24, 3, 9, 20, 8,
      4, 15, 16, 17, 10, 1, 14, 21, 12, 6, 11, 18)),
    .Label = c("Brain, nervous system", "Breast", "Cervix, uteri", "Colon",
      "Corpus uteri, uterus", "Esophagus", "Hodgkin's Disease",
      "Kidney, renal pelvis", "Larynx", "Leukemia", "Liver, bile duct",
      "Lung and bronchus", "Melanomas", "Multiple myeloma", "Non-Hodgkin's",
      "Oral cavity, pharynx", "Ovary", "Pancreas", "Prostate", "Rectum",
      "Stomach", "Testis", "Thyroid", "Urinary, bladder"),
    class = "factor"),
  p5 = c(98.8, 96, 94.7, 89, 86.4, 85.1, 84.3, 82.1, 70.5, 68.8, 62.6, 61.8,
    61.7, 57.8, 56.7, 55, 42.5, 32, 29.5, 23.8, 15, 14.2, 7.5, 4),
  s5 = c(0.4, 0.8, 1.1, 0.8, 0.4, 1.7, 1, 1, 1.6, 2.1, 1.2, 1.3,
    0.8, 1, 1.3, 1.3, 1.2, 1.4, 1.6, 1.3, 0.4, 1.4, 1.1, 0.5),
  p10 = c(95.2, 95.8, 94, 86.7, 78.3, 79.8, 83.2, 76.2, 64.1, 56.7, 55.2, 54.4,
    55.4, 46.3, 44.2, 49.3, 32.4, 29.2, 12.7, 19.4, 10.6, 7.9, 5.8, 3),
  s10 = c(0.9, 1.2, 1.3, 1.1, 0.6, 2, 1.3, 1.4, 1.8, 2.5, 1.4, 1.6,
    1, 1.2, 1.4, 1.6, 1.3, 1.5, 1.5, 1.4, 0.4, 1.3, 1.2, 1.5),
  p15 = c(87.1, 94, 91.1, 83.5, 71.3, 73.8, 80.8, 70.3, 62.8, 45.8, 51.8, 49.8,
    53.9, 38.3, 37.5, 49.9, 29.7, 27.6, 7, 19, 8.1, 7.7, 6.3, 2.7),
  s15 = c(1.7, 1.6, 1.8, 1.5, 0.7, 2.4, 1.7, 1.9, 2.1, 2.8, 1.8, 2,
    1.2, 1.4, 1.6, 1.9, 1.5, 1.6, 1.3, 1.7, 0.4, 1.6, 1.5, 0.6),
  p20 = c(81.3, 95.4, 88.2, 82.8, 65, 67.1, 79.2, 67.9, 60, 37.8, 49.2, 47.3,
    52.3, 34.3, 33, 49.6, 26.2, 26.1, 4.8, 14.9, 6.5, 5.4, 7.6, 2.7),
  s20 = c(3, 2.1, 2.3, 1.9, 0.7, 2.8, 2, 2.4, 2.4, 3.1, 2.3, 2.6,
    1.6, 1.7, 1.8, 2.4, 1.7, 1.9, 1.5, 1.9, 0.4, 2, 2, 0.8)),
  .Names = c("Cancer", "p5", "s5", "p10", "s10", "p15", "s15", "p20", "s20"),
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
    "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24"),
  class = "data.frame")

# One row of five panels; the first (double width) holds only the labels
layout( matrix( c(1,1,2,3,4,5), nrow=1) )

# Zero-length bars: draws nothing but the cancer names
barplot( rep(0, length(tmp$Cancer)), horiz=TRUE, xaxt='n', space=.5,
  names.arg=as.character(tmp$Cancer), las=1, cex.names=1)

# Each panel: stacked bar of survival % and its complement,
# with the % on the left and the standard error on the right
ypos <- barplot( rbind(tmp$p5, 100-tmp$p5), horiz=TRUE, xaxt='n',
  space=.5, names.arg=tmp$p5, cex.names=1, las=1)
title('5 year Survival',cex=.9)
axis(4, at=ypos, labels=tmp$s5, las=1, cex=.7, tick=FALSE)

ypos <- barplot( rbind(tmp$p10, 100-tmp$p10), horiz=TRUE, xaxt='n',
  space=.5, names.arg=tmp$p10, cex.names=1, las=1)
title('10 year Survival',cex=.9)
axis(4, at=ypos, labels=tmp$s10, las=1, cex=.7, tick=FALSE)

ypos <- barplot( rbind(tmp$p15, 100-tmp$p15), horiz=TRUE, xaxt='n',
  space=.5, names.arg=tmp$p15, cex.names=1, las=1)
title('15 year Survival',cex=.9)
axis(4, at=ypos, labels=tmp$s15, las=1, cex=.7, tick=FALSE)

ypos <- barplot( rbind(tmp$p20, 100-tmp$p20), horiz=TRUE, xaxt='n',
  space=.5, names.arg=tmp$p20, cex.names=1, las=1)
title('20 year Survival',cex=.9)
axis(4, at=ypos, labels=tmp$s20, las=1, cex=.7, tick=FALSE)

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sam Ferguson
Sent: Thursday, August 31, 2006 5:50 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Tables with Graphical Representations

Hi useRs -

I was wondering if anyone out there can tell me where to find R-code to do mixes of tables and graphics. I am thinking of something similar to this: http://yost.com/information-design/powerpoint-corrupts/ or like the excel routines people are demonstrating: http://infosthetics.com/archives/2006/08/excel_in_cell_graphing.html

My aim is to provide small graphics to illustrate numbers directly beside or behind their position in the table. Maybe there is a way to do it with lattice?

Thanks for any help you may be able to provide.
Sam Ferguson
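The four near-identical barplot() blocks in Greg Snow's base-graphics approach could be folded into a loop. The following is a minimal sketch under stated assumptions, not a replacement for his code: the function name `survival_panels` is invented here, only the first three cancer sites from the thread's table are used to keep the example short, and the plot is sent to an off-screen PDF device so that it runs without a display.

```r
# Subset of the survival data from this thread (first three cancer sites).
tmp <- data.frame(
  Cancer = c("Prostate", "Thyroid", "Testis"),
  p5  = c(98.8, 96.0, 94.7), s5  = c(0.4, 0.8, 1.1),
  p10 = c(95.2, 95.8, 94.0), s10 = c(0.9, 1.2, 1.3),
  p15 = c(87.1, 94.0, 91.1), s15 = c(1.7, 1.6, 1.8),
  p20 = c(81.3, 95.4, 88.2), s20 = c(3.0, 2.1, 2.3)
)

# Hypothetical wrapper: a label panel followed by one bar panel per period.
survival_panels <- function(d, years = c(5, 10, 15, 20)) {
  # The label panel gets double width, then one column per period.
  layout(matrix(c(1, 1, seq_along(years) + 1), nrow = 1))
  # Zero-length bars: only the cancer names are drawn.
  barplot(rep(0, nrow(d)), horiz = TRUE, xaxt = "n", space = 0.5,
          names.arg = as.character(d$Cancer), las = 1)
  ypos <- NULL
  for (y in years) {
    p <- d[[paste0("p", y)]]   # % survival at y years
    s <- d[[paste0("s", y)]]   # standard error at y years
    ypos <- barplot(rbind(p, 100 - p), horiz = TRUE, xaxt = "n",
                    space = 0.5, names.arg = p, las = 1)
    title(paste(y, "year Survival"))
    axis(4, at = ypos, labels = s, las = 1, tick = FALSE)
  }
  invisible(ypos)              # bar midpoints of the last panel
}

pdf(NULL, width = 12, height = 8)  # off-screen device, no file written
ypos <- survival_panels(tmp)
dev.off()
```

Passing the full 24-row data frame instead of the subset should give the same layout as the original code, with the same caveat that margins and text sizes still need tuning.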
First graphic in the initial posting: I think this graphic needs to be scaled so that it can be interpreted correctly when reading across rows, columns, and non-contiguous cells, or else the correct interpretation and way to read it should be provided. For example, in the last row one has to read the numbers to get the correct information out. It would be good to have documentation that explains how to read/interpret this graph; otherwise fixed-length boxes are visually confusing. Anupam.
On 02-Sep-06 Anupam Tyagi wrote:
> First Graphic in the initial posting: I think this graphic needs
> to be scaled in a manner so it can be interpreted correctly while
> going across rows, columns, and non-contiguous cells, or the correct
> interpretation and way to read this provided. For example, in the
> last row one has to read the numbers to get the correct information
> out. It will be good to have documentation that explains how to
> read/interpret this graph, otherwise fixed length boxes are
> visually confusing. Anupam.

You are perhaps asking too much from this kind of graphic. All graphical displays have both merits and limitations. The design of the display (if it has been thought out) will be chosen so as to exhibit what the writer wants the reader to see "immediately", along with "deeper" detail which can be perceived by taking a longer and closer look but without demanding too dispersed an attention which can confuse and overload the reader.

In this particular case, one can very quickly see that some 4 cancers (Prostate-Melanomas) have quite good survival rates over all 4 5-year periods. For the next four (Breast-Urinary), though survival is good for the first 5-year period, it can be seen that it is more variable for subsequent periods. The next 8 (Cervix-Ovary) have a broadly similar initial survival rate (50%-75%) with subsequent survival very variable between different cancers. Then there is a somewhat sudden jump to the final group of eight (Leukemia-Pancreas) where initial survival (and therefore longer-term survival) is low.

I think the above summary is all that can be directly derived from the graphical information, and it may be what the designer wanted to convey. Or, at least, I hope so -- for, if the designer wanted to convey something different then the design has failed.

For instance, one important question is what are the chances of survival over period 5-10 years, given that one has survived the first 5 years.
One can only get a very approximate and qualitative idea of this from the graphic (see for instance the above comparison between cancers 1-4 and cancers 5-8). So this design is bad for conveying information about this question. A design appropriate for this would show similar Red/Grey boxes, but now the proportion of Red would be the probability of survival through the current 5-year period, conditional on having survived to the beginning of it. But then it would be difficult to interpret the graphic relative to the question "what is the survival rate to 5 years, to 10 years, ... ?" Of course one could combine the two kinds of graphic in the one display -- a top row of boxes for each cancer as now, and a second row giving conditional survival rates. But then the eye has trouble comparing different cancers for one of these two, since there is visual distraction from the other (a case of requiring dispersed attention). This could be alleviated by off-setting the second row to the right of the first row, so that as well as running horizontally along each row the eye can also run down vertically along the column for the particular type of survival (unconditional or conditional). But then the table would become much wider, so there would be problems about how best to fit it on the page (maybe in landscape). And so it goes on ... As to your point about not being able to perceive the numerical variations in (say) the last row, you have to think about the technology here. When I view that web page on my screen, I see boxes about 1cm wide (a little less in fact). A computer screen has about 5 pixels/mm, so 50 pixels/cm. But the percentages in the last row: 4.0%, 3.0%, 2.7%, 2.7%, vary over a range 1.3%. Now a percentage difference of less than 2% simply cannot be perceived when 1 pixel is at least 2% of the width of the box. 
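The resolution argument above can be checked with a few lines of arithmetic. This is a sketch using only the figures stated in this message (about 5 pixels/mm and a box about 1 cm wide); the variable names are arbitrary.

```r
px_per_cm <- 50    # "about 5 pixels/mm", as stated above
box_cm    <- 1     # approximate on-screen box width

# Percentage points represented by a single pixel of box width.
pct_per_pixel <- 100 / (px_per_cm * box_cm)

# Last row of the table (Pancreas): survival at 5, 10, 15 and 20 years.
last_row  <- c(4.0, 3.0, 2.7, 2.7)
spread_px <- diff(range(last_row)) / pct_per_pixel  # whole-row spread, in pixels

cat("one pixel =", pct_per_pixel, "percentage points\n")
cat("spread of last row =", spread_px, "pixels\n")
```

On these assumptions the whole range of the last row spans less than one pixel, which is why the boxes in that row look identical.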
You might argue that this would be helped by having wider boxes, but they would have to be much wider (by a factor of say 10) before you could make detailed sense of the last row. Which goes to show that the main message which can be perceived in this graphic is in the rather coarse comparisons which can be made between cancers (and periods) where the rates differ by fairly substantial amounts -- say at least 10%. For anything else, you have to look at the numbers anyway. The merit of this particular design is that you can look at the boxes without being seriously distracted by the numbers, or look at the numbers without being seriously distracted by the boxes, yet both are present at the same time. In summary: design of a graphic display is literally an art. What one display can reveal, another will conceal. Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 02-Sep-06 Time: 10:29:20 ------------------------------ XFMail ------------------------------
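The conditional survival rates discussed in this message (survival through a 5-year period, given survival to its start) can be approximated from the published cumulative rates as ratios of successive columns. The sketch below uses three rows of the table from this thread; treating a simple ratio as the conditional rate is an assumption of this example, and whether it is appropriate depends on how the published rates were estimated.

```r
# Cumulative survival (%) to 5, 10, 15 and 20 years, from the table in this thread.
surv <- rbind(
  Prostate = c(98.8, 95.2, 87.1, 81.3),
  Thyroid  = c(96.0, 95.8, 94.0, 95.4),
  Pancreas = c(4.0, 3.0, 2.7, 2.7)
)

# First period: the cumulative rate itself. Later periods: ratio of each
# cumulative rate to the one before it, expressed as a percentage.
conditional <- cbind(surv[, 1], 100 * surv[, -1] / surv[, -ncol(surv)])
colnames(conditional) <- c("0-5", "5-10", "10-15", "15-20")

print(round(conditional, 1))
```

Note that the Thyroid estimate rises from 94.0 at 15 years to 95.4 at 20 years, so its last ratio exceeds 100%; a display built on conditional rates would have to decide how to draw such a case.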