I apologise for starting a new thread, but we had a mail problem and I don't have the original message to refer to.

Someone mentioned the new "Diamond Graphs" invented at Johns Hopkins. I haven't seen the August 2003 issue of The American Statistician yet, but I _have_ read the press release.

The press release is a bit of a stunner. I quote:

    "Who would have thought we would still be inventing new methods of graphing in the twenty-first century?"

A1: anyone with a functioning brain?
A2: anyone who didn't sleep through the twentieth century?

I can summarise diamond graphs this way:

    (1) Write a 2D table.
        - very old idea
    (2) Instead of numbers, put blobs of some kind where the size shows you the importance.
        - at least 3000 years old.
    (3) Rotate the table widdershins 45 degrees
        - swiped from 3D displays
    (4) Replace the blobs by truncated diamonds; height of bar or area of polygon or something shows value.
        - NOVELTY
    (5) Notice that it's not all that readable, so put the numbers back.

The fact that someone would try to patent this strikes me as outrageous; the actual amount of novelty is so tiny.

For R, I don't think it matters, because I think that diamond graphs are a bad idea. Let me try to explain why.

In effect, you have a 2D bar chart, where each bar occupies a rather small diamond-shaped cell. The bars are sort of squashed to fit in. ASCII graphic of a typical bar:

              |
           _______
          /       \
     __  /         \  __
         \         /
          \_______/
              |

These bars have two axes of symmetry: a vertical mirror axis and a horizontal mirror axis. The lines outside the hexagon above show the symmetry axes.

I shall use horizontal and vertical coordinates running from -1 to +1, and define the height of the bar to be the amount that the bar extends above the horizontal: 0 <= h <= 1. When h = 0, no polygon is shown. When h = 1, the polygon is a square occupying the whole cell.
(I don't actually understand this; in the illustration in http://www.jhu.edu/~gazette/2003/18aug03/18graph.html there is _always_ a margin, except when the square is full. It doesn't really spoil my point.)

    Total height of bar:          2h
    Width of bar top:             2 - 2h
    Total width of bar:           2
    Diagonal (corner to corner):  2*sqrt(2h^2 - 2h + 1)
    Area of bar:                  4h - 2h^2

The information-bearing units don't just change *size*, and they don't just *stretch* in one dimension (like normal histogram bars); they change shape much more drastically than that. I don't see how this can make it easy to relate one bar to another; your impression of how *much* bigger one bar is than another depends on which visual aspect you attend to. As Tufte (Visual Display of Quantitative Information) puts it:

    There are considerable ambiguities in how people perceive a two-dimensional surface and then convert that perception into a one-dimensional number. (p71)

Combine this with the small "dynamic range" available (because you have lots of cells to fit in), and there doesn't seem to be any advantage over just using discs of various sizes (which is a fairly old technique; you'll find a similar idea in ABCs of EDA). Oh yes, you'll find tables with entries shown by amount of ink on page 174 of Tufte's Visual Display...

I'm assuming here that the use of truncated squares ("hexagons") is considered important. The Gazette web page above has some other examples showing

    (A) plain diamonds that change size, not shape
    (B) "diamonds" without margins, that change shape as described above
    (C) "diamonds" with margins, that change shape as described above
    (D) diamonds with fixed width spanning the cell, where the height changes
    (E) something with a rectangle, a cell, and two "bow ties", each in a cell.

I have no idea what the different shapes mean.
Since the text says "The researcher experimented with other shapes but found that the six-sided polygon was the only shape to represent the outcomes equally within the grid as it expanded", I surmise that A, B, D, and E are meant to be understood as "bad examples" that the diamonds of C improve on.

It is not clear to me what the advantage of turning the diagram 45 degrees widdershins is supposed to be. I'm assuming here (and I don't even play an expert on TV) that vertical patterns are easier to grasp than diagonal ones.

Now, the main example on that web page (and in the PDF file you can get to from the URI posted in the original message) can be summarised as

    Systolic >= 180      BIG
    Systolic 160..179    moderate
    Everything else      pretty much small

which is quite easy to see if "systolic" is the horizontal or vertical axis, but when the vertical axis is "systolic + diastolic" and the horizontal is "diastolic - systolic" (a bit of hand-waving here, because the buckets aren't the same width, so + and - are a bit dodgy), it gets rather harder to see.

Turn to the examples on page 174 of Tufte again, where the number of values for one variable (6) is not the same as the number for the other (16). Would that look good if you turned it 45 degrees? The diamond graph appears to rely on the two explanatory variables having nearly the same number of values, which would seem to limit its usefulness.

What would happen if we turned the diagram back so that the axes are horizontal and vertical? Well, with square (or rectangular) cells we could put _several_ vertical bars in each cell, and so display 2 or 3 variables on the same 2D grid, something which would be very hard to do in a diamond graph.

In short, it looks to me as though "diamond graphs" are something R is better off without.
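The bar geometry quoted earlier in this message (height 2h, top width 2 - 2h, diagonal 2*sqrt(2h^2 - 2h + 1), area 4h - 2h^2) can be checked numerically. What follows is only a minimal Python sketch, not anything taken from the article or the press release. It assumes the cell is the diamond |x| + |y| <= 1 and the bar is that diamond cut off at |y| <= h, which is one reading of the description; the function names bar_vertices, shoelace_area and bar_measures are made up for illustration.

```python
import math


def bar_vertices(h):
    """Corners of the truncated diamond |x| + |y| <= 1, |y| <= h (assumed shape)."""
    w = 1.0 - h  # half-width of the flat top and bottom edges
    return [(1.0, 0.0), (w, h), (-w, h), (-1.0, 0.0), (-w, -h), (w, -h)]


def shoelace_area(pts):
    """Area of a simple polygon from its vertices, using the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def bar_measures(h):
    """Closed-form measures quoted in the message, for 0 <= h <= 1."""
    return {
        "height": 2 * h,
        "top_width": 2 - 2 * h,
        "total_width": 2.0,
        "diagonal": 2 * math.sqrt(2 * h * h - 2 * h + 1),
        "area": 4 * h - 2 * h * h,
    }


if __name__ == "__main__":
    # Print the closed-form area next to the area computed from the vertices.
    for h in (0.25, 0.5, 0.75, 1.0):
        m = bar_measures(h)
        print(h, m["height"], m["top_width"], m["diagonal"], m["area"],
              shoelace_area(bar_vertices(h)))
```

Under these assumptions, going from h = 0.25 to h = 0.5 doubles the bar's height but multiplies its area by only about 1.71 (0.875 to 1.5), which is the ambiguity complained about above: the apparent ratio depends on which aspect of the bar the eye picks up.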
Richard A. O'Keefe <ok at cs.otago.ac.nz> wrote:

> Someone mentioned the new "Diamond Graphs" invented at Johns Hopkins.
> I haven't seen the August 2003 issue of The American Statistician yet,
> but I _have_ read the press release.

Same here.

> The fact that someone would try to patent this strikes me as outrageous;
> the actual amount of novelty is so tiny.

Agree again.

[Richard's points edited for space]

> For R, I don't think it matters, because I think that diamond graphs
> are a bad idea.

> In short, it looks to me as though "diamond graphs" are something R
> is better off without.

A few points to add to Richard's comments.

The proposed "diamond graph" is not innovative, more intuitive, or more accurate than existing graph forms. It is applicable to one limited graphing problem: a continuous (outcome) dimension and two discrete categorical dimensions. Ironically, the example http://www.jhu.edu/~gazette/2003/18aug03/18graph.html uses artificially imposed discrete categories on two continuous variables! Why not treat them as continuous?

This specific problem (2 categorical, 1 continuous) presents the challenge of representing 3 dimensions on a two dimensional plane. The "traditional" solution is the "3D bar chart", which uses perspective to represent the third dimension. There are many problems with that compromise. The two greatest are that the fixed perspective can obscure bars further back in the z (depth) dimension, and that perception of the relative size (height) of the bars is less precise because the third dimension is projected through perspective.

The perspective distortion can be corrected through stereoscopic presentation, and the obstruction of bars can be corrected through animation. These solutions complete the third dimension, but will not work on a monochromatic printed page.

Less expensive and more practical would be to present the data in a two dimensional matrix (as proposed in the "diamond") but not to use an odd shape to convey the third dimension.
The third dimension could be represented by hue (color) or brightness (shade). I suspect that actual psychometric tests would show that color or other visual representations of density would be more accurate and reliable than their proposed solution, which confounds area with shape.

As a caveat, I have not read the American Statistician article. I will be surprised if they present data showing that users can more accurately perceive variation in the continuous variable through their odd shape solution in contrast to either color or shade.

Harold Baize, Ph.D.
Research and Evaluation
Youth Services Division
Butte County Department of Behavioral Health
hbaize at buttecounty.net
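The shaded-matrix alternative suggested in the message above (a plain two dimensional grid, with the third dimension shown by brightness rather than by shape) can be sketched in a few lines. This is a hedged illustration in Python only: the character ramp SHADES, the function name shade_table and the example counts are all made up, and a text ramp is a crude stand-in for real grey levels or colour. It says nothing about how well people actually read such displays.

```python
SHADES = " .:-=+*#%@"  # light to dark; an arbitrary ramp chosen for illustration


def shade_table(counts):
    """Render a 2D table of non-negative numbers as rows of shade characters."""
    top = max(max(row) for row in counts)
    lines = []
    for row in counts:
        cells = []
        for value in row:
            # Scale each value to an index into the ramp; an all-zero table stays blank.
            level = 0 if top == 0 else round(value / top * (len(SHADES) - 1))
            cells.append(SHADES[level] * 2)  # two characters per cell, roughly square
        lines.append("".join(cells))
    return lines


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the article.
    example = [
        [0, 1, 2, 9],
        [1, 2, 4, 9],
        [0, 1, 3, 8],
    ]
    for line in shade_table(example):
        print(line)
```

Every cell keeps the same size and shape here, so only one visual attribute varies with the value. The grid also does not need the two categorical variables to have similar numbers of levels.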
I read with interest comments about "diamond graphs" recently described in the American Statistician by my colleagues in the Johns Hopkins Department of Epidemiology led by Dr. Alvaro Munoz. Permit three brief reactions.

First, "diamond graphs" were developed as part of the Multi-center AIDS Cohort Study, a seminal study of HIV infection in the U.S. in which these authors have been key co-investigators. The graphs were created to better address a real scientific objective and that usually bodes well for their longer-term value.

Second, non-technical descriptions of statistical work written by public affairs people, such as the Johns Hopkins web-page article commented on, tend to be enthusiastic; such is the nature of public relations. I, for one, am delighted to see statistical work noticed and discussed by non-statisticians within my University and beyond.

Third, this University leaves it to individual faculty whether or not to pursue a patent for a discovery. That Dr. Munoz and colleagues have decided to do so reflects their preference, not a University or Department policy. In fact, the Johns Hopkins Department of Biostatistics faculty and graduates are active participants in and enthusiastic supporters of open source software development. For recent examples, see: http://www.biostat.jhsph.edu/biostat/research/software.shtml

Scott L. Zeger
Department of Biostatistics
Johns Hopkins University
Scott Zeger <szeger at jhsph.edu> commented:

> First, "diamond graphs" were developed as part of the Multi-center
> AIDS Cohort Study, a seminal study of HIV infection in the U.S. in
> which these authors have been key co-investigators. The graphs were
> created to better address a real scientific objective and that
> usually bodes well for their longer-term value.

I've invented a couple of graphic techniques myself. They were devised to deal with problems of actual practical interest at the time. "That usually bodes well for their longer-term value"? No, I am these days glad that I never published them, because R is chock full of *better* methods than mine.

As yet I have not had a chance to see the actual article. (Living in the Southern Hemisphere has advantages, but also disadvantages, like the time it takes periodicals to arrive.) The one example of a diamond graph I've seen did let me spot a certain pattern in the data, but it made that pattern harder to spot than other graphs would have.

Amongst other things, it would be very interesting to see some sort of 2D density plot with log(diastolic) and log(systolic) as axes. Perhaps this was already done in the article.

First Lispstat and now R have impressed on my mind the importance of moving beyond paper. The possibility of displaying the same data in _several_ ways, simultaneously or in quick succession, means that computer graphics can be a qualitatively different medium from paper.

Just this afternoon I was talking with a 4th-year CS student who is working on a project to try to find features which will enable him to find patterns in a certain kind of data. Using R, I generated some synthetic data in a couple of lines of code.
Then I plotted it several different ways, scratched my head a bit, rummaged through a list of smoothing functions found using help.search, and tried something, plotted it, changed a scale factor, tried again, settled on a scale factor that seemed to work well, switched back to thinking about calculations, and in about 15 minutes, there was a technique for finding interesting change points in the data. I confused him a bit because I was switching plots faster than he could follow, so I spent the next 45 minutes explaining what I'd done. The point was that *changing* plots was qualitatively different from looking at a single plot.

Now, the data displayed in the one example in the press release seemed to be

    (diastolic pressure bucket) x (systolic pressure bucket) -> count.

As noted above, that suggests a 2D density estimate as an interesting thing. It also suggests a scatter plot (possibly with rugs). Most importantly, it suggests BOTH of them, and several others as well (such as hexbin), each of which may provide some insight that the others don't. It's very VERY hard for any one graph, especially one with a cramped dynamic range, to beat that.

The real competition for the diamond graph is not some other graph, but a wide choice of graphs that can be quickly flicked through and creatively combined.

This also means that a new graphic technique, if it _is_ good, is even _better_ when it can be freely creatively combined with other graphic techniques. Having diamond graphs locked out of R is bad *for* diamond graphs.

> In fact, the Johns Hopkins Department of Biostatistics faculty and
> graduates are active participants in and enthusiastic supporters of
> open source software development. For recent examples, see:
> http://www.biostat.jhsph.edu/biostat/research/software.shtml

Not only that, at least one of them, R/qtl, is an R package.
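The "(diastolic pressure bucket) x (systolic pressure bucket) -> count" structure described above is easy to build from raw pairs, and once it exists the same counts can be looked at in more than one way. The following is a minimal Python sketch under stated assumptions: the readings are synthetic, the cut points are hypothetical and chosen only to give buckets of unequal width, and the function names bucket_counts and margins are invented for illustration. It uses only the standard library modules bisect and random, and it is not the procedure used in the article.

```python
import bisect
import random


def bucket_counts(pairs, x_edges, y_edges):
    """Count (x, y) pairs in a grid; edges are ascending inner cut points."""
    grid = [[0] * (len(x_edges) + 1) for _ in range(len(y_edges) + 1)]
    for x, y in pairs:
        col = bisect.bisect_right(x_edges, x)  # bucket index along x
        row = bisect.bisect_right(y_edges, y)  # bucket index along y
        grid[row][col] += 1
    return grid


def margins(grid):
    """Row and column totals: a second, coarser view of the same counts."""
    rows = [sum(r) for r in grid]
    cols = [sum(c) for c in zip(*grid)]
    return rows, cols


# Synthetic, made-up readings for illustration; not the press-release data.
rng = random.Random(0)
pairs = []
for _ in range(500):
    systolic = rng.gauss(130, 20)
    diastolic = 0.5 * systolic + rng.gauss(15, 8)
    pairs.append((diastolic, systolic))

# Hypothetical cut points (assumptions, not taken from the article).
diastolic_edges = [80, 90, 100]
systolic_edges = [140, 160, 180]

grid = bucket_counts(pairs, diastolic_edges, systolic_edges)
row_totals, col_totals = margins(grid)

if __name__ == "__main__":
    for row, total in zip(grid, row_totals):
        print(row, total)
    print(col_totals)
```

The raw pairs, the grid and the two sets of margins are three views of one data set. In an interactive environment each could be passed to a different plotting routine in quick succession, which is the kind of flicking between displays argued for above.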