thr3ads.net - R help - [R] Diamond graphs [Aug 2003]

Richard A. O'Keefe

2003-Aug-21 05:21 UTC

[R] Diamond graphs

I apologise for starting a new thread, but we had a mail problem and I
don't have the original message to refer to.

Someone mentioned the new "Diamond Graphs" invented at Johns Hopkins.
I haven't seen the August 2003 issue of The American Statistician yet,
but I _have_ read the press release.

The press release is a bit of a stunner.  I quote:
    "Who would have thought we would still be inventing
     new methods of graphing in the twenty-first century?"

A1: anyone with a functioning brain?
A2: anyone who didn't sleep through the twentieth century?

I can summarise diamond graphs this way:
    (1) Write a 2D table.		- very old idea
    (2) Instead of numbers, put blobs of
	some kind where the size shows you
	the importance.			- at least 3000 years old.
    (3) Rotate the table widdershins 45
	degrees				- swiped from 3D displays
    (4) Replace the blobs by truncated diamonds;
        height of bar or area of polygon or something
        shows value.			- NOVELTY
    (5) Notice that it's not all that readable,
	so put the numbers back.
	
The fact that someone would try to patent this strikes me as outrageous;
the actual amount of novelty is so tiny.

For R, I don't think it matters, because I think that diamond graphs
are a bad idea.  Let me try to explain why.

In effect, you have a 2D bar chart, where each bar occupies a rather
small diamond-shaped cell.  The bars are sort of squashed to fit in.

ASCII graphic of a typical bar:
             |
          _______
         /       \
     __ /         \ __
        \         /
         \_______/
             |

These bars have two axes of symmetry: a vertical mirror axis and
a horizontal mirror axis.  The lines outside the hexagon above
show the symmetry axes.  I shall use horizontal and vertical coordinates
running from -1 to +1, and define the height of the bar to be the amount
that the bar extends above the horizontal: 0 <= h <= 1.  When h = 0,
no polygon is shown.  When h = 1, the polygon is a square occupying the
whole cell.  (I don't actually understand this; in the illustration in
http://www.jhu.edu/~gazette/2003/18aug03/18graph.html
there is _always_ a margin, except when the square is full.  It doesn't
really spoil my point.)

Total height of bar:            2h
Width of bar top:               2 - 2h
Total width of bar:             2
Diagonal (corner to corner):    2*sqrt(2h^2 - 2h + 1)
Area of bar:                    4h - 2h^2
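
The figures in the table above can be checked numerically. What follows is a minimal Python sketch, not anything taken from the diamond graph article: it builds the six vertices of the bar from the coordinate convention described above and compares the polygon area with the closed form 4h - 2h^2. The function names `bar_vertices`, `shoelace_area` and `bar_measures` are made up for illustration.

```python
import math

def bar_vertices(h):
    """Vertices of the truncated-diamond bar for height h (0 <= h <= 1),
    in cell coordinates running from -1 to +1, listed counter-clockwise."""
    w = 1.0 - h  # half-width of the flat top and bottom edges
    return [(-1.0, 0.0), (-w, -h), (w, -h), (1.0, 0.0), (w, h), (-w, h)]

def shoelace_area(pts):
    """Polygon area by the shoelace formula."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def bar_measures(h):
    """Closed-form measures from the table: height, top width, diagonal, area."""
    return {
        "height": 2 * h,
        "top_width": 2 - 2 * h,
        "diagonal": 2 * math.sqrt(2 * h * h - 2 * h + 1),
        "area": 4 * h - 2 * h * h,
    }

for h in (0.25, 0.5, 1.0):
    print(h, bar_measures(h), shoelace_area(bar_vertices(h)))
```

At h = 1 the flat top has width zero and the polygon fills the whole diamond-shaped cell, whose area is 2 in these units.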

The information-bearing units don't just change *size*, and they don't
just *stretch* in one dimension (like normal histogram bars), they change
shape much more drastically than that.  I don't see how this can make it
easy to relate one bar to another; your impression of how *much* bigger
one bar is than another depends on which visual aspect you attend to.
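
To make the ambiguity concrete, here is a small Python sketch using only the formulas listed above. It compares a bar with h = 0.2 against one with h = 0.4, so the underlying value doubles. The helper name `visual_ratios` and the two heights are arbitrary choices for illustration.

```python
import math

def visual_ratios(h_small, h_big):
    """Ratio big/small for several visual aspects of the bar."""
    def area(h):
        return 4 * h - 2 * h * h

    def top_width(h):
        return 2 - 2 * h

    def diagonal(h):
        return 2 * math.sqrt(2 * h * h - 2 * h + 1)

    return {
        "height": h_big / h_small,
        "area": area(h_big) / area(h_small),
        "top_width": top_width(h_big) / top_width(h_small),
        "diagonal": diagonal(h_big) / diagonal(h_small),
    }

ratios = visual_ratios(0.2, 0.4)
for name, r in ratios.items():
    print(f"{name:10s} {r:.3f}")
```

The height doubles, the area grows by a factor of about 1.78, and both the flat top and the corner-to-corner diagonal actually shrink, so the answer to "how much bigger?" depends on which feature the eye picks up.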

As Tufte (Visual Display of Quantitative Information) puts it:
    There are considerable ambiguities in how people perceive
    a two-dimensional surface and then convert that perception
    into a one-dimensional number.
(p71)  

Combine this with the small "dynamic range" available (because you have
lots of cells to fit in), and there doesn't seem to be any advantage
over just using discs of various sizes (which is a fairly old technique;
you'll find a similar idea in ABCs of EDA).

Oh yes, you'll find tables with entries shown by amount of ink
on page 174 of Tufte's Visual Display...

I'm assuming here that the use of truncated squares ("hexagons")
is considered important.  The Gazette web page above has some other
examples showing
(A) plain diamonds that change size, not shape
(B) "diamonds" without margins, that change shape as described above
(C) "diamonds" with margins, that change shape as described above
(D) diamonds with fixed width spanning the cell, where the height
    changes
(E) something with a rectangle, a cell, and two "bow ties", each in a
    cell.  I have no idea what the different shapes mean.
Since the text says "The researcher experimented with other shapes but
found that the six-sided polygon was the only shape to represent the
outcomes equally within the grid as it expanded", I surmise that A, B,
D, and E are meant to be understood as "bad examples" that the diamonds
of C improve on.

It is not clear to me what the advantage of turning the diagram 45
degrees widdershins is supposed to be.  I'm assuming here (and I don't
even play an expert on TV) that vertical patterns are easier to grasp
than diagonal ones.  Now, the main example on that web page (and in the
PDF file you can get to from the URI posted in the original message)
can be summarised as
    Systolic >= 180	BIG
    Systolic 160..179	moderate
    Everything else	pretty much small
which is quite easy to see if "systolic" is the horizontal or vertical
axis, but when the vertical axis is "systolic + diastolic" and the
horizontal is "diastolic - systolic" (a bit of hand-waving here, because
the buckets aren't the same width, so + and - are a bit dodgy), it gets
rather harder to see.
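
The effect of the rotation on the axes can be sketched in a few lines of Python. This assumes equal-width buckets indexed by integers (the real buckets are not equal, as noted above), a hypothetical 4 x 4 grid, and a made-up helper `rotate_cell`. Turning the grid 45 degrees widdershins puts a cell's horizontal position at (diastolic - systolic) and its vertical position at (systolic + diastolic), each scaled by 1/sqrt(2).

```python
import math

def rotate_cell(diastolic_idx, systolic_idx):
    """Rotate grid position (x = diastolic bucket, y = systolic bucket)
    by 45 degrees counter-clockwise (widdershins)."""
    c = math.cos(math.pi / 4)
    s = math.sin(math.pi / 4)
    x = diastolic_idx * c - systolic_idx * s  # proportional to diastolic - systolic
    y = diastolic_idx * s + systolic_idx * c  # proportional to systolic + diastolic
    return x, y

# All cells in the top systolic bucket of a hypothetical 4 x 4 grid
top_row = [rotate_cell(d, 3) for d in range(4)]
for x, y in top_row:
    print(f"x = {x:+.3f}  y = {y:.3f}")
```

Before rotation these four cells share a single vertical position. Afterwards they lie along a diagonal, each at a different height, which is one way to see why a pattern in "systolic" alone becomes harder to pick out.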

Turn to the examples on page 174 of Tufte again, where the number of
values for one variable (6) is not the same as the number for the other (16).
Would that look good if you turned it 45 degrees?  The diamond graph appears
to rely on the two explanatory variables having nearly the same number of
values, which would seem to limit its usefulness.

What would happen if we turn the diagram back so that the axes are
horizontal and vertical?  Well, with square (or rectangular) cells
we could put _several_ vertical bars in each cell, and so display
2 or 3 variables on the same 2d grid, something which would be very
hard to do in a diamond graph.

In short, it looks to me as though "diamond graphs" are something R
is better off without.

Baize, Harold

2003-Aug-21 16:45 UTC

[R] Diamond graphs

Richard A. O'Keefe  <ok at cs.otago.ac.nz> wrote:
> Someone mentioned the new "Diamond Graphs" invented at Johns Hopkins.
> I haven't seen the August 2003 issue of The American Statistician yet,
> but I _have_ read the press release.
Same here.
> The fact that someone would try to patent this strikes me as outrageous;
> the actual amount of novelty is so tiny.
Agree again. [Richard's points edited for space]
> For R, I don't think it matters, because I think that diamond graphs
> are a bad idea.
> In short, it looks to me as though "diamond graphs" are something R
> is better off without.
A few points to add to Richard's comments. The proposed "diamond graph"
is not innovative, more intuitive, or more accurate than existing 
graph forms.  It is applicable to one limited graphing problem: a 
continuous (outcome) dimension and two discrete categorical dimensions. 
Ironically, the example
http://www.jhu.edu/~gazette/2003/18aug03/18graph.html  uses artificially
imposed discrete categories on two continuous variables! 
Why not treat them as continuous? 

This specific problem (2 categorical, 1 continuous) presents the 
challenge of representing 3 dimensions on a two dimensional plane. 
The "traditional" solution is the "3D bar chart" which uses 
perspective to represent the third dimension. There are many 
problems with that compromise. The two greatest are that the 
fixed perspective can obscure bars further back in the z (depth) 
dimension, and that perception of the relative size (height) of 
the bars is less precise due to projection of the third dimension 
through perspective. The perspective distortion can be corrected 
through stereoscopic presentation, and the obstruction of bars can 
be corrected through animation. These solutions complete the third 
dimension, but will not work on a monochromatic printed page. 

Less expensive and more practical would be to present the data in 
a two dimensional matrix (as proposed in the "diamond") but not 
to use an odd shape to convey the third dimension. The third 
dimension could be represented by hue (color) or brightness (shade). 
I suspect that actual psychometric tests would show that color 
or other visual representations of density would be more accurate 
and reliable than their proposed solution which confounds area with 
shape. 
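
As a rough illustration of the shading idea, the Python sketch below maps a small matrix of values onto grey levels and prints them as text. The counts are invented purely for the example, and `to_shade`, `shade_matrix` and the character ramp are arbitrary choices rather than a recommendation of any particular palette.

```python
SHADES = " .:-=+*#%@"  # light to dark; an arbitrary 10-step ramp

def to_shade(value, vmin, vmax):
    """Map value linearly onto one character of the SHADES ramp."""
    if vmax == vmin:
        return SHADES[0]
    frac = (value - vmin) / (vmax - vmin)
    idx = min(int(frac * len(SHADES)), len(SHADES) - 1)
    return SHADES[idx]

def shade_matrix(rows):
    """Return one string per row, two characters per cell."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    return ["".join(to_shade(v, lo, hi) * 2 for v in row) for row in rows]

# Made-up values, for illustration only
counts = [
    [1, 2, 4, 9],
    [1, 3, 6, 12],
    [2, 5, 10, 20],
]
for line in shade_matrix(counts):
    print(line)
```

Every cell keeps the same size and shape, so only one visual attribute (darkness) varies with the value. Whether readers judge darkness more accurately than the hexagons is an empirical question this sketch does not answer.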

As a caveat, I have not read the American Statistician article. 
I will be surprised if they present data showing that users 
can more accurately perceive variation in the continuous variable 
through their odd shape solution in contrast to either color or 
shade.

Harold Baize, Ph.D. 
Research and Evaluation  
Youth Services Division
Butte County Department of Behavioral Health
hbaize at buttecounty.net

Scott Zeger

2003-Aug-25 19:37 UTC

[R] diamond graphs

I read with interest comments about "diamond graphs" recently described in
the American Statistician by my colleagues in the Johns Hopkins Department
of Epidemiology led by Dr. Alvaro Munoz.

Permit three brief reactions.

First, "diamond graphs" were developed as part of the Multi-center AIDS
Cohort Study, a seminal study of HIV infection in the U.S. in which these
authors have been key co-investigators. The graphs were created to better
address a real scientific objective and that usually bodes well for their
longer-term value.

Second, non-technical descriptions of statistical work written by public
affairs people, such as the Johns Hopkins web-page article commented on,
tend to be enthusiastic; such is the nature of public relations. I, for one,
am delighted to see statistical work noticed and discussed by
non-statisticians within my University and beyond.

Third, this University leaves it to individual faculty whether or not to
pursue a patent for a discovery. That Dr. Munoz and colleagues have decided
to do so reflects their preference, not a University or Department
policy. In fact, the Johns Hopkins Department of Biostatistics faculty and
graduates are active participants in and enthusiastic supporters of open
source software development. For recent examples, see:
http://www.biostat.jhsph.edu/biostat/research/software.shtml

Scott L. Zeger
Department of Biostatistics
Johns Hopkins University

Richard A. O'Keefe

2003-Aug-26 02:09 UTC

[R] diamond graphs

Scott Zeger <szeger at jhsph.edu> commented:
	First, "diamond graphs" were developed as part of the Multi-center AIDS
	Cohort Study, a seminal study of HIV infection in the U.S. in which these
	authors have been key co-investigators. The graphs were created to better
	address a real scientific objective and that usually bodes well for their
	longer-term value.
	
I've invented a couple of graphic techniques myself.  They were devised
to deal with problems of actual practical interest at the time.  "That
usually bodes well for their longer-term value"?  No, I am these days
glad that I never published them, because R is chock full of *better*
methods than mine.

As yet I have not had a chance to see the actual article.  (Living in
the Southern Hemisphere has advantages, but also disadvantages, like
the time it takes periodicals to arrive.)  The one example of a diamond
graph I've seen did make a certain pattern in the data easy to spot, but
it made it harder to spot than other graphs would have.  Amongst other
things, it would be very interesting to see some sort of 2d density
plot with log(diastolic) and log(systolic) as axes.  Perhaps this was
already done in the article.

First Lispstat and now R have impressed on my mind the importance of
moving beyond paper.  The possibility of displaying the same data in
_several_ ways, simultaneously or in quick succession, means that
computer graphics can be a qualitatively different medium from paper.

Just this afternoon I was talking with a 4th-year CS student who is
working on a project to try to find features which will enable him to
find patterns in a certain kind of data.  Using R, I generated some
synthetic data in a couple of lines of code.  Then I plotted it several
different ways, scratched my head a bit, rummaged through a list of
smoothing functions found using help.search, and tried something, plotted
it, changed a scale factor, tried again, settled on a scale factor that
seemed to work well, switched back to thinking about calculations, and
in about 15 minutes, there was a technique for finding interesting change
points in the data.  I confused him a bit because I was switching plots
faster than he could follow, so I spent the next 45 minutes explaining
what I'd done.  The point was that *changing* plots was qualitatively
different from looking at a single plot.

Now, the data displayed in the one example in the press release seemed
to be (diastolic pressure bucket) x (systolic pressure bucket) -> count.
As noted above, that suggests a 2d density estimate as an interesting
thing.  It also suggests a scatter plot (possibly with rugs).  Most
importantly, it suggests BOTH of them, and several others as well (such
as hexbin), each of which may provide some insight that the others don't.
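
The point that one set of raw pairs can feed several displays can be sketched in Python. Everything here is an assumption made for illustration: the data are synthetic and generated on the spot (not real blood pressure measurements), the bucket edges are invented, and `bin_2d` is a hypothetical helper. The same `points` list yields a bucketed count table of the kind the press release example shows, and a log-transformed copy that could go to a scatter plot or density estimate.

```python
import math
import random
from collections import Counter

def bin_2d(points, x_edges, y_edges):
    """Count points per (x bucket, y bucket); points outside the edges are dropped."""
    def bucket(v, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                return i
        return None

    counts = Counter()
    for x, y in points:
        i, j = bucket(x, x_edges), bucket(y, y_edges)
        if i is not None and j is not None:
            counts[(i, j)] += 1
    return counts

# Synthetic (diastolic, systolic) pairs; made up, not real data
rng = random.Random(1)
points = []
for _ in range(500):
    dia = rng.gauss(80, 10)
    points.append((dia, dia + rng.gauss(45, 12)))

# Invented, unequal-width buckets
dia_edges = [40, 70, 80, 90, 100, 140]
sys_edges = [70, 120, 140, 160, 180, 260]
table = bin_2d(points, dia_edges, sys_edges)

# The same points on log axes, ready for a scatter plot or density estimate
log_points = [(math.log(d), math.log(s)) for d, s in points if d > 0 and s > 0]

print(sorted(table.items()))
```

Keeping the raw pairs around, rather than only the bucketed counts, is what makes it cheap to switch between displays.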

It's very VERY hard for any one graph, especially one with a cramped
dynamic range, to beat that.  The real competition for the diamond graph
is not some other graph, but a wide choice of graphs that can be quickly
flicked through and creatively combined.

This also means that a new graphic technique, if it _is_ good, is even
_better_ when it can be freely creatively combined with other graphic
techniques.  Having diamond graphs locked out of R is bad *for* diamond
graphs.

	In fact, the Johns Hopkins Department of Biostatistics faculty and
	graduates are active participants in and enthusiastic supporters of open
	source software development. For recent examples, see:
	http://www.biostat.jhsph.edu/biostat/research/software.shtml
	
Not only that, at least one of them, R/qtl, is an R package.
