Ted.Harding at manchester.ac.uk
2007-Aug-08 21:52 UTC
[R] Subject: Re: how to include bar values in a barplot?
Greg, I'm going to join issue with you here! Not that I'll go near advocating "Excel-style" graphics (abominable, and the Patrick Burns URL which you cite is remarkable in its restraint). Also, I'm aware that this is potential flame-war territory -- again, I want to avoid that too.

However, this is the second time you have intervened on this theme (previously Mon 6 August), along with John Kane on Wed 1 August and again today on similar lines, and I think it's time an alternative point of view was presented, to counteract (I hope usefully) what seems to be a draconianly prescriptive approach to the presentation of information.

On 07-Aug-07 21:37:50, Greg Snow wrote:
> Generally adding the numbers to a graph accomplishes 2 things:
>
> 1) it acts as an admission that your graph is a failure

Generally, I disagree. Different elements in a display serve different purposes, according to the psychological aspects of visual perception.

Sizes, proportions, colours etc. of shapes (bars in a histogram, the marks representing points in a scatterplot, ... ) are interpreted, so to speak, "intuitively" -- the resulting perception is formed by processes which are hard to ascertain consciously, and the overall effect can only be ascertained by looking at it, and noting what impression one has formed. They stimulate mental responses in the domain of perception of spatial relationships.

Numbers, and text, on the other hand, while still shapes from the optical point of view, up to the point of their impact on the retina, provoke different perceptions. They are interpreted "analytically", stimulating mental responses in the domains of language and number.

There is no Law whatever which requires that the two must be separated.

It may be that adding any annotation to a graph or diagram will interfere with the "intuitive" interpretation that the diagram is intended to stimulate, with no associated benefit.

It may be that presenting numerical/textual information within a graphical/diagrammatic context will interfere with the "analytic" interpretation which is desired, with no associated benefit.

In such cases, it is clearly (and as a matter of fact to be decided in each case) better to separate the two aspects.

It may, however, be that both can be combined in such a way that each enhances the other; and also the simultaneous perception of both aspects induces a "cartesian-product" richness of interpretation where each element of the graphical presentation combines with each element of the textual/numerical presentation to generate a perception which could not possibly have been realised if they had been presented separately. This, too, is a matter to be decided in each case.

On that basis, if a graph without numbers fails to stimulate a desired impression which could have been stimulated by adding the numbers to the graph, then the graph without numbers is a failure.

> 2) it converts the graph into a poorly laid out table (with a
> colorful and distracting background)
>
> In general it is better to find an appropriate graph that does
> convey the information that is intended or if a table is more
> appropriate, then replace it with a well laid out table (or both).

There is an implication here that the information conveyed by a graph, and the information conveyed by a table, are mutually exclusive. And that it then follows: Thou Shalt Not Allow The One To Corrupt The Other. While this has the appearance of a Law, it is (for reasons I have sketched above) a Law which is not *generally* applicable.

> Remember that the role of tables is to look up specific values
> and the role of graphs is to give a good overview.

I would agree with this only to the following extent:

Tables allow *only* the look-up of values.
Graphs (modulo the capacity of the eye/brain to more or less precisely judge relative magnitudes) only allow a "good overview".

I would not agree that these are their exclusive roles.

The role of Hamlet is to agonise over revenge for his father's death. The role of Ophelia is to embody the "love interest" in the play.

This does not imply that there should be parallel performances of "Hamlet" on two different stages, with the audience trooping from one to the other according to which character is currently at the centre of the action. It actually works better when they're all up there at once, interacting!

> The books by William Cleveland and Tufte have a lot of good advice
> on these issues.

Since you mention Tufte, I commend the admiring discussion in his book "The Visual Display of Quantitative Information", Chapter 1 (Graphical Excellence), section "Narrative Graphics of Space and Time" (pp. 40-41 in the edition which I have) of Minard's graphical representation of what happened to Napoleon's army in the course of its advance on, and retreat from, Moscow.

An impression of the original can be formed from the rather small version displayed on Tufte's website at the top of
http://www.edwardtufte.com/tufte/posters
The version in the book is much clearer.

Here we see the two aspects of "intuitive" and spatial perception, and textual/numeric "analytical" perception, happily combined on the one display in such a way that the two interact richly.

Overlaid on the geographical pathway of the army is a broad band, like a river (with branches), whose breadth at any point represents the surviving numbers of the army. The advancing part is cross-hatched, the retreating part is solid black. Place-names and rivers are marked in text. Every so often, the numerical values of the surviving numbers are written in at the positions they apply to:
422,000 -> 400,000 -> 175,000 -> 145,000 -> 121,000 -> 100,000 [MOSCOW].

Then, on the retreat:
[MOSCOW] 100,000 -> 96,000 -> 87,000 -> 55,000 -> 37,000 -> 24,000 -> 20,000 -> 50,000 [picking up 30,000 out of an original 50,000 who'd peeled off from the original advance early on and were now in retreat] -> 28,000 -> 12,000 -> 14,000 -> 8,000 -> 4,000 -> 10,000.

(The increments in the final leg are due to gathering up other remnants in retreat).

Along the retreating arm, selected points are linked to a graph below the main graphic which shows -- as a graph -- the temperature (the final ingredient in the disaster) in degrees C (decreasing fairly steadily from 0degC to -20degC).

The graph itself is also annotated with the value of the temperature at each relevant point, along with the date, and linked to the "army graphic" by a line.

This is a complex (but, after a few minutes' thought, clear) combination of graphical and textual/numerical information. It succeeds brilliantly in its intention, which would have been unachievable if any principle that graphical and numerical information should be separated had been adhered to.

If such a principle had been adopted, then (at most) places on the graphic would be marked with, say, letters "A", "B", "C", and on other pages would be tables associating with each letter the residual size of the army, the date, the temperature, and the placename. Nothing more distracting, in terms of expecting the user to reconstruct the impression intended to be conveyed, can be imagined.

One can, with an "editor's eye", criticise some details of the implementation of Minard's design. The hatching on the "advancing" section interferes with the legibility of the placenames on it (but of course Minard would not have had nice easy colour backgrounds available to him in 1861). The "typeface" is poorly legible in itself. The orientations of many of the numerical annotations are so variable that it requires unnecessary effort to read them.
But these are details which can be (at least now) put right, with enhanced clarity, thus vindicating even more strongly the original concept. They are issues of detail in style and implementation.

For a re-working which does attend to such details in a modern style, see:

http://www.ddg.com/LIS/InfoDesignF96/Kelvin/Napoleon/map.html

and then see how attention to such details improves the effect.

> Before asking how to get R to produce a graph that looks like one
> from a spreadsheet, you should study:
> http://www.burns-stat.com/pages/Tutor/spreadsheet_addiction.html and
> some of the links from there. You may also want to run the following
> in R:
>
> > library(fortunes)
> > fortune(120)
>
> In general I like OpenOffice, my one main complaint is that when faced
> with the decision between doing something right or the same way as
> microsoft, they have not always made the right decision.

If anything, there should be a Law: Thou Shalt Not Even Think Of Producing A Graph That Looks Like Anything From A Spreadsheet. At any rate, not until spreadsheets give you much finer control and choice of the details of their graphics.

> Hope this gives you something to think about,

It did indeed! I would add that graphics I produce myself (with or without numeric/textual annotations) are hand-crafted. On this approach, even R's good graphical output is treated as "draft". The ultimate end result is composed directly from the numerical data associated with the elements in the graphic, as exported from R. It takes time, of course.

Whether to add such annotations, and, if so, how; and whether and how to embellish the graphics with colour, etc., are decided at the time in terms of the information which it is desired to communicate, and evaluated by trying to look at it with a "new eye", to judge what another viewer's impression might be.

In short, it is a matter of careful and thoughtful *design*.

Where, of course, "thoughtful" means "thinking about it" -- one thing that spreadsheets inhibit, because

a) Even if you do think about it, you're not going to find it easy to implement the results of your thoughts (if they're any good);
b) Spreadsheets readily induce the naive (especially beginning) user into the habit of trusting that the writers of spreadsheet software have thought through all those nasty implementation technicalities and have created an "expert system" which looks after drawing the graph according to best practice and with all necessary sophistication. Look! Isn't it clever!!

This habit, once (all too easily) acquired, is difficult to kick. Patrick Burns's deliberate use of "addiction" is apt.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Aug-07   Time: 22:40:19
------------------------------ XFMail ------------------------------
Frank E Harrell Jr
2007-Aug-09 12:35 UTC
[R] Subject: Re: how to include bar values in a barplot?
Ted.Harding at manchester.ac.uk wrote:
> Greg, I'm going to join issue with you here! Not that I'll go near
> advocating "Excel-style" graphics (abominable, and the Patrick Burns
> URL which you cite is remarkable in its restraint). Also, I'm aware
> that this is potential flame-war territory -- again, I want to avoid
> that too.
>
> However, this is the second time you have intervened on this theme
> (previously Mon 6 August), along with John Kane on Wed 1 August and
> again today on similar lines, and I think it's time an alternative
> point of view was presented, to counteract (I hope usefully) what
> seems to be a draconianly prescriptive approach to the presentation
> of information.

---snip---

Ted,

You make many excellent points and provide much food for thought. I still think that Greg's points are valid too, and in this particular case, bar plots are a bad choice and adding numbers at variable heights causes a perception error, as I wrote previously.

Thanks for your elaboration on this important subject.

Frank

>
> On 07-Aug-07 21:37:50, Greg Snow wrote:
>> Generally adding the numbers to a graph accomplishes 2 things:
>>
>> 1) it acts as an admission that your graph is a failure
>
> Generally, I disagree. Different elements in a display serve different
> purposes, according to the psychological aspects of visual perception.

. . .

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
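
For readers following the subject line, the two label placements under discussion (values written at the varying bar tops, versus values written on one common line) can be compared with a short base R sketch. This is only an illustration under assumptions: the data vector `counts` is made up, both placements are drawn on the same plot for comparison (in practice one would keep only one), and the null PDF device is used just so the code runs without a display.

```r
# Hypothetical data, made up purely for illustration
counts <- c(A = 12, B = 30, C = 21)

pdf(NULL)  # null device: lets the sketch run without opening a window

# barplot() returns the x coordinates of the bar midpoints
# (as a one-column matrix, hence as.vector())
mids <- as.vector(barplot(counts, ylim = c(0, max(counts) * 1.15),
                          ylab = "Count"))

# Placement 1: each value just above its own bar (heights vary with the data)
text(x = mids, y = counts, labels = counts, pos = 3)

# Placement 2: all values on one common line in the top margin
mtext(counts, side = 3, at = mids, line = 0.5)

invisible(dev.off())
```

The key design point is that `barplot()` hands back the bar midpoints, so annotations can be positioned from the data rather than by eye; which placement (if either) helps the reader is exactly the judgment call debated above.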
Ted,

Thanks for your thoughts. I don't take it as the start of a flame war (I don't want that either).

My original intent was to get the original posters out of the mode of thinking they want to match what the spreadsheet does and into thinking about what message they are trying to get across. To get them (and possibly others) thinking I made the statements a bit more bold than my actual position (I did include a couple of qualifiers). Now that there have been a couple of days to think about it, your post adds some good depth to the discussion.

I think the most important point (which I think we agree on) is not to just add something to a graph because you can (or someone else did), but to think through whether it is beneficial or not (which will depend on the graph, data, questions, etc.).

There are ways to combine graphs and tables; sparklines are an upcoming way of including the power of graphs in a table. Another approach for the bar graph example would be to first replace the bar graph with a dotplot, then put the numbers into the margin so that they are properly lined up and not distracting from the points.

I still think that anytime anyone is tempted to add data values to a graph they should ask themselves if that is an admission that the graph is not appropriate and would be better replaced by either a table (if the goal really is to look up specific values) or a better graph. Sometimes the answer will be yes: the question of interest, or the obvious follow-up question, will be answered by adding some additional information. Then the next question should be: which information to include? And where to put it? Can you imagine what Minard's graph would have looked like if he had included the numbers every time the total changed by 100, and put the temperatures as numbers instead of a line graph in the main plot at every 1 degree change?

Thanks for adding depth to the discussion,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
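
Greg's dotplot suggestion (points in the plot, numbers lined up in the margin) might look roughly like the following in base R. This is a minimal sketch under assumptions, not Greg's own code: the data vector `counts` is invented, the margin width of 5 lines is a guess that suits short numbers, and the null PDF device is there only so the code runs without a display.

```r
# Hypothetical data, made up purely for illustration
counts <- c(Alpha = 23, Beta = 17, Gamma = 31, Delta = 12)

# Sort so the dotplot reads from smallest (bottom) to largest (top)
counts <- sort(counts)

pdf(NULL)                        # null device: no window or file needed
op <- par(mar = c(4, 4, 2, 5))   # widen the right margin for the numbers

dotchart(counts, xlim = c(0, max(counts)), pch = 19, xlab = "Count")

# Without groups, dotchart() draws the i-th value at y = i, so the numbers
# can be written in the right margin, one per row, all in one column
mtext(format(counts), side = 4, at = seq_along(counts), las = 1, line = 1)

par(op)
invisible(dev.off())
```

Because the numbers sit in a single column outside the plotting region, they work as a small table aligned with the points rather than as marks competing with them.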
Quoting Greg Snow <Greg.Snow at intermountainmail.org>:
> My original intent was to get the original posters out of the mode of
> thinking they want to match what the spreadsheet does and into thinking
> about what message they are trying to get across. To get them (and
> possibly others) thinking I made the statements a bit more bold than my
> actual position (I did include a couple of qualifiers).

As an original poster (and a brand new user of R), I would like to comment on the educational experience I have just received. ;) The discussion was interesting and enlightening, and gives some good ideas about the ways (tables, graphs, graphs with numbers etc.) to get the data across to one's audience.

I see some of you guys do feel quite strongly about it, which is fine by me. I do not. I usually care for barplot aesthetics and informativeness more than for visual simplicity. That may change in time :)

I see R's graphical capabilities are huge but hard to access at times - that is when a spreadsheet seems preferable. For example, as a user of Linux I still cannot figure out why the fonts (and graphics in general) look much uglier in R on Linux than they do in R on Windows - no smoothing, sub-pixel hinting, anything like that. That is what my next free-time homework on R will be about :)

Sincerely
Donatas Glodenis
PhD candidate
Department of Sociology of the Faculty of Philosophy
Vilnius University
Lithuania
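
On the font-smoothing question, one avenue worth trying is a cairo-based graphics device. The sketch below rests on assumptions: it relies on the `type = "cairo"` and `antialias` arguments of `png()`, which exist in R releases more recent than the one current when this thread was written, and it falls back to PDF output when the R build reports no cairo support. The plotted data are made up.

```r
# Assumption: the R build may or may not include cairo support, so check first
use_cairo <- unname(capabilities("cairo"))

out <- tempfile(fileext = if (use_cairo) ".png" else ".pdf")

if (use_cairo) {
  # cairo-based PNG device with anti-aliased text and lines
  png(out, width = 600, height = 400, type = "cairo", antialias = "subpixel")
} else {
  # fallback: vector PDF output, which leaves smoothing to the viewer
  pdf(out)
}

barplot(c(A = 12, B = 30, C = 21), ylab = "Count")  # made-up data
invisible(dev.off())
```

Whether this cures the on-screen appearance depends on the platform and R version; it only shows how to request anti-aliased file output where the device supports it.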