Antony Unwin
2011-Feb-11 13:55 UTC
[R] Re. When is *interactive* data visualization useful to use?
Hello Tal,

You asked *When is it helpful to use interactive plots? Either for data exploration (for ourselves) and data presentation (for a "client")?*

My answer: It's helpful for checking data quality, for exploration with and without "clients", for checking results, and for data presenting.

Notes:
(1) It's difficult to explain interactive data visualization in print, demonstrations are so much more effective.
(2) Interactive data visualization is fun, both for the analyst, and more important, for the dataset owners. You not only get better interaction with the data, you get better interaction with the scientists you cooperate with. They are prepared to contribute, because they can understand what is going on. That is not always the case with statistical models.
(3) The key is not "animation" but "direct manipulation". The aim is to be able to directly interact with all statistical objects in a graphic: querying, linking, reordering, reformatting, zooming, whatever.
(4) You write of point-based graphics, what about area-based graphics like histograms, barcharts and mosaicplots? For categorical data the ability to select groups and look at spineplots of other variables to compare proportions is very effective. (And don't forget linking to maps for spatial data.)
(5) You mention outliers. How do you decide what is an outlier? Interactive parallel coordinate plots are extremely useful, either for identifying outliers or for checking ones found with an analytic approach.
(6) Interactive data visualization is not in competition with other approaches, it complements them. Results found with models should be checked graphically and results found graphically should be checked analytically. Your comment about data dredging is important, though why people think this only happens with graphics and not with modelling approaches always puzzles me!
(7) There are often interesting features of a dataset (not just errors and outlier groups) that can be found graphically that would be difficult or impossible to find analytically.

Have a look at Interactive Graphics for Data Analysis: Principles and Examples by Martin Theus and Simon Urbanek (Chapman & Hall). There are some excellent explanations and case studies there.

I could go on (and on), but what you really need is a good demo.

Best regards

Antony

PS Have you reported the bugs in GGobi and Mondrian you have found to the software authors?

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute,
University of Augsburg,
86135 Augsburg, Germany
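
Editorial illustration of note (4): selecting a group in one plot and reading off the highlighted share in a linked spineplot amounts to computing, for each category of a second variable, the proportion of cases that fall in the selection. The sketch below shows only that computation, not any interactive tool. The function name `highlighted_proportions`, the field names, and the records are all made up for illustration and do not come from the thread.

```python
from collections import Counter


def highlighted_proportions(rows, select, by):
    """For each category of `by`, return the share of rows matching `select`.

    This is the number a linked spineplot displays as the highlighted
    fraction of each bar once a selection has been made elsewhere.
    """
    totals = Counter()
    hits = Counter()
    for row in rows:
        key = row[by]
        totals[key] += 1
        if select(row):
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical records, invented purely for this example.
rows = [
    {"group": "A", "outcome": "yes"},
    {"group": "A", "outcome": "no"},
    {"group": "A", "outcome": "yes"},
    {"group": "A", "outcome": "yes"},
    {"group": "B", "outcome": "no"},
    {"group": "B", "outcome": "yes"},
    {"group": "B", "outcome": "no"},
    {"group": "B", "outcome": "no"},
]

# "Select" every case with outcome == "yes", then compare shares by group.
shares = highlighted_proportions(rows, lambda r: r["outcome"] == "yes", by="group")
for key in sorted(shares):
    print(f"{key}: {shares[key]:.2f} highlighted")
```

In an interactive tool the selection would be made with the mouse and the proportions redrawn immediately; the arithmetic behind the display is the same.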
Tal Galili
2011-Feb-12 09:48 UTC
[R] Re. When is *interactive* data visualization useful to use?
Hello Antony,

Thank you very much for your detailed answer! All of your points are valid and interesting to reflect upon.

Regarding your note number (6), it's a very good point - I didn't think of it. It might be argued that because interactive data visualization can be faster than analytical programming, it might invite more "visual hypothesis testing" without "counting your hypotheses", whereas with printed output it is much clearer how many hypothesis tests you actually ran.

Regarding your note number (7) - I'd be happy to see (non-spatial) examples of interesting patterns found by interactive methods.

Regarding the bug reports: so far I've filed one for ggobi:
http://code.google.com/p/ggobi-documentation/issues/detail?id=37
Although, since the software was last built in 2008, I'm not sure anyone is even going to respond to the ticket...

Cheers,
Tal

Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
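
Editorial illustration of Tal's remark about counting hypotheses: one way to keep visual exploration honest is to record every comparison that was looked at and adjust the resulting p-values for that count. The sketch below uses the Holm step-down adjustment as an example. Holm is an assumption chosen for illustration, since neither poster names a method, and the p-values are invented. It is written in Python; in R, `p.adjust(p, method = "holm")` performs the same correction.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    The smallest p-value is multiplied by m, the next by m - 1, and so on,
    with a running maximum so the adjusted values stay monotone, and a cap
    at 1.0.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values, one per pattern examined during exploration.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = holm_adjust(raw)
for p, q in zip(raw, adjusted):
    print(f"raw {p:.3f} -> adjusted {q:.3f}")
```

The point of the sketch is the bookkeeping: the adjustment only works if the number of comparisons examined, including the ones that showed nothing, is actually written down.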