Hi,

Is there a way to do efficient exploratory analysis in R with pairs() on a data frame with 100+ columns and a large number of rows? I find that using pairs() on the whole data frame is slow, and the panels in the resulting plot matrix are tiny.

Also, the variable of interest for me is a binary variable (Y or N).

Is there an efficient way to view many variable relationships graphically without everything looking teeny? I could run pairs() on 10 variables at a time, but this seems too brute force.

Thanks,
Dhruv
One idea: if the primary variable of interest is categorical (binary), I would rather look at univariate plots for each of your 100 variables, grouped by the primary one, e.g.

    library(latticeExtra)
    marginal.plot(~ myBigData, data = myBigData, groups = myBinaryVar,
                  auto.key = TRUE, layout = c(4, 4))

(This is a convenient interface to lattice::densityplot and lattice::dotplot.)

If you view 16 such density plots per page, that still gives you 7 pages. You could use playwith() (from the playwith package) to scroll through the pages.

-Felix

--
Felix Andrews
http://www.neurofractal.org/felix/
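A rough sketch of the same idea using only lattice, for readers who do not have latticeExtra installed. This is not Felix's code: the data are simulated, and the names `outcome`, `long`, and `marginals.pdf` are hypothetical placeholders. It reshapes the data to long form, draws one density panel per variable with the two outcome groups overlaid, and writes 16 panels per page to a PDF so the pages can be flipped through in any viewer.

```r
library(lattice)

set.seed(1)
# Hypothetical stand-in data: 20 numeric columns plus a binary Y/N outcome.
n <- 500
myBigData <- as.data.frame(matrix(rnorm(n * 20), nrow = n))
myBigData$outcome <- factor(sample(c("N", "Y"), n, replace = TRUE))

# Reshape to long form: one row per (observation, variable) pair.
num_cols <- setdiff(names(myBigData), "outcome")
long <- data.frame(
  value    = unlist(myBigData[num_cols], use.names = FALSE),
  variable = factor(rep(num_cols, each = n), levels = num_cols),
  outcome  = rep(myBigData$outcome, times = length(num_cols))
)

# One density panel per variable, groups overlaid, 16 panels per page.
p <- densityplot(~ value | variable, data = long, groups = outcome,
                 plot.points = FALSE, auto.key = TRUE,
                 scales = list(relation = "free"),
                 layout = c(4, 4), as.table = TRUE)

# Printing a multi-page trellis object to a PDF device writes every page.
out_file <- file.path(tempdir(), "marginals.pdf")
pdf(out_file)
print(p)
dev.off()
```

With real data, replace the simulated `myBigData` and keep only the numeric columns in `num_cols`; factor columns would need a dot plot or bar chart instead of a density.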
If you want to do efficient exploratory data analysis on this kind of dataset, then interactive graphics with parallel coordinate plots (ipcp in iplots) should help. Of course, it depends on what you mean by large. It might be worth looking at the book "Graphics of Large Datasets" for some ideas.

Antony Unwin
Professor of Computer-Oriented Statistics and Data Analysis,
Mathematics Institute, University of Augsburg,
86135 Augsburg, Germany
Tel: + 49 821 5982218

> From: "Sharma, Dhruv" <Dhruv.Sharma@PenFed.org>
> Date: 19 October 2008 10:58:53 pm GMT+02:00
> To: <r-help@r-project.org>
> Subject: [R] pairs plots in R
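For a quick look at what a parallel coordinate plot shows before setting up iplots, here is a static sketch using `parcoord()` from MASS. It is a swapped-in, non-interactive substitute for `ipcp`, not the interactive tool recommended above. The data, the sample size of 300, the colours, and the output file name are all assumptions made for illustration.

```r
library(MASS)

set.seed(1)
# Hypothetical stand-in data: 12 numeric columns and a binary Y/N outcome.
n <- 2000
dat <- as.data.frame(matrix(rnorm(n * 12), nrow = n))
outcome <- factor(sample(c("N", "Y"), n, replace = TRUE))

# Plot a random subset of rows to limit overplotting on a static display.
idx <- sample(n, 300)

# One colour per outcome level, looked up for each sampled row.
cols <- c(N = "grey60", Y = "firebrick")[as.character(outcome[idx])]

# Each row becomes one line crossing the 12 (rescaled) variable axes.
out_file <- file.path(tempdir(), "parcoord.pdf")
pdf(out_file, width = 10, height = 5)
parcoord(dat[idx, ], col = cols, var.label = FALSE)
dev.off()
```

A static plot like this loses the main benefit of the interactive version, which is selecting and highlighting cases on the fly, so it is best treated as a preview.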
Thanks Felix.

Regards,
Dhruv

-----Original Message-----
From: foolish.android at gmail.com [mailto:foolish.android at gmail.com] On Behalf Of Felix Andrews
Sent: Sunday, October 19, 2008 11:37 PM
To: Sharma, Dhruv
Cc: r-help at r-project.org
Subject: Re: [R] pairs plots in R