I get tables with millions of rows. For plotting to a screen-size jpg, about 1000 points are clearly enough. Instead of feeding plot() the original millions of rows, I'd rather shrink the original data frame using the following kind of aggregation:

-- split the data frame into chunks of N rows each, e.g. 1000 rows each
-- compute the average of each column within a chunk
-- emit one row of those averages into the shrunk result

Is there an existing package to do that in R? Otherwise, which R idioms are most effective for achieving it?

Cheers,
Alexy
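The chunk-and-average scheme described above can be sketched in a few lines of base R; `shrink` here is a hypothetical helper name, not a function from any package:

```r
# Split a data frame into chunks of n consecutive rows and
# replace each chunk with the column means of its rows.
shrink <- function(df, n = 1000) {
  # integer chunk index: rows 1..n get 0, rows n+1..2n get 1, ...
  grp <- (seq_len(nrow(df)) - 1) %/% n
  # aggregate() computes mean() per column within each chunk;
  # drop the first (grouping) column from the result
  aggregate(df, by = list(chunk = grp), FUN = mean)[, -1, drop = FALSE]
}

d <- data.frame(x = 1:10, y = (1:10)^2)
shrink(d, n = 5)
#   x  y
# 1 3 11
# 2 8 66
```

For millions of rows this is still a single vectorized pass, so it should be fast enough to run once before plotting.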
Alexy Khrabrov wrote:
> [original question quoted in full; snipped]

Hi,

if you want to extract relevant information from such a table, splitting the rows into arbitrary chunks may not solve your problem. Ordinations in reduced space are designed for that kind of task, but hierarchical clustering may also help. See Legendre & Legendre (1998, Numerical Ecology, Elsevier) for examples of such methods in ecology, and the R packages ade4 and vegan, as well as the hclust function.

Regards,

Thibaut.

--
######################################
Thibaut JOMBART
CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
Universite Lyon 1
43 bd du 11 novembre 1918
69622 Villeurbanne Cedex
Tél. : 04.72.43.29.35
Fax : 04.72.43.13.88
jombart at biomserv.univ-lyon1.fr
http://lbbe.univ-lyon1.fr/-Jombart-Thibaut-.html?lang=en
http://pbil.univ-lyon1.fr/software/adegenet/
For me the largest challenge with such data sets is the extra time it takes to develop the appropriate graph, given the time it takes to plot each prototype. Once I have the graph, scales, decorations etc. correct, the time for the final plot is almost irrelevant. For this reason I often take a random subset of the data rows using sample() and use this reduced set to develop the graph before switching to the full dataset.

Since your data are monotonically decreasing, however, I suggest that you take every 100th row instead; this should produce a graph indistinguishable from the original at that resolution.

-Alex Brown

On 21 Nov 2007, at 10:24, Thibaut Jombart <jombart at biomserv.univ-lyon1.fr> wrote:
> [quoted messages snipped]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
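Both thinning strategies suggested above are one-liners in base R; the data frame `d` here is invented for illustration:

```r
# A large example data frame standing in for the poster's millions of rows
d <- data.frame(x = seq_len(1e6), y = rnorm(1e6))

# 1) Random subset of 1000 rows for prototyping the graph.
#    sort() keeps the rows in their original order, which matters
#    when plotting with lines rather than points.
idx <- sort(sample(nrow(d), 1000))
d_sample <- d[idx, ]

# 2) Regular decimation: every 100th row, as suggested for
#    monotone data.
d_thin <- d[seq(1, nrow(d), by = 100), ]

nrow(d_sample)  # 1000
nrow(d_thin)    # 10000
```

For reproducible prototypes it may be worth calling set.seed() before sample().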
Try 'hexbin' for plotting this many points.

On Nov 21, 2007 5:05 AM, Alexy Khrabrov <alexy.khrabrov at gmail.com> wrote:
> [original question quoted in full; snipped]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
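A minimal sketch of the hexbin suggestion, assuming the hexbin package from CRAN is installed; instead of drawing millions of overplotted points, it bins them into hexagonal cells and plots counts per cell:

```r
library(hexbin)

# A million synthetic points standing in for the poster's data
x <- rnorm(1e6)
y <- rnorm(1e6)

# Bin into roughly 50 hexagons across the x range; the plot then
# shows cell counts (density), not individual points.
bin <- hexbin(x, y, xbins = 50)
plot(bin)
```

Every input point lands in exactly one hexagon, so the binned object preserves the full count of the data while the plot stays cheap to draw.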