I get tables with millions of rows. For plotting to a screen-size jpg, about 1000 points are clearly enough. Instead of feeding plot() the original millions of rows, I'd rather shrink the original data frame using the following kind of aggregation:

-- split the data frame into chunks of N rows each, e.g. 1000 rows each
-- compute the average of each column within a chunk
-- emit one row of those averages into the shrunk result

Is there an existing package to do that in R? Otherwise, which R idioms are most effective for achieving it?

Cheers,
Alexy
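The chunk-and-average scheme described above can be sketched in a few lines of base R; `shrink` here is a hypothetical helper name, not a function from any package:

```r
# Split a data frame into chunks of n consecutive rows and
# replace each chunk with the column means of its rows.
shrink <- function(df, n = 1000) {
  # integer chunk index: rows 1..n get 0, rows n+1..2n get 1, ...
  grp <- (seq_len(nrow(df)) - 1) %/% n
  # aggregate() computes mean() per column within each chunk;
  # drop the first (grouping) column from the result
  aggregate(df, by = list(chunk = grp), FUN = mean)[, -1, drop = FALSE]
}

d <- data.frame(x = 1:10, y = (1:10)^2)
shrink(d, n = 5)
#   x  y
# 1 3 11
# 2 8 66
```

For millions of rows this is still a single vectorized pass, so it should be fast enough to run once before plotting.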
Alexy Khrabrov wrote:
> [original question quoted in full; snipped]

Hi,

if you want to extract relevant information from such a table, splitting the rows into arbitrary chunks may not solve your problem. Ordinations in reduced space are designed for that kind of task, but hierarchical clustering may also help. See Legendre & Legendre (1998, Numerical Ecology, Elsevier) for examples of such methods in ecology, and the R packages ade4 and vegan, as well as the hclust function.

Regards,

Thibaut.

--
######################################
Thibaut JOMBART
CNRS UMR 5558 - Laboratoire de Biométrie et Biologie Evolutive
Universite Lyon 1
43 bd du 11 novembre 1918
69622 Villeurbanne Cedex
Tél. : 04.72.43.29.35
Fax : 04.72.43.13.88
jombart at biomserv.univ-lyon1.fr
http://lbbe.univ-lyon1.fr/-Jombart-Thibaut-.html?lang=en
http://pbil.univ-lyon1.fr/software/adegenet/
For me the largest challenge with such data sets is the extra time it takes to develop the appropriate graph, given the time it takes to plot each prototype. Once I have the graph, scales, decorations etc. correct, the time for the final plot is almost irrelevant. For this reason I often take a random subset of the data rows using sample() and use this reduced set to develop the graph before switching to the full dataset.

Since your data are monotonically decreasing, however, I suggest that you take every 100th row instead; this should produce a graph indistinguishable from the original at that resolution.

-Alex Brown

On 21 Nov 2007, at 10:24, Thibaut Jombart <jombart at biomserv.univ-lyon1.fr> wrote:
> [quoted messages snipped]
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
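Both thinning strategies suggested above are one-liners in base R; the data frame `d` here is invented for illustration:

```r
# A large example data frame standing in for the poster's millions of rows
d <- data.frame(x = seq_len(1e6), y = rnorm(1e6))

# 1) Random subset of 1000 rows for prototyping the graph.
#    sort() keeps the rows in their original order, which matters
#    when plotting with lines rather than points.
idx <- sort(sample(nrow(d), 1000))
d_sample <- d[idx, ]

# 2) Regular decimation: every 100th row, as suggested for
#    monotone data.
d_thin <- d[seq(1, nrow(d), by = 100), ]

nrow(d_sample)  # 1000
nrow(d_thin)    # 10000
```

For reproducible prototypes it may be worth calling set.seed() before sample().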
Try 'hexbin' for plotting this many points.

On Nov 21, 2007 5:05 AM, Alexy Khrabrov <alexy.khrabrov at gmail.com> wrote:
> [original question quoted in full; snipped]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
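A minimal sketch of the hexbin suggestion, assuming the hexbin package from CRAN is installed; instead of drawing millions of overplotted points, it bins them into hexagonal cells and plots counts per cell:

```r
library(hexbin)

# A million synthetic points standing in for the poster's data
x <- rnorm(1e6)
y <- rnorm(1e6)

# Bin into roughly 50 hexagons across the x range; the plot then
# shows cell counts (density), not individual points.
bin <- hexbin(x, y, xbins = 50)
plot(bin)
```

Every input point lands in exactly one hexagon, so the binned object preserves the full count of the data while the plot stays cheap to draw.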