thr3ads.net - R devel - [Rd] Improved Data Aggregation and Summary Statistics in R [Feb 2019]

Sebastian Martin Krantz

2019-Feb-26 13:25 UTC

[Rd] Improved Data Aggregation and Summary Statistics in R

Dear Developers,

Having spent time developing and thinking about how data aggregation and
summary statistics can be enhanced in R, I would like to present my
ideas/efforts in the form of two commands:

The first, which for now I have called 'collap', is an upgrade of aggregate
that accommodates and extends its functionality in various respects. Most
importantly, it works with multilevel and multi-type data, supports multiple
function calls and highly customized aggregation tasks, allows much greater
flexibility in how inputs are passed, and returns tidy output.
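
To make the kind of task concrete, here is a minimal sketch using only base R's aggregate (it does not show collap's own syntax, which is described in the vignette). The data frame and its column names are made up for illustration. Several statistics per column can be requested in one call, but the result then has to be flattened by hand:

```r
# Toy data: two groups, two numeric columns (values are made up).
df <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 3, 5, 7),
  y = c(2, 4, 6, 10)
)

# Base R: one call, several statistics per column.
# Each aggregated column comes back as a matrix inside the data frame ...
agg <- aggregate(cbind(x, y) ~ g, data = df,
                 FUN = function(v) c(mean = mean(v), sd = sd(v)))

# ... so it has to be flattened to obtain ordinary columns
# (g, x.mean, x.sd, y.mean, y.sd).
flat <- do.call(data.frame, agg)
print(flat)
```

The extra flattening step is one example of the manual work that a tidier aggregation interface would presumably remove.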

The second function, 'qsu', is an advanced and flexible summary command for
cross-sectional and multilevel (panel) data: it can provide overall,
between-entity and within-entity statistics, and allows for grouping, custom
functions and transformations. It also provides a quick way to compute
and output within-transformed data.
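
As a rough illustration of what overall, between and within statistics mean, the following base R sketch decomposes a toy panel variable. It is not qsu's implementation; the data are made up, and re-centring the within-transformed values on the overall mean is one common convention that is assumed here.

```r
# Toy panel: 2 entities observed over 3 periods (values are made up).
panel <- data.frame(
  id = rep(c("A", "B"), each = 3),
  x  = c(1, 2, 3, 10, 20, 30)
)

# Between component: each entity's mean, repeated for its rows.
panel$x_between <- ave(panel$x, panel$id, FUN = mean)

# Within transformation: deviation from the entity mean, re-centred on
# the overall mean so it stays on the original scale (assumed convention).
panel$x_within <- panel$x - panel$x_between + mean(panel$x)

# Standard deviation of each component (plain sd(), n - 1 denominator).
sds <- c(
  overall = sd(panel$x),
  between = sd(tapply(panel$x, panel$id, mean)),
  within  = sd(panel$x_within)
)
print(sds)
```

Here most of the variation is between entities, which is exactly the kind of pattern an overall summary alone would hide.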

Both commands are efficiently built from core R, but provide for optional
integration with data.table, which renders them extremely fast on large
datasets. An explanation of the syntax, a demonstration and benchmark
results are provided in the attached vignette.

Since both commands accommodate existing functionality while adding
significant basic functionality, I thought that their addition to the stats
package would be a worthwhile consideration. I would be happy to receive
your feedback.

Best regards,

Sebastian Krantz

-------------- next part --------------
A non-text attachment was scrubbed...
Name: collap & qsu vignette.pdf
Type: application/pdf
Size: 569278 bytes
Desc: not available
URL:
<https://stat.ethz.ch/pipermail/r-devel/attachments/20190226/eb4dd92d/attachment.pdf>

Iñaki Ucar

2019-Feb-27 09:23 UTC

Looks interesting. Sorry if it's there and I didn't find it: is there
any package implementing these functions so that we can try them?

Iñaki

Joris Meys

2019-Feb-27 10:17 UTC


Dear Sebastian,

Initially I was a bit hesitant to think about yet another way to summarize
data, but your illustrations convinced me this is actually a great addition
to the toolset currently available in different R packages. Many of us have
written custom functions to get the required tables for specific data sets,
but this would reduce that effort to simply using the right collap() call.

Like Inaki, I'm very interested in trying it out if you have the code
available somewhere.

Cheers
Joris





-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)

Sebastian Martin Krantz

2019-Feb-27 10:44 UTC


Dear Iñaki and Joris,

Thank you for the positive feedback! I had attached a code file to the
post, but apparently it was removed.
I will attach it again to this e-mail; alternatively, both the vignette and
the code can be downloaded from the following link:
https://www.dropbox.com/sh/s0k1tiz7el55g1q/AACpri-nruXjcMwUnNcHoycKa?dl=0

Best,
Sebastian


Duncan Murdoch

2019-Feb-27 11:48 UTC

Generally the R Core group is reluctant to incorporate new functions 
into the base packages.  Each function that is added adds to their work, 
and they already have too much to do.  (I am no longer a member of R 
Core, but I don't think things have changed since I retired.)

It is much easier for them if volunteers publish functions themselves, 
via contributed packages.

Nowadays GitHub provides a very convenient platform on which you can 
develop a package containing your functions.  If other users find bugs 
or have suggested improvements, it's very easy for them to send those to 
you, and you can make the fixes available immediately.  Once you are 
satisfied that it is stable, you can submit it to CRAN, and anyone using 
R can easily install it.

If you find the prospect of writing a package daunting, you shouldn't. 
It's actually quite easy, especially if you are using RStudio or ESS (or 
some other helpful front-end.)  Hadley Wickham's book
<http://r-pkgs.had.co.nz/> is a pretty accessible description of a 
development strategy.  (It's not the only strategy, but lots of people 
use it.)
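
For a sense of how little is needed to get started, here is a minimal sketch that uses only base R's utils::package.skeleton(), which is a different route from the devtools-based strategy in the book. The function my_summary and the package name mysummary are hypothetical placeholders, and the skeleton is written to a temporary directory.

```r
# Hypothetical function standing in for the code to be packaged.
my_summary <- function(x) c(mean = mean(x), sd = sd(x))

# Create the skeleton in a temporary directory so nothing is written
# to the working directory.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

utils::package.skeleton(
  name        = "mysummary",    # hypothetical package name
  list        = "my_summary",   # objects to include
  environment = environment(),  # where those objects live
  path        = pkg_parent
)

# Inspect what was generated (DESCRIPTION, NAMESPACE, R/, man/, ...).
list.files(file.path(pkg_parent, "mysummary"))
```

The generated DESCRIPTION and help-page templates still need to be filled in before the package passes R CMD check.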

Duncan Murdoch

Sebastian Martin Krantz

2019-Feb-28 09:06 UTC


Thanks to all who have given feedback so far. There is now a version of the
package on GitHub, which can be installed with

remotes::install_github("SebKrantz/collapse")

Further feedback is still very welcome!


