thr3ads.net - R help - [R] Best Programming Practices regarding data frames [Aug 2012]

Ramiro Barrantes

2012-Aug-01 21:19 UTC

[R] Best Programming Practices regarding data frames

Hello,

I come from other programming languages (C++, Mathematica, Perl) but
have been using R extensively for several months.  I see the data frame as a key
piece of the language and wanted to ask about people's experience with
its use.

Say you have a data frame D

D <- data.frame(some columns)

and you define a function that needs the information from this data frame and is
supposed to return a calculation based on some columns of such data frame D.

func <- function(d) {
  # EFFECT: does calculation X from some columns of d
}

QUESTION: Would you consider it better practice to return the same data.frame but
expanded, or would you return a small data frame that consists only of the newly
computed columns?

Some might say either way is fine and it comes down to personal preference.  But
after using and seeing others' code for some time, I am thinking that returning a
result that consists of ONLY the relevant columns is better practice: it defines
the function as returning only what it was intended to return, and leaves it up to
the user of the function to do whatever they intend with the result
(naming the new columns, adding them to a data frame, etc.).  This
might be a question for a computer programming theory group, but if anybody has
any insight from their experience, please share.
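
To make the two options concrete, here is a minimal sketch. The column names (dose, response), the function names add_ratio and ratio_only, and the calculation itself are hypothetical placeholders chosen for illustration; they do not come from any real analysis.

```r
# Hypothetical example data; column names are assumptions for illustration.
D <- data.frame(dose = c(1, 2, 4), response = c(10, 18, 30))

# Option A: return the input data frame expanded with the new column.
add_ratio <- function(d) {
  d$ratio <- d$response / d$dose
  d
}

# Option B: return only the newly computed column(s).
ratio_only <- function(d) {
  data.frame(ratio = d$response / d$dose)
}

A <- add_ratio(D)    # columns: dose, response, ratio
B <- ratio_only(D)   # column: ratio

# With option B the caller decides how to name and attach the result.
D2 <- cbind(D, resp_per_dose = B$ratio)
```

In both cases D itself is unchanged, because R modifies a copy inside the function; the difference is only in what the caller receives and how much of the naming is fixed by the function.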

Thanks in advance,

Ramiro


Bert Gunter

2012-Aug-01 21:31 UTC


I have no answer to your question, but note:

1. You do not need to return a data frame at all, of course. Most
functions do not -- lm(), for example.
2. See ?with and ?within for perhaps relevant functionality.
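
Both points can be illustrated with a short sketch in base R. It uses the built-in cars dataset (columns speed and dist); the variable names fit, ratio and cars2 are just illustrative choices.

```r
# lm() returns a model object, not a data frame.
fit <- lm(dist ~ speed, data = cars)
class(fit)            # "lm"
is.data.frame(fit)    # FALSE

# with(): evaluate an expression using the columns as variables.
# The result is whatever the expression returns (here, a numeric vector).
ratio <- with(cars, dist / speed)

# within(): evaluate assignments and return a modified copy of the data frame.
cars2 <- within(cars, ratio <- dist / speed)
```

Roughly speaking, with() corresponds to the "return only the result" style and within() to the "return the expanded data frame" style.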

-- Bert


-- 

Bert Gunter
Genentech Nonclinical Biostatistics


R. Michael Weylandt

2012-Aug-01 22:13 UTC


On Wed, Aug 1, 2012 at 4:19 PM, Ramiro Barrantes
<ramiro at precisionbioassay.com> wrote:
> QUESTION: Would you consider better practice to return the same data.frame
> but expanded, or would you return a small data frame that consists of the
> newly computed columns?

I'd say return what you need, no more, no less. If you want to
reattach it to the input data, do that at the caller level, but don't
make it required: orthogonality and minimality and all that jazz...

As Bert points out, note that returning a data.frame is by no means
necessary -- they aren't "primitive" data structures like (atomic)
vectors and lists [we are in a Scheme dialect after all!], but they
are helpful and well supported. Use them liberally but no more than
necessary ;-)
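
The remark that data frames are built on top of lists and vectors can be checked directly. This is a small sketch using only base R; the data frame D and its columns are made up for the example.

```r
D <- data.frame(x = 1:3, y = c("a", "b", "c"))

typeof(D)        # "list": the underlying storage is a list
is.list(D)       # TRUE
class(D)         # "data.frame"
unclass(D)       # a plain list, with names and row.names kept as attributes

# Each column is an ordinary atomic vector.
is.atomic(D$x)   # TRUE
length(D)        # 2, the number of columns (list elements)
```

So a function that returns a vector or a list is returning the same building blocks a data frame is made of, and the caller can always assemble them into a data frame later.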

Best,
Michael

Jeff Newmiller

2012-Aug-01 22:24 UTC


I think the vector is far more fundamental than the data frame. Most of the time
I write functions that return vectors, even if my input is a data frame. If I
need a large number of input vectors, I set up the input arguments to include a
data frame and additional named parameters with defaults for the column names I
will use. In the function I refer to the input columns using list indexing by
name (D[[somecolname]]). In the rare event that I return a data frame, it has to
include the appropriate "key columns" from the original, and will
usually have the minimum number of input columns, and all columns have new,
fixed (renamed) column names. (This usually is associated with the instantiation
of a class I have defined.)
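
A rough sketch of this pattern follows. The function names (bmi, bmi_table), the parameter names and their default column names are all hypothetical, invented here to illustrate the idea; the class instantiation mentioned above is left out.

```r
# Vector-returning function. Parameter names and defaults are hypothetical.
bmi <- function(D, weight_col = "weight_kg", height_col = "height_m") {
  stopifnot(is.data.frame(D), all(c(weight_col, height_col) %in% names(D)))
  # List indexing by name, so the column names can be passed as strings.
  D[[weight_col]] / D[[height_col]]^2
}

# Rarer case: return a data frame holding a key column plus the result,
# under fixed output names regardless of the input column names.
bmi_table <- function(D, key_col = "id", ...) {
  data.frame(key = D[[key_col]], bmi = bmi(D, ...))
}

people <- data.frame(id = 1:3,
                     weight_kg = c(60, 80, 100),
                     height_m  = c(1.5, 2, 2))
v  <- bmi(people)        # plain numeric vector
tb <- bmi_table(people)  # data frame with columns key, bmi

# Same functions applied to a data frame with different column names.
other <- data.frame(subject = c("a", "b"), w = c(50, 72), h = c(1, 1.2))
tb2 <- bmi_table(other, key_col = "subject", weight_col = "w", height_col = "h")
```

Because the output names are fixed, code that consumes tb and tb2 does not need to know what the input columns were called.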

The use of vectors in function returns allows the caller to manage which columns
make sense for the analysis at hand by tacking on new ones to the input data
frame.
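
On the caller side this might look like the following sketch. The helper functions centered and scaled and the column names are hypothetical, used only to show the caller choosing what to attach.

```r
# Hypothetical vector-returning helpers.
centered <- function(D, col = "value") D[[col]] - mean(D[[col]])
scaled   <- function(D, col = "value") D[[col]] / max(D[[col]])

D <- data.frame(id = 1:4, value = c(2, 4, 6, 8))

# The caller picks which results to keep and what to call them.
D$value_centered <- centered(D)

# Or build a separate analysis frame with an extra column, leaving D as it is.
D_plot <- transform(D, rel = scaled(D))
```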
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
