thr3ads.net - R help - [R] Is this kind of removing of elements from data.frame (in)efficient? [Sep 2016]

mviljamaa

2016-Sep-13 07:37 UTC

[R] Is this kind of removing of elements from data.frame (in)efficient?

So I'm a beginner in R and I was testing the removal of elements from a 
data.frame.

The way I remove the element(s) with the minimum value in the kid_score
variable is to do:

kidmomhs <- data[kidmomhs$kid_score != min(kidmomhs$kid_score),]

So now kidmomhs is the same data, but without the row(s) with the 
minimum value of kid_score.

Judging by the syntax this looks as if R might be creating a copy of the 
data array, just without the rows that were removed.

The question however is, is this the most efficient way to remove 
elements from data structures in R? And is the above inefficient? Does 
the above create copies of almost the entire data structure?

In other programming languages I've become accustomed to removing
elements by setting them to NULL and then e.g. reordering the data
structure, rather than having to take copies of almost the entire data
structure.

Jeff Newmiller

2016-Sep-13 14:43 UTC

Your example is not reproducible [1], so the apparent error in it is
distracting... perhaps you meant

kidmomhs <- kidmomhs[kidmomhs$kid_score != min(kidmomhs$kid_score),]

Yes, this creates a copy, and because the object name is re-used on the left
side, the original memory is returned to the memory pool the next time garbage
collection occurs (provided nothing else still references the original object).
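
A reproducible version of the corrected line might look like the sketch below.
The data frame is a made-up stand-in, since the real kidmomhs was never posted;
the column values and the helper name n_before are assumptions for
illustration only.

```r
# Made-up stand-in for the poster's data; the real kidmomhs was not posted.
kidmomhs <- data.frame(
  kid_score = c(65, 20, 98, 20, 83),
  mom_hs    = c(1, 0, 1, 1, 0)
)
n_before <- nrow(kidmomhs)

# The right-hand side is evaluated first and yields a new data frame;
# the name kidmomhs is then rebound to that result.
kidmomhs <- kidmomhs[kidmomhs$kid_score != min(kidmomhs$kid_score), ]

n_before        # 5
nrow(kidmomhs)  # 3: both rows tied at the minimum (20) are gone
```

Note that every row tied at the minimum is dropped, not just the first one.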

While this may seem inefficient, this (functional) programming model is much
less likely to lead to programming errors than in-place approaches. My advice is
to refrain from premature optimization and get the algorithm right first. Later
you could rewrite using something like the data.table package if the standard
functional model proves too slow for a particular application.
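
One way to check whether the standard approach is actually too slow is to time
it on data of a realistic size before rewriting anything. The following is only
a rough sketch: the size n, the seed, and the simulated columns are assumptions,
and the timings will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed so the made-up data is repeatable
n <- 1e5     # hypothetical size; adjust to match your real data
big <- data.frame(
  kid_score = sample(20:140, n, replace = TRUE),
  mom_hs    = sample(0:1, n, replace = TRUE)
)

# Time the copy-on-subset idiom; the numbers depend on the machine.
timing <- system.time(
  filtered <- big[big$kid_score != min(big$kid_score), ]
)
print(timing)
```

If the elapsed time is negligible next to the rest of the analysis, there is
little to gain from switching to an in-place approach.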

In addition, I tend to find that not re-using the object name (and so not
releasing the memory) aids debugging and traceability, which is often an
advantage if you are aiming for reproducible research.

[1] see e.g. http://adv-r.had.co.nz/Reproducibility.html

