mviljamaa
2016-Sep-13 07:37 UTC
[R] Is this kind of removing of elements from data.frame (in)efficient?
So I'm a beginner in R and I was testing the removal of elements from a data.frame.

The way I remove the element(s) with the minimum value in the kid_score variable is to do:

kidmomhs <- data[kidmomhs$kid_score != min(kidmomhs$kid_score),]

So now kidmomhs is the same data, but without the row(s) with the minimum value of kid_score.

Judging by the syntax, this looks as if R might be creating a copy of the data array, just without the rows that were removed.

The question, however, is: is this the most efficient way to remove elements from data structures in R? Is the above inefficient? Does it create a copy of almost the entire data structure?

In other programming languages I've become accustomed to removing elements by changing them to NULL and then e.g. reordering the data structure, rather than having to take copies of almost the entire data structure.
Jeff Newmiller
2016-Sep-13 14:43 UTC
[R] Is this kind of removing of elements from data.frame (in)efficient?
Your example is not reproducible [1], so the apparent error in it is distracting. Perhaps you meant

kidmomhs <- kidmomhs[kidmomhs$kid_score != min(kidmomhs$kid_score),]

Yes, this creates a copy, and because the object name is re-used on the left side, the original memory is returned to the memory pool the next time garbage collection occurs.

While this may seem inefficient, this (functional) programming model is much less likely to lead to programming errors than in-place approaches. My advice is to refrain from premature optimization and get the algorithm right first; later you could rewrite using something like the data.table package if the standard functional model is too slow for a particular application.

In addition, I tend to find that not re-using the object name (and so not releasing the memory) aids debugging and traceability, which is often an advantage if you are aiming for reproducible research.

[1] See e.g. http://adv-r.had.co.nz/Reproducibility.html
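
A minimal reproducible sketch of the corrected line, assuming a small made-up data frame (the column mom_hs and all values below are hypothetical and not taken from the original poster's data). It binds the result to a new name, kidmomhs_trimmed, so the original stays available for inspection, in the spirit of the debugging advice above:

```r
# Hypothetical toy data: kid_score mirrors the thread, the values are invented.
kidmomhs <- data.frame(
  kid_score = c(65, 20, 85, 20, 98),
  mom_hs    = c(1, 0, 1, 1, 0)
)

# Logical index: TRUE for every row whose kid_score differs from the minimum.
keep <- kidmomhs$kid_score != min(kidmomhs$kid_score)

# Subsetting builds a new data frame; the original object is left untouched
# because the result is assigned to a different name.
kidmomhs_trimmed <- kidmomhs[keep, ]

nrow(kidmomhs)          # 5
nrow(kidmomhs_trimmed)  # 3 (both rows with the minimum score 20 are dropped)
```

Assigning back to kidmomhs instead would give the behaviour described above: the old object becomes unreachable and its memory is reclaimed at a later garbage collection.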