thr3ads.net - R devel - [Rd] split.data.frame [Dec 2009]

Romain Francois

2009-Dec-15 09:06 UTC

[Rd] split.data.frame

Hello,

I very much enjoy "with" and "subset" semantics for data frames and was
wondering if we could have something similar with split, basically by
evaluating the second argument "with" the data frame:

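split.data.frame <- function(x, f, drop = FALSE, ...){
         # capture the call so that x and f are available unevaluated
         call <- match.call( )
         # build with(data = <x>, expr = <f>) and evaluate it in the caller
         fcall <- call( "with", data = call[["x"]], expr = call[["f"]] )
         ff <- eval( fcall, parent.frame(1) )

         # split the row indices by ff, then subset x for each group
         lapply(split(seq_len(nrow(x)), ff, drop = drop, ...),
                function(ind) x[ind, , drop = FALSE])
}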


> split( df, y )
$`1`
  x y
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1

$`2`
    x y
6   6 2
7   7 2
8   8 2
9   9 2
10 10 2

> split( df, x > 3 )
$`FALSE`
  x y
1 1 1
2 2 1
3 3 1

$`TRUE`
    x y
4   4 1
5   5 1
6   6 2
7   7 2
8   8 2
9   9 2
10 10 2
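
To make the mechanism a bit more visible, here is a minimal sketch of the call that gets constructed from match.call(). The helper name build_with_call is made up for this illustration and is not part of the proposal; it only returns the call instead of evaluating it.

```r
df <- data.frame(x = 1:10, y = rep(1:2, each = 5))

# Hypothetical helper (illustration only): build the with() call
# from the unevaluated arguments, without evaluating it.
build_with_call <- function(x, f) {
    mc <- match.call()
    call("with", data = mc[["x"]], expr = mc[["f"]])
}

fcall <- build_with_call(df, x > 3)
print(fcall)          # with(data = df, expr = x > 3)

# Evaluating the call looks up x inside df
ff <- eval(fcall)
```

Evaluating fcall in the caller's frame is what lets the bare column name x resolve against the data frame.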


Romain

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/HlX9 : new package : bibtex
|- http://tr.im/Gq7i : ohloh
`- http://tr.im/FtUu : new package : highlight

Felix Andrews

2009-Dec-15 22:35 UTC

[Rd] split.data.frame

I agree, I would definitely appreciate that.

A simpler implementation:

split.data.frame <- function(x, f, drop = FALSE, ...)
{
    ff <- eval(substitute(f), x, parent.frame())
    lapply(split(seq_len(nrow(x)), ff, drop = drop, ...),
           function(ind) x[ind, , drop = FALSE])
}


df <- data.frame(x = 1:10, y = rep(1:2, each = 5))
split( df, df$y )
split( df, y )
split( df, x > 3 )
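
As a rough sketch of the lookup order that eval(substitute(f), x, parent.frame()) gives: names are searched in the columns of x first, and anything not found there falls back to the caller's environment. The names show_f and threshold below are invented for this example only.

```r
df <- data.frame(x = 1:10, y = rep(1:2, each = 5))

# Hypothetical helper (illustration only): return both the captured
# expression and the value it evaluates to.
show_f <- function(x, f) {
    expr <- substitute(f)                    # what the caller typed, unevaluated
    val  <- eval(expr, x, parent.frame())    # columns of x first, then the caller
    list(expr = expr, value = val)
}

threshold <- 3                       # lives in the caller, not in df
res <- show_f(df, x > threshold)     # x comes from df, threshold from the caller
res$expr                             # x > threshold
res$value
```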



-- 
Felix Andrews
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: + 61 2 6125 4670
E: felix.andrews at anu.edu.au
CRICOS Provider No. 00120C
-- 
http://www.neurofractal.org/felix/

Peter Dalgaard

2009-Dec-15 23:14 UTC

[Rd] split.data.frame

Romain Francois wrote:
> Hello,
>
> I very much enjoy "with" and "subset" semantics for data frames and was
> wondering if we could have something similar with split, basically by
> evaluating the second argument "with" the data frame :

I seem to recall that this idea was considered and rejected when the 
current split.data.frame was written (10 years ago!). The main reasons 
were that

- it's not really THAT hard to evaluate a single splitting expression 
using with() or eval()

- not all applications will have the splitting factor inside the df to 
split ( split(df[-1], df[[1]]) for a simple case)

- if you need a computed splitting factor, there's a risk of inadvertent 
variable capture. I.e., if inside a function you do

   ....
   grp <- ...whatever...
   spl <- split(x, grp)
   ....

and x has a variable called grp, what do you get?
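
The three points above can be sketched roughly as follows. This is only an illustration under assumptions: split_nse is a made-up name wrapping the simpler implementation posted earlier in the thread (so that the real split.data.frame is not masked), and the data frame and f_outer are invented for the example.

```r
df <- data.frame(grp = rep(c("a", "b"), each = 3), val = 1:6)

# Point 1: a single splitting expression is easy with with()
by_with <- with(df, split(df, grp))

# Point 2: the splitting factor need not be part of what is split
by_first <- split(df[-1], df[[1]])

# Point 3: variable capture. split_nse (hypothetical name) evaluates
# f inside the data frame first, as in the proposal.
split_nse <- function(x, f, drop = FALSE, ...) {
    ff <- eval(substitute(f), x, parent.frame())
    lapply(split(seq_len(nrow(x)), ff, drop = drop, ...),
           function(ind) x[ind, , drop = FALSE])
}

f_outer <- function(x) {
    # a computed splitting factor that happens to share a column name
    grp <- rep(c("odd", "even"), length.out = nrow(x))
    list(standard = names(split(x, grp)),       # uses the local grp
         nse      = names(split_nse(x, grp)))   # silently uses x$grp
}

res <- f_outer(df)
res$standard   # "even" "odd"
res$nse        # "a" "b"
```

With the current semantics the local grp is used; with the proposed semantics the column of the same name wins, without any warning.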





-- 
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
