thr3ads.net - R devel - [Rd] sample on data.frame [Feb 2010]

Stavros Macrakis

2010-Feb-19 21:05 UTC

[Rd] sample on data.frame

Currently, calling sample on a data.frame returns a sample of its columns:

e.g. sample(data.frame(a=1,b=2:3,c=4),2) might return data.frame(b=2:3,c=c(4,4))
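
The column behavior follows from a data.frame being a list of columns: length() counts columns, and single-bracket indexing with one index selects columns. A minimal sketch in base R (the variable names df and cols are only illustrative):

```r
df <- data.frame(a = 1, b = 2:3, c = 4)

# A data.frame is a list of columns, so its length is the column count.
n_cols <- length(df)

# sample() picks elements of that list, i.e. columns, not rows.
set.seed(1)
cols <- sample(df, 2)

# Two of the three columns come back; both rows are kept.
print(names(cols))
print(dim(cols))
```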

I'd have thought it would be much more common to want a sample of the rows.

It's easy enough to define an appropriate function for this:

sample.data.frame <- function(x, size, replace = FALSE, prob = NULL)
  # no auto-dispatch; sample is not a generic function
  {
    # drop = FALSE keeps a one-column data.frame from collapsing to a vector
    x[sample(nrow(x), size, replace, prob), , drop = FALSE]
  }
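
As a rough illustration of how such a function would be used: because sample is not generic, the function has to be called by its full name, and sample(df, ...) itself still samples columns. The sketch below repeats the definition so it runs on its own (with drop = FALSE added so one-column inputs stay data frames); the example data frame and the names rows and boot are made up for illustration.

```r
# Copy of the row-sampling function, repeated so the snippet is self-contained.
sample.data.frame <- function(x, size, replace = FALSE, prob = NULL) {
  x[sample(nrow(x), size, replace, prob), , drop = FALSE]
}

# Illustrative data, not from the original post.
df <- data.frame(id = 1:10, value = letters[1:10])

set.seed(42)

# Three distinct rows, all columns kept.
rows <- sample.data.frame(df, 3)

# A bootstrap-style resample: same number of rows, drawn with replacement.
boot <- sample.data.frame(df, nrow(df), replace = TRUE)

# Calling sample() directly is unchanged: it still returns columns.
cols <- sample(df, 1)
```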

Would it be a bad idea for this to be the standard behavior for sample?

There is always, of course, the backward-compatibility argument.  Is sample
in fact used in practice to select random columns?  I realize it is hard to
quantify that, but perhaps there is some wisdom in the community about that.

            -s


Sean O'Riordain

2010-Feb-20 08:44 UTC

[Rd] sample on data.frame

Good morning Stavros,

I currently use the following definition in my own environment.

sample.df <- function (df, n = 3) {
    df[sample(nrow(df), min(nrow(df), n)), ]
}

I also added in the possibility of returning n sequential rows which I used
when examining address files... but I haven't used it in ages :-)
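
The sequential-rows variant is not shown in this message. One way it might look is sketched below; the sequential argument and its behavior are assumptions for illustration, not the definition actually used.

```r
# Hypothetical extension of sample.df; the `sequential` argument is an
# assumption and does not come from the original message.
sample.df <- function(df, n = 3, sequential = FALSE) {
  # Never ask for more rows than exist.
  n <- min(nrow(df), n)
  if (sequential) {
    # Pick a random starting row such that n consecutive rows fit.
    start <- sample.int(nrow(df) - n + 1, 1)
    idx <- seq(start, length.out = n)
  } else {
    # Plain random sample of row indices, without replacement.
    idx <- sample.int(nrow(df), n)
  }
  df[idx, , drop = FALSE]
}

# Illustrative data.
addresses <- data.frame(line = 1:20)

set.seed(1)
block <- sample.df(addresses, 5, sequential = TRUE)  # 5 consecutive rows
few   <- sample.df(addresses[1:2, , drop = FALSE], 5) # capped at 2 rows
```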

Kind regards,
Sean O'Riordain
Dublin
Ireland

