Dear R-devel list members,

Probably not an opportune time for this, given the imminent release of 2.0.0, but I was just reminded (while preparing a lecture) of a function that I find useful and that I thought might be a candidate for the utils package. The function, which I call some(), behaves like head() and tail(), from which it is adapted, except that it samples from an object (e.g., rows of a data frame).

I like to use some() to get a quick look at data. Perhaps there's some other already existing function that I'm not aware of that does the same thing.

Regards,
 John

-------------- snip ---------------------

# adapted from head() and tail()

some <- function(x, ...) UseMethod("some")

# vectors and 1-d arrays: sample up to n elements, keeping the original order
some.default <- function(x, n=10, ...){
    len <- length(x)
    ans <- x[sort(sample(len, min(n, len)))]
    if (length(dim(x)) == 1)
        # use length(ans) rather than n so that n > length(x) still
        # gives dimnames that match the array extent
        array(ans, length(ans), list(names(ans)))
    else ans
}

# matrices: sample up to n rows, keeping the original row order
some.matrix <- function(x, n=10, ...){
    nr <- nrow(x)
    x[sort(sample(nr, min(n, nr))), , drop = FALSE]
}

# data frames: sample up to n rows, keeping the original row order
some.data.frame <- function(x, n=10, ...){
    nr <- nrow(x)
    x[sort(sample(nr, min(n, nr))), , drop = FALSE]
}
John Fox <jfox <at> mcmaster.ca> writes:

:
: Dear R-devel list members,
:
: Probably not an opportune time for this, given the imminent release of
: 2.0.0, but I was just reminded (while preparing a lecture) of a function
: that I find useful and that I thought might be a candidate for the utils
: package: The function, which I call some(), behaves like head() and tail(),
: from which it is adapted, except that it samples from an object (e.g., rows
: of a data frame).
:
: I like to use some() to get a quick look at data. Perhaps there's some other
: already existing function that I'm not aware of that does the same thing.

It's cute, but you could do it on vectors and data frames with 2 function calls. First get some test data:

   data(iris)
   data(state)

   # Now we have:

   head(sample(iris)) # data frame
   head(sample(data.frame(state.x77))) # matrix
   head(sample(letters)) # vector

The only nuisance is that sample samples from the elements of matrices rather than from their rows, thereby necessitating the conversion in the middle call to head(sample(...)).

Perhaps an alternate suggestion would be to modify sample so it becomes an S3 generic with methods for matrices and data frames, such that sample.matrix samples from the rows of a matrix and sample.data.frame samples from the rows of a data.frame. Then (1) the above idiom becomes consistent across the above mentioned classes, (2) this would avoid burdening the base with an extra function, and (3) it would provide for the possibility of extending sample to other classes.
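The S3-generic suggestion above could look roughly like the following. This is only a minimal sketch: the name sample2 is hypothetical and is used here so that base::sample() is not masked, and the method bodies are one assumed way of doing row sampling, not code from any R release.

```r
# Hypothetical generic (the name sample2 is an assumption for illustration).
sample2 <- function(x, size, ...) UseMethod("sample2")

# Vectors: sample elements by position.
sample2.default <- function(x, size = length(x), ...) {
    x[sample.int(length(x), size, ...)]
}

# Matrices: sample whole rows rather than individual elements.
sample2.matrix <- function(x, size = nrow(x), ...) {
    x[sample.int(nrow(x), size, ...), , drop = FALSE]
}

# Data frames: sample whole rows rather than columns.
sample2.data.frame <- function(x, size = nrow(x), ...) {
    x[sample.int(nrow(x), size, ...), , drop = FALSE]
}

# The head(sample(...)) idiom is then the same for all three classes.
head(sample2(iris))
head(sample2(state.x77))
head(sample2(letters))
```

With methods like these, no conversion of the matrix to a data frame is needed, and further classes could be supported by adding more methods.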
> -----Original Message-----
> From: r-devel-bounces@stat.math.ethz.ch
> [mailto:r-devel-bounces@stat.math.ethz.ch] On Behalf Of Gabor
> Grothendieck
> Sent: Friday, September 17, 2004 12:52 PM
> To: r-devel@stat.math.ethz.ch
> Subject: Re: [Rd] Function some()
>
> John Fox <jfox <at> mcmaster.ca> writes:

. . .

>
> It's cute but you could do it on vectors and data frames with
> 2 function calls. First get some test data:
>
> data(iris)
> data(state)
>
> # Now we have:
>
> head(sample(iris)) # data frame
> head(sample(data.frame(state.x77))) # matrix
> head(sample(letters)) # vector
>

A possible disadvantage of this approach is that it permutes the entire, potentially large, object before picking the presumably small sample.

> The only nuisance is that sample samples from the elements of
> matrices rather than from their rows thereby necessitating
> the conversion in the middle call to head(sample(...)).
>
> Perhaps an alternate suggestion would be to modify sample so
> it becomes an S3 generic with methods for matrices and data
> frames such that sample.matrix samples from the rows of a
> matrix and sample.data.frame samples from the rows of a
> data.frame. Then (1) the above idiom becomes consistent
> across the above mentioned classes. (2) This would also
> avoid burdening the base with an extra function and would (3)
> provide for the possibility of extending sample to other classes.
>

This occurred to me, too [as did providing a random argument to head()], but seemed a more radical proposal than introducing a simple new generic.

Regards,
 John

> ______________________________________________
> R-devel@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
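To make the point about permuting the whole object concrete, here is a small sketch contrasting the two approaches on a plain vector. The vector x and its size are arbitrary assumptions chosen for illustration; no timings are claimed.

```r
# Toy data; the size is an arbitrary assumption for illustration.
x <- seq_len(1e6)
n <- 10

set.seed(42)

# Approach 1: permute all of x, then keep the first n elements.
via_permutation <- head(sample(x), n)

# Approach 2: draw only n positions, then index into x.
via_indices <- x[sample(length(x), n)]

# Both results contain n elements of x.
length(via_permutation)
length(via_indices)
```

The second form is what some() does internally: it asks sample() for only min(n, len) positions, so the work does not grow with a full permutation of the object.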
John Fox <jfox <at> mcmaster.ca> writes:

:
: > -----Original Message-----
: > From: r-devel-bounces <at> stat.math.ethz.ch
: > [mailto:r-devel-bounces <at> stat.math.ethz.ch] On Behalf Of Gabor
: > Grothendieck
: > Sent: Friday, September 17, 2004 12:52 PM
: > To: r-devel <at> stat.math.ethz.ch
: > Subject: Re: [Rd] Function some()
: >
: > John Fox <jfox <at> mcmaster.ca> writes:
:
: . . .
:
: >
: > It's cute but you could do it on vectors and data frames with
: > 2 function calls. First get some test data:
: >
: > data(iris)
: > data(state)
: >
: > # Now we have:
: >
: > head(sample(iris)) # data frame
: > head(sample(data.frame(state.x77))) # matrix
: > head(sample(letters)) # vector
: >
:
: A possible disadvantage of this approach is that it permutes the entire,
: potentially large, object before picking the presumably small sample.
:
: > The only nuisance is that sample samples from the elements of
: > matrices rather than from their rows thereby necessitating
: > the conversion in the middle call to head(sample(...)).
: >
: > Perhaps an alternate suggestion would be to modify sample so
: > it becomes an S3 generic with methods for matrices and data
: > frames such that sample.matrix samples from the rows of a
: > matrix and sample.data.frame samples from the rows of a
: > data.frame. Then (1) the above idiom becomes consistent
: > across the above mentioned classes. (2) This would also
: > avoid burdening the base with an extra function and would (3)
: > provide for the possibility of extending sample to other classes.
: >
:
: This occurred to me, too [as did providing a random argument to head()], but
: seemed a more radical proposal than introducing a simple new generic.

I think I was wrong regarding my examples. sample(iris) samples from the columns of iris, not the rows; thus my examples do not work except for the vector case.
One would have to do this:

   iris[sample(150,10),]

some() is nice since it reduces the mental load of iris[sample(150,10),], yet I wonder if it's worth the feature creep and, if we are to make a change, whether it would not be better to fix sample, as suggested, in which case head(sample(iris)) does work, as would:

   sample(iris, 10)

If backward compatibility were an issue, we could define a new name for the new generic so that sample remains unchanged. Optionally, sample could be made defunct over time.
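The column-versus-row behaviour described above can be checked directly with base R. The sketch below uses nrow(iris) instead of the hard-coded 150, which is a small assumed generalisation of the idiom so that it works for any data frame.

```r
set.seed(1)

# sample() treats a data frame as a list of columns, so this only
# reorders the columns; all rows are kept, in their original order.
shuffled_cols <- sample(iris)
dim(shuffled_cols)
setequal(names(shuffled_cols), names(iris))

# Row sampling has to go through an explicit row index.
rows <- iris[sample(nrow(iris), 10), ]
dim(rows)
```

Here dim(shuffled_cols) is still 150 by 5 and the set of column names is unchanged, while rows has 10 rows and 5 columns.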