Wacek Kusnierczyk
2009-Jan-02 19:54 UTC
[Rd] [Fwd: Re: [R] Randomly remove condition-selected rows from a matrix]
Following Duncan's suggestion, I forward the below to R-devel.

vQ

-------- Original Message --------
Subject: Re: [R] Randomly remove condition-selected rows from a matrix
Date: Fri, 02 Jan 2009 10:34:52 -0500
From: Duncan Murdoch <murdoch at stats.uwo.ca>
To: Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
CC: R help <R-help at stat.math.ethz.ch>
References: <79CAFBDD-4BB8-4C9D-A0E9-54E280458510 at gmail.com>
    <8b356f880812300920o19d18aeo47dc31f087c3f36 at mail.gmail.com>
    <DA6ECC19-C786-4C02-B246-4B613726BC7F at gmail.com>
    <8b356f880812311042la28aef3t81ad09a3b14ce65 at mail.gmail.com>
    <495E2D95.9040502 at idi.ntnu.no>

On 02/01/2009 10:07 AM, Wacek Kusnierczyk wrote:
> Stavros Macrakis wrote:
>> On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron
>> <carnivorescience at gmail.com> wrote:
>>
>>>> m[-sample(which(m[,1]<8 & m[,2]>12),2),]
>>>>
>>> Supposing I sample only one row among the ones matching my criteria. Then
>>> consider the case where there is just one row matching these criteria. Sure,
>>> there is no need to sample, but the instruction would still be executed.
>>> Then if this row index is 15, my instruction becomes sample(15,1), and this
>>> can give me any row from 1 to 15, which is not correct. I have to make a
>>> condition in case there is only one row matching the criteria.
>>>
>> Yes, this is a (documented!) design flaw in 'sample' -- see the man page.
>>
>> For some reason, the designers of R have chosen to document the flaw
>> and leave it up to individual users to work around it rather than fix
>> it definitively. A related case is sample(c(),0), which gives an
>> error rather than giving an empty vector, though in general R deals
>> with empty vectors correctly (e.g. sum(c()) => 0).
>>
>
> interestingly, ?sample says:
>
> "
> 'sample' takes a sample of the specified size from the elements of
> 'x' using either with or without replacement.
>
> x: Either a (numeric, complex, character or logical) vector of
> more than one element from which to choose, or a positive
> integer.
>
> If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
> 'x >= 1', sampling takes place from '1:x'. _Note_ that this
> convenience feature may lead to undesired behaviour when 'x' is of
> varying length 'sample(x)'. See the 'resample()' example below.
> "
>
> yet the following works, even though x has length 1 and is *not* numeric:
>
> x = "foolme"
> is.numeric(x)
> sample(x, 1)
> sample(x)
>
> x = NA
> is.numeric(NA)
> sample(x, 1)
> sample(x)
>
> is this a bug in the code, or a bug in the documentation?
>
>> To my mind, it is bizarre to have an important basic function which
>> works for some argument lengths but not others. The convenience of
>> being able to write sample(5,2) for sample(1:5,2) hardly seems worth
>> inflicting inconsistency on all users -- but perhaps one of the
>> designers of R/S can enlighten us on the design rationale here.
>>
>
> hopefully.

This is more of an R-devel sort of question. My guess is that this is in the S blue book, but I don't have a copy here to check.

Duncan Murdoch
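
The length-one behaviour discussed above can be reproduced in a few lines. What follows is only a sketch: `safe_sample` is a hypothetical helper name, not part of base R, written along the lines of the 'resample()' example that ?sample points to, and the matrix `m` is made-up data used purely for illustration.

```r
# Pitfall: a length-one numeric 'x' >= 1 is treated as the upper bound of 1:x.
set.seed(1)
idx <- 15                                # the single row index matching the criteria
draws <- replicate(200, sample(idx, 1))  # repeated draws range over 1..15

# Hypothetical helper (assumption, not base R): sample positions, then index into x.
safe_sample <- function(x, ...) x[sample.int(length(x), ...)]

one <- safe_sample(idx, 1)               # 15
two <- safe_sample(c(3, 9, 15), 2)       # two of the three candidates

# Applied to the row-removal idiom from the thread, on illustrative data.
m <- matrix(1:40, ncol = 2)              # column 1 is 1..20, column 2 is 21..40
hits <- which(m[, 1] < 8 & m[, 2] > 12)  # rows 1..7 match
m2 <- m[-safe_sample(hits, 2), , drop = FALSE]
```

Because the helper always samples positions and never the values themselves, a single matching row is returned unchanged. Asking for more rows than match still raises an error, so that case needs its own check.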
Mark.Bravington at csiro.au
2009-Jan-02 20:55 UTC
[Rd] [Fwd: Re: [R] Randomly remove condition-selected rows from a matrix]
This is a recurring problem and from previous correspondence it seems unlikely that "sample" itself will ever be changed (and having myself been on the wrong end of a number of non-back-compatible changes in R, that's fine with me!).

To forestall future confusion, my suggestion is to add a function "rsample" defined as below, which has first argument "n" (number of values to return) consistent with the other "r..." random-generating functions.

  rsample <- function( n=length(pop), pop, replace=FALSE, prob=NULL)
    pop[ sample( seq_along( pop)-1, size=n, replace=replace, prob=prob)+1]

The default for n is not necessary, but handy in case one is just trying to reorder a "pop" argument that is defined on-the-fly (as in Wacek's example). The -1 & +1 in the body prevent 'sample' from getting confused.

Perhaps this should be patched up to cope with the case n==length(pop)==0 that Duncan mentions:

  rsample <- function( n=length(pop), pop, replace=FALSE, prob=NULL) {
    if( n>0)
      pop[ sample( seq_along( pop)-1, size=n, replace=replace, prob=prob)+1]
    else if(n==0)
      pop[0]
    else
      stop( "invalid 'n' argument")
  }

Mark Bravington
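
To make the rsample() proposal above concrete, here is a small usage sketch. The function body restates Mark Bravington's patched version with braces added so it can be sourced as-is; the calls and inputs below it are illustrative assumptions, not taken from the thread.

```r
# rsample() as proposed in the thread, restated so this block is self-contained.
rsample <- function(n = length(pop), pop, replace = FALSE, prob = NULL) {
  if (n > 0) {
    # Shifting indices to 0..(length-1) means a population of length one is
    # passed to sample() as 0, which is not >= 1 and so is not expanded to 1:x.
    pop[sample(seq_along(pop) - 1, size = n, replace = replace, prob = prob) + 1]
  } else if (n == 0) {
    pop[0]
  } else {
    stop("invalid 'n' argument")
  }
}

a <- rsample(1, 15)               # 15, never a value from 1:15
b <- rsample(1, "foolme")         # "foolme"
e <- rsample(0, 1:5)              # integer(0)
p <- rsample(pop = letters[1:4])  # a random permutation of the four letters
```

Putting `n` first mirrors rnorm(), runif() and the other "r..." generators, at the cost of having to name `pop` whenever the default `n` is used.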