Henrik Bengtsson
2021-Sep-17 12:38 UTC
[Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
> I'm curious, other than proper programming practice, why?

Life's too short for troubleshooting silent mistakes - mine or others.

While at it, searching the interwebs for use of set.seed(), gives
mistakes/misunderstandings like using set.seed(<double>), e.g.

> set.seed(6.1); sum(.Random.seed)
[1] 73930104
> set.seed(6.2); sum(.Random.seed)
[1] 73930104

which clearly is not what the user expected. There are also a few
cases of set.seed(<character>), e.g.

> set.seed("42"); sum(.Random.seed)
[1] -2119381568
> set.seed(42); sum(.Random.seed)
[1] -2119381568

which works just because as.numeric("42") is used.

/Henrik

On Fri, Sep 17, 2021 at 12:55 PM GILLIBERT, Andre <Andre.Gillibert at chu-rouen.fr> wrote:
>
> Hello,
>
> A vector with a length >= 2 to set.seed would probably be a bug. An error message will help the user to fix his R code. The bug may be accidental or due to bad understanding of the set.seed function. For instance, a user may think that the whole state of the PRNG can be passed to set.seed.
>
> The "if" instruction emits a warning when the condition has length >= 2, because it is often a bug. I would expect a warning or error with set.seed().
>
> Validating inputs and emitting errors early is a good practice.
>
> Just my 2 cents.
>
> Sincerely.
> Andre GILLIBERT
>
> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Avraham Adler
> Sent: Friday, September 17, 2021 12:07
> To: Henrik Bengtsson
> Cc: R-devel
> Subject: Re: [Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
>
> Hi, Henrik.
>
> I'm curious, other than proper programming practice, why?
>
> Avi
>
> On Fri, Sep 17, 2021 at 11:48 AM Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
>
> > Hi,
> >
> > according to help("set.seed"), argument 'seed' to set.seed() should be:
> >
> >   a single value, interpreted as an integer, or NULL (see 'Details').
> >
> > From code inspection (src/main/RNG.c) and testing, it turns out that
> > if you pass a 'seed' with length greater than one, it silently uses
> > seed[1], e.g.
> >
> > > set.seed(1); sum(.Random.seed)
> > [1] 4070365163
> > > set.seed(1:3); sum(.Random.seed)
> > [1] 4070365163
> > > set.seed(1:100); sum(.Random.seed)
> > [1] 4070365163
> >
> > I'd like to suggest that set.seed() produces an error if length(seed) > 1.
> > As a reference, for length(seed) == 0, we get:
> >
> > > set.seed(integer(0))
> > Error in set.seed(integer(0)) : supplied seed is not a valid integer
> >
> > /Henrik
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
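The stricter behaviour wished for in this message could be approximated today with a small wrapper. The following is only a sketch under stated assumptions: `set_seed_strict()` is a hypothetical name, not a base R function, and the decision to reject characters, logicals, and non-whole doubles is a choice made for this illustration, not something the thread or R itself prescribes.

```r
# Hypothetical helper (NOT part of base R): validate 'seed' before
# handing it to set.seed(), so that silent coercions become errors.
set_seed_strict <- function(seed) {
  # NULL keeps its documented meaning: re-initialise the generator.
  if (is.null(seed)) {
    return(invisible(set.seed(NULL)))
  }
  # Assumption: only integer/double input is accepted; "42" and TRUE are rejected.
  if (!is.numeric(seed)) {
    stop("'seed' must be numeric, not ", class(seed)[1L])
  }
  if (length(seed) != 1L) {
    stop("'seed' must have length 1, not ", length(seed))
  }
  # Reject NA/NaN and values such as 6.1 that would be silently truncated.
  if (is.na(seed) || seed != trunc(seed)) {
    stop("'seed' must be a whole number")
  }
  if (abs(seed) > .Machine$integer.max) {
    stop("'seed' must fit in an integer")
  }
  set.seed(as.integer(seed))
}

# Usage: valid seeds behave like set.seed(); invalid ones fail loudly.
set_seed_strict(42)
msg_vector <- tryCatch(set_seed_strict(1:3), error = function(e) conditionMessage(e))
msg_double <- tryCatch(set_seed_strict(6.1), error = function(e) conditionMessage(e))
```

Wrapping rather than replacing set.seed() leaves existing code untouched, which matters given the concern about breaking scripts that currently rely on the lenient behaviour.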
Duncan Murdoch
2021-Sep-17 13:10 UTC
[Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
I'd say a more serious problem would be using set.seed(.Random.seed): because the first entry codes for RNGkind, it hardly varies at all. So this sequence could really mislead someone:

> set.seed(.Random.seed)
> sum(.Random.seed)
[1] 24428993419

# Use it to get a new .Random.seed value:

> runif(1)
[1] 0.3842704
> sum(.Random.seed)
[1] -13435151647

# So let's make things really random, by using the new seed as a seed:

> set.seed(.Random.seed)
> sum(.Random.seed)
[1] 24428993419

# Back to the original!

Duncan Murdoch
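For readers who do want to capture and replay the whole generator state, which is what set.seed(.Random.seed) appears to promise but does not deliver, a common approach is to copy `.Random.seed` and assign it back. The sketch below assumes the default RNG settings; the variable names are illustrative only.

```r
# Make sure .Random.seed exists in the global environment.
set.seed(1)

# Save the complete RNG state (an integer vector), not just a seed.
saved_state <- get(".Random.seed", envir = globalenv())

first_draw <- runif(3)

# Restore the full state by assignment; set.seed() is not used here,
# because it would only look at the first element of the vector.
assign(".Random.seed", saved_state, envir = globalenv())

replayed_draw <- runif(3)

identical(first_draw, replayed_draw)  # TRUE
```

The saved vector should be treated as opaque: restoring a copy is fine, but editing its elements by hand is not something the R documentation supports.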
Avi Gross
2021-Sep-17 20:12 UTC
[Rd] WISH: set.seed(seed) to produce error if length(seed) != 1 (now silent)
R wobbles a bit as there is no normal datatype that is a singleton variable. Saying x <- 5 just creates a vector of current length 1. It is perfectly legal to then write x[2] <- 6 and so on. The vector lengthens. You can truncate it back to 1, if you wish:

length(x) <- 1

So the question here is what happens if you supply more info than is needed? If it is an integer vector of length greater than one, should it ignore everything but the first entry? I note it happily accepts not-quite integers like TRUE and FALSE. It also accepts floating point numbers like 1.23 or 1.2e5. The goal seems to be to set a unique starting point, rounded or transformed if needed. The visible part of the function does not even look at the seed before calling the internal representation.

So although superficially choosing the first integer in a vector makes some sense, it can be a problem if a program assumes the entire vector is consumed and perhaps hashed in some way to make a seed. If the program later changes parts of the vector other than the first entry, it may assume re-setting the seed gets something else and yet it may be exactly the same.

So, yes, I suspect it is an ERROR to take anything that cannot be coerced by something like as.integer() into a vector of length 1.

I have noted other places in R where I may get a warning when giving a longer vector that only the first element will be used. Are they all problems that need to be addressed?

Here is a short one:

> x <- c(1:3)
> if (x > 2) y <- TRUE
Warning message:
In if (x > 2) y <- TRUE :
  the condition has length > 1 and only the first element will be used
> y
Error: object 'y' not found

The above is not vectorized; it uses only the first element, x[1], which is 1, and thus does not set y. Now a vectorized variant works as expected, making a vector of length 3 for y:

> x
[1] 1 2 3
> y <- ifelse(x > 2, TRUE, FALSE)
> y
[1] FALSE FALSE  TRUE

I have no doubt fixing lots of this stuff, if indeed it is a fix, can break lots of existing code. Sure, it is not harmful to ask a programmer to always say x[1] to guarantee they are getting what they want, or to add a function like first(x) that does the same.

R has some compromises or features I sometimes wonder about. If it had a concept of a numeric scalar, then some things that now happen might start being an error. What happens when you multiply a vector by a scalar as in 5*x is that every component of x is multiplied by 5. But x*x does componentwise multiplication. So say x is c(1:3), what should this do using a twosome times a threesome?

> x[1:2]*x
[1] 1 4 3
Warning message:
In x[1:2] * x :
  longer object length is not a multiple of shorter object length

Is it recycling to get a 1 in pseudo-position 3? Yep, this shows recycling (the output below corresponds to a longer x, 1:9):

> x[1:2]*x
[1]  1  4  3  8  5 12  7 16  9
Warning message:
In x[1:2] * x :
  longer object length is not a multiple of shorter object length

You do get a warning, but it does not tell you what it did. In essence, the earlier case of 5*x arguably recycled the 5 as many times as needed but with no warning.

My point is that many languages, especially older ones, were designed a certain way and have been updated, but we may be stuck with what we have. A brand new language might come up with a new way that includes vectorizing the heck out of things but allowing and even demanding that you explicitly convert things to a scalar in a context that needs it or to explicitly ask for recycling when you want it or ...
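The idea of explicitly asking for a scalar where one is needed can be sketched in a few lines of R. This is an illustration only: `as_scalar()` is a hypothetical helper in the spirit of the first(x) function mentioned above, not an existing base R function.

```r
# Hypothetical helper (NOT part of base R): insist on exactly one element
# instead of silently using the first one.
as_scalar <- function(x) {
  if (length(x) != 1L) {
    stop("expected a vector of length 1, got length ", length(x))
  }
  x
}

x <- 1:3

# Reduce the condition to length 1 explicitly before using it in if().
y_any <- if (any(x > 2)) TRUE else FALSE

# Or pick the element you mean, and have the helper confirm it is a scalar.
y_first <- if (as_scalar(x[1]) > 2) TRUE else FALSE

# The vectorised form keeps all three answers; x > 2 is already logical,
# so it gives the same result as ifelse(x > 2, TRUE, FALSE).
y_vec <- x > 2

# Passing the whole vector where a scalar is required now fails loudly.
scalar_error <- tryCatch(as_scalar(x), error = function(e) conditionMessage(e))
```

Spelling out any(), all(), or x[1] documents which reduction the author intended, so the code no longer depends on how if() treats a longer condition.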