thr3ads.net - R help - [R] How important is set.seed [Mar 2022]

Ebert,Timothy Aaron

2022-Mar-22 00:44 UTC

[R] How important is set.seed

If you are using the program for data analysis then set.seed() is not necessary
unless you are developing a reproducible example. In a standard analysis it is
mostly counter-productive, because readers should then ask whether your presented
results are an artifact of a specific seed that you selected to get a particular
result. However, when you need a reproducible example, are debugging a program, or
otherwise need the same result from every run of the program, set.seed() is an
essential tool.
Tim
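
A minimal sketch of both points, using only base R. The seed values, sample
sizes and the helper name estimate_mean are arbitrary choices made for
illustration; they are not from the thread.

```r
# Same seed, same pseudo-random draws: handy for reproducible examples
# and for debugging.
set.seed(42)
first_run <- rnorm(5)
set.seed(42)
second_run <- rnorm(5)
identical(first_run, second_run)  # TRUE

# To check that a result is not an artifact of one particular seed,
# repeat the analysis under several seeds and see how far it moves.
estimate_mean <- function(seed, n = 1000) {
  set.seed(seed)
  mean(rnorm(n, mean = 10, sd = 2))
}
seeds <- 1:20                      # arbitrary illustrative seeds
estimates <- vapply(seeds, estimate_mean, numeric(1))
range(estimates)                   # a narrow range suggests the seed hardly matters
```

If the range is wide relative to the effect you are reporting, the conclusion
probably depends on the seed, and a larger sample or more replicates is needed.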

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller
Sent: Monday, March 21, 2022 8:41 PM
To: r-help at r-project.org; Neha gupta <neha.bologna90 at gmail.com>;
r-help mailing list <r-help at r-project.org>
Subject: Re: [R] How important is set.seed

First off, "ML models" do not all use random numbers (for prediction I
would guess very few of them do). Learn and pay attention to what the functions
you are using do.

Second, if you use random numbers properly and understand the precision that
your specific use case offers, then you don't need to use set.seed. However,
in practice, using set.seed can allow you to temporarily avoid chasing precision
gremlins, or set up specific test cases for testing code, not results. It is
your responsibility to not let this become a crutch... a randomized simulation
that is actually sensitive to the seed is unlikely to offer an accurate result.
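
One way to see what precision a simulation offers, sketched here with a toy
Monte Carlo estimate of pi. The function name estimate_pi, the number of
points and the number of replicates are assumptions chosen for illustration.

```r
# Monte Carlo estimate of pi: share of uniform points in the unit square
# that fall inside the quarter circle, times 4.
estimate_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

# Repeat the whole simulation without fixing a seed for each run.
n_points <- 1e5
replicates <- replicate(30, estimate_pi(n_points))

# Standard error expected for a single run (binomial proportion, p = pi / 4).
p <- pi / 4
theoretical_se <- 4 * sqrt(p * (1 - p) / n_points)

c(mean = mean(replicates),
  sd = sd(replicates),
  theoretical_se = theoretical_se)
```

If the spread across runs is small compared with the accuracy you need, the
seed is irrelevant to the conclusion. If it is not, fixing a seed only hides
the problem, and the simulation needs more points.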

Where to put set.seed depends a lot on how you are performing your simulations.
In general each process should set its seed once, at the beginning, with a value
unique to that process, and if you use parallel processing then use the features
of your parallel processing framework to ensure that this happens. Beware of
setting all worker processes to use the same seed.
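
As a sketch of what "a unique seed per process" can look like, the parallel
package that ships with R can derive independent streams from one master seed
with nextRNGStream(). For a cluster created with makeCluster(),
clusterSetRNGStream() does the equivalent bookkeeping on the workers. The
worker count and the helper name draw_on_stream below are hypothetical, and
the "workers" are simulated in a single session to keep the example short.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # generator designed to provide multiple streams
set.seed(2022)             # one master seed for the whole job

n_workers <- 3             # hypothetical number of worker processes
streams <- vector("list", n_workers)
stream <- get(".Random.seed", envir = globalenv())
for (i in seq_len(n_workers)) {
  streams[[i]] <- stream            # seed state handed to worker i
  stream <- nextRNGStream(stream)   # jump ahead to the next independent stream
}

# What each worker would do: install its own stream, then draw.
draw_on_stream <- function(stream, n = 3) {
  assign(".Random.seed", stream, envir = globalenv())
  runif(n)
}
draws <- lapply(streams, draw_on_stream)
draws   # three different sets of numbers, all reproducible from the master seed
```

Calling set.seed() with the same value on every worker would instead make all
of them produce identical draws, which is the failure mode warned about above.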

On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 at gmail.com>
wrote:
>Hello everyone
>
>I want to know
>
>(1) In which cases, we need to use set.seed while building ML models?
>
>(2) Which is the exact location we need to put the set.seed function i.e.
>when we split data into train/test sets, or just before we train a model?
>
>Thank you
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
--
Sent from my phone. Please excuse my brevity.

Jin Li

2022-Mar-22 04:07 UTC

[R] How important is set.seed

The answer may depend on the type of model you are going to develop. For
predictive models, yes, you do need it. The dependence of predictive
accuracy measures, and of stabilized predictive accuracy measures, on random
seeds has been demonstrated and discussed in
Spatial Predictive Modeling with R (doi:10.1201/9781003091776), which provides
many reproducible examples for various predictive methods, including
RF, GBM and SVM.
Hope this helps.
Jin
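
To get a feel for how much an accuracy measure can depend on the seed, here is
a small sketch that is not taken from the book cited above. It uses the
built-in mtcars data and a plain linear model; the helper name holdout_rmse,
the 70/30 split and the 50 seeds are assumptions made for illustration.

```r
# Hold-out RMSE of a linear model for one random train/test split.
holdout_rmse <- function(seed, train_fraction = 0.7) {
  set.seed(seed)                      # here the seed only controls the split
  n <- nrow(mtcars)
  train_idx <- sample(n, size = floor(train_fraction * n))
  fit <- lm(mpg ~ wt + hp, data = mtcars[train_idx, ])
  pred <- predict(fit, newdata = mtcars[-train_idx, ])
  sqrt(mean((mtcars$mpg[-train_idx] - pred)^2))
}

seeds <- 1:50                         # arbitrary illustrative seeds
rmse_by_seed <- vapply(seeds, holdout_rmse, numeric(1))

summary(rmse_by_seed)                 # spread of the accuracy measure across seeds
sd(rmse_by_seed)
```

Reporting the mean and spread over many seeds is usually more informative than
the value from any single seed, especially with a data set this small.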

-- 
Jin
------------------------------------------
Jin Li, PhD
Founder, Data2action, Australia
https://www.researchgate.net/profile/Jin_Li32
https://scholar.google.com/citations?user=Jeot53EAAAAJ&hl=en

Neha gupta

2022-Mar-22 10:32 UTC

[R] How important is set.seed

Thank you all.

Actually, I need set.seed because I have to evaluate the consistency of the
feature selection produced by different models, so I think using a seed is
recommended for this.

Warm regards
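
For a comparison like this, one possible arrangement is to set the seed once,
before the train/test split, so that every model is trained and evaluated on
the same rows. The sketch below uses the built-in mtcars data and two linear
models as stand-ins; the seed value, the split fraction and the object names
are illustrative assumptions.

```r
set.seed(123)                         # set once, before the split
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Two candidate models, both fitted on the same training rows.
fit_small <- lm(mpg ~ wt, data = train)
fit_large <- lm(mpg ~ wt + hp + qsec, data = train)

# Hold-out RMSE on the same test rows for both models.
rmse <- function(fit) {
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
}
c(small = rmse(fit_small), large = rmse(fit_large))
```

lm() itself uses no random numbers, so here the seed matters only for the
split. For a learner that does draw random numbers during training, a seed set
just before each fit would also be needed to make that step repeatable.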

