"rather to understand how the choice of seed influences final model
output."
No! Different seeds just produce different streams of (pseudo)-random
numbers. Hence there cannot be any "understanding" of how "choice of
seed" influences results. Presumably, what you meant is to characterize the
variability in results from the procedure due to its incorporation of
randomness in what it does. Re-read Jeff's last post. This does *not*
require set.seed() at all.
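
To make that concrete, here is a minimal sketch of characterizing the run-to-run variability without ever calling set.seed(). The procedure (a random 70/30 split of the built-in mtcars data followed by lm()) and the names one_run and rmse are my own illustrative assumptions, not anything from the paper or posts under discussion.

```r
# Hypothetical procedure: fit a linear model on a random 70% training split
# of the built-in mtcars data and return the RMSE on the held-out rows.
one_run <- function(data = mtcars, train_frac = 0.7) {
  n <- nrow(data)
  idx <- sample.int(n, size = floor(train_frac * n))  # random data split
  fit <- lm(mpg ~ wt + hp, data = data[idx, ])
  pred <- predict(fit, newdata = data[-idx, ])
  sqrt(mean((data$mpg[-idx] - pred)^2))
}

# Repeat the whole procedure many times. No set.seed() is needed to study
# how much the result varies because of the random split.
rmse <- replicate(200, one_run())
summary(rmse)
sd(rmse)
```

The spread of rmse is the quantity of interest here; which particular seeds happened to produce each run is irrelevant.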
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Tue, Mar 22, 2022 at 9:55 AM Ebert,Timothy Aaron <tebert at ufl.edu>
wrote:
> So step 1 is not to compare models, rather to understand how the choice of
> seed influences final model output. Once you have a handle on this issue,
> then work at comparing models.
>
> Tim
>
>
>
> *From:* Neha gupta <neha.bologna90 at gmail.com>
> *Sent:* Tuesday, March 22, 2022 12:19 PM
> *To:* Bert Gunter <bgunter.4567 at gmail.com>
> *Cc:* Ebert,Timothy Aaron <tebert at ufl.edu>; r-help at r-project.org
> *Subject:* Re: [R] How important is set.seed
>
> I read a paper two days ago (and that's why I then posted here about
> set.seed) which used interpretable machine learning.
>
>
>
> According to the authors, different explanations (of the black-box models)
> will be produced by the ML models if different seeds are used or never
> used.
>
> On Tue, Mar 22, 2022 at 5:12 PM Bert Gunter <bgunter.4567 at gmail.com>
> wrote:
>
> OK, I'm somewhat puzzled by this discussion. Maybe I'm just clueless.
> But...
>
> 1. set.seed() is used to make any procedure that uses R's
> pseudo-random number generator -- including, for example, sampling
> from a distribution, random data splitting, etc. -- "reproducible".
> That is, if the procedure is repeated *exactly,* by invoking
> set.seed() with its original argument values (once!) *before* the
> procedure begins, exactly the same results should be produced by the
> procedure. Full stop. It does not matter how many times random number
> generation occurs within the procedure thereafter -- R preserves the
> state of the rng between invocations (but see the notes in ?set.seed
> for subtle qualifications of this claim).
>
> 2. Hence, if no (pseudo-) random number generation is used, set.seed()
> is irrelevant. Full stop.
>
> 3. Hence, if you don't care about reproducibility (you should! -- if
> for no other reason than debugging), you don't need set.seed()
>
> 4. The "randomness" of any sequence of results from any particular
> set.seed() arguments (including further calls to the rng) is a complex
> issue. ?set.seed has some discussion of this, but one needs
> considerable expertise to make informed choices here. As usual, we
> untutored users should be guided by the expert recommendations of the
> Help file.
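
Point 1 above can be sketched in a few lines. This is only an illustration under my own assumptions: the function name procedure and the seed value 42 are hypothetical, chosen just to show that one set.seed() call before the procedure is enough even when the RNG is used several times inside it.

```r
# Hypothetical procedure that uses the RNG more than once.
procedure <- function() {
  idx <- sample(10)      # first use of the RNG
  noise <- rnorm(3)      # second use; the RNG state carries over
  list(idx = idx, noise = noise)
}

set.seed(42)             # called once, before the procedure begins
first <- procedure()

set.seed(42)             # same seed, hence the same starting RNG state
second <- procedure()

identical(first, second) # TRUE

# Without reseeding, the RNG simply continues from its current state,
# so a further run is expected to differ from the first two.
third <- procedure()
```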
>
> *** If anything I have said above is wrong, I would greatly appreciate
> a public response here showing my error.***
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
>
> On Tue, Mar 22, 2022 at 7:48 AM Neha gupta <neha.bologna90 at gmail.com>
> wrote:
> >
> > Hello Tim
> >
> > In some of the examples I see in the tutorials, they put the random seed
> > just before the model training, e.g. the train function in the caret
> > library. Should I follow this?
> >
> > Best regards
> > On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
> >
> > > Ah, so maybe what you need is to think of "set.seed()" as a treatment
> > > in an experiment. You could use a random number generator to select an
> > > appropriate number of seeds, then use those seeds repeatedly in the
> > > different models to see how seed selection influences outcomes. I am
> > > not quite sure how many seeds would constitute a good sample. For me
> > > that would depend on what I find and how long a run takes.
> > >
> > > In parallel processing you set the seed in the master and then use a
> > > random number generator to set seeds in each worker.
> > >
> > > Tim
> > >
> > >
> > >
> > > *From:* Neha gupta <neha.bologna90 at gmail.com>
> > > *Sent:* Tuesday, March 22, 2022 6:33 AM
> > > *To:* Ebert,Timothy Aaron <tebert at ufl.edu>
> > > *Cc:* Jeff Newmiller <jdnewmil at dcn.davis.ca.us>; r-help at r-project.org
> > > *Subject:* Re: How important is set.seed
> > >
> > > Thank you all.
> > >
> > >
> > >
> > > Actually I need set.seed because I have to evaluate the consistency of
> > > the feature selection generated by different models, so I think for
> > > this it's recommended to use the seed.
> > >
> > >
> > >
> > > Warm regards
> > >
> > > On Tuesday, March 22, 2022, Ebert,Timothy Aaron <tebert at ufl.edu> wrote:
> > >
> > > If you are using the program for data analysis then set.seed() is not
> > > necessary unless you are developing a reproducible example. In a
> > > standard analysis it is mostly counter-productive because one should
> > > then ask if your presented results are an artifact of a specific seed
> > > that you selected to get a particular result. However, in cases where
> > > you need a reproducible example, are debugging a program, or in specific
> > > other cases where you might need the same result with every run of the
> > > program, set.seed() is an essential tool.
> > > Tim
> > >
> > > -----Original Message-----
> > > From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller
> > > Sent: Monday, March 21, 2022 8:41 PM
> > > To: r-help at r-project.org; Neha gupta <neha.bologna90 at gmail.com>;
> > > r-help mailing list <r-help at r-project.org>
> > > Subject: Re: [R] How important is set.seed
> > >
> > > First off, "ML models" do not all use random numbers (for prediction I
> > > would guess very few of them do). Learn and pay attention to what the
> > > functions you are using do.
> > >
> > > Second, if you use random numbers properly and understand the precision
> > > that your specific use case offers, then you don't need to use set.seed.
> > > However, in practice, using set.seed can allow you to temporarily avoid
> > > chasing precision gremlins, or set up specific test cases for testing
> > > code, not results. It is your responsibility to not let this become a
> > > crutch... a randomized simulation that is actually sensitive to the seed
> > > is unlikely to offer an accurate result.
> > >
> > > Where to put set.seed depends a lot on how you are performing your
> > > simulations. In general each process should set it once uniquely at the
> > > beginning, and if you use parallel processing then use the features of
> > > your parallel processing framework to ensure that this happens. Beware
> > > of setting all worker processes to use the same seed.
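
As one possible illustration of "use the features of your parallel processing framework", the sketch below assumes the base-R parallel package with a two-worker cluster. The seed value 123 and the toy task (two uniform draws per worker) are hypothetical choices of mine. clusterSetRNGStream() gives each worker its own L'Ecuyer-CMRG stream derived from a single seed, which avoids seeding every worker identically.

```r
library(parallel)

cl <- makeCluster(2)                   # two local worker processes

# One call seeds all workers, each with a distinct random number stream.
clusterSetRNGStream(cl, iseed = 123)
res1 <- parLapply(cl, 1:2, function(i) runif(2))

# Re-seeding with the same value reproduces the parallel run, given the
# same number of workers and the same division of tasks.
clusterSetRNGStream(cl, iseed = 123)
res2 <- parLapply(cl, 1:2, function(i) runif(2))

stopCluster(cl)

identical(res1, res2)                  # TRUE
identical(res1[[1]], res1[[2]])        # FALSE: the workers use different streams
```

By contrast, calling set.seed() with the same value inside every worker would make all workers generate the same numbers, which is the pitfall warned about above.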
> > >
> > > On March 21, 2022 5:03:30 PM PDT, Neha gupta <neha.bologna90 at gmail.com>
> > > wrote:
> > > >Hello everyone
> > > >
> > > >I want to know
> > > >
> > > >(1) In which cases do we need to use set.seed while building ML models?
> > > >
> > > >(2) Where exactly do we need to put the set.seed function, i.e.
> > > >when we split data into train/test sets, or just before we train a model?
> > > >
> > > >Thank you
> > > >
> > > > >______________________________________________
> > > > >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > >https://stat.ethz.ch/mailman/listinfo/r-help
> > > > >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > >and provide commented, minimal, self-contained, reproducible code.
> > >
> > > --
> > > Sent from my phone. Please excuse my brevity.
> > >
> >