Good morning,

I am working with LDA models in R (using both topicmodels::LDA and quanteda::textmodel_lda) and noticed that the results differ slightly across different machines, even when I use set.seed(1234) and the same dataset.

So, I have a few questions:

1. Is this expected due to BLAS/LAPACK or low-level random number generation differences?
2. Is there a recommended way to enforce bit-for-bit reproducibility of LDA results across machines in R?
3. Would you recommend always saving fitted models with saveRDS() to ensure reproducible outputs instead of re-fitting?

Thanks a lot for your guidance.

Best regards,

Jeanne Moreau
There was a change in the default R RNG a few years ago: R 3.6.0 changed the default sample() algorithm from "Rounding" to "Rejection", so the same set.seed() call can produce different draws on different R versions. Are you using the same version of R in all cases?

On October 3, 2025 2:57:46 AM PDT, Jeanne Moreau <moreaujeanne02 at gmail.com> wrote:
> [quoted message elided]

--
Sent from my phone. Please excuse my brevity.
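If the machines straddle R 3.6.0, one way to rule out the RNG as the cause is to pin the generator explicitly so set.seed() yields the same stream everywhere. A minimal sketch (base R only):

```r
# Pin the RNG behaviour to the pre-3.6.0 defaults, so set.seed(1234)
# gives an identical stream on every machine and R version that
# supports RNGversion().
RNGversion("3.5.0")     # Mersenne-Twister + "Rounding" sample()
set.seed(1234)
x <- sample(10)         # same permutation everywhere under this pinning

# Alternatively, pin the *current* defaults explicitly:
RNGkind("Mersenne-Twister", "Inversion", "Rejection")
set.seed(1234)
y <- sample(10)
```

Note this only controls R's own stream; I believe topicmodels::LDA also takes its own seed via control = list(seed = ...), and the quanteda-side LDA may seed differently, so check each package's documentation rather than relying on set.seed() alone.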
Also, it is a bad idea to make your randomized analyses dependent on bit-for-bit reproducibility: first, different computer architectures handle floating-point intermediate calculations differently; and second, the whole point of a randomized analysis is that it converges to some quantifiable mean result regardless of the path taken to get there.

On October 3, 2025 2:57:46 AM PDT, Jeanne Moreau <moreaujeanne02 at gmail.com> wrote:
> [quoted message elided]
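That second point is easy to illustrate with a toy Monte Carlo example (plain base R, not LDA): two different seeds give entirely different draws, yet essentially the same estimate:

```r
# Two different seeds: the individual draws differ, but the Monte Carlo
# mean of a standard normal converges to 0 either way.
set.seed(1); m1 <- mean(rnorm(1e6))
set.seed(2); m2 <- mean(rnorm(1e6))

identical(m1, m2)                   # FALSE: not bit-for-bit reproducible
abs(m1) < 0.01 && abs(m2) < 0.01    # TRUE: both within Monte Carlo error of 0
```

The analogue for LDA is that topic solutions from different seeds should be substantively similar (up to label switching) even when the numbers are not identical; if they are not, the instability itself is the finding worth investigating.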
To add a little bit of detail to what others have said: if you are using the same version of R, on the same operating system, with the same processor (e.g., there will be differences between Intel and M1/M2 Macs), then as far as I know the only remaining source of non-determinism, which can even affect successive runs on the same machine, is parallel operations in BLAS/LAPACK. These can perform mathematically equivalent operations in a different order, and floating-point arithmetic is not associative, so (a+b)+c != a+(b+c) in general.

On 10/3/25 05:57, Jeanne Moreau wrote:
> [quoted message elided]

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Associate chair (graduate), Mathematics & Statistics
Director, School of Computational Science and Engineering
* E-mail is sent at my convenience; I don't expect replies outside of working hours.
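The non-associativity is easy to demonstrate in a couple of lines of base R:

```r
# Floating-point addition is not associative: summing the same three
# IEEE-754 doubles in a different order gives different bit patterns.
a <- 0.1; b <- 0.2; c <- 0.3

(a + b) + c == a + (b + c)    # FALSE
(a + b) + c - (a + (b + c))   # a one-ulp discrepancy, about 1.1e-16
```

So a threaded BLAS that splits a dot product across cores and combines partial sums in a nondeterministic order can legitimately return slightly different results on each run, with nothing wrong anywhere.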