Good morning,

I am working with LDA models in R (using both topicmodels::LDA and quanteda::textmodel_lda) and noticed that the results differ slightly across different machines, even when I use set.seed(1234) and the same dataset.

So, I have a few questions:

1. Is this expected due to BLAS/LAPACK or low-level random number generation differences?
2. Is there a recommended way to enforce bit-for-bit reproducibility of LDA results across machines in R?
3. Would you recommend always saving fitted models with saveRDS() to ensure reproducible outputs instead of re-fitting?

Thanks a lot for your guidance.

Best regards,

Jeanne Moreau
There was a change in the default R RNG a few years ago: R 3.6.0 changed the default sample() algorithm from "Rounding" to "Rejection", so the same set.seed() call can produce different draws on different R versions. Are you using the same version of R in all cases?

On October 3, 2025 2:57:46 AM PDT, Jeanne Moreau <moreaujeanne02 at gmail.com> wrote:
> [quoted message elided]

--
Sent from my phone. Please excuse my brevity.
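If the machines straddle R 3.6.0, one way to rule out the RNG as the cause is to pin the generator explicitly so set.seed() yields the same stream everywhere. A minimal sketch (base R only):

```r
# Pin the RNG behaviour to the pre-3.6.0 defaults, so set.seed(1234)
# gives an identical stream on every machine and R version that
# supports RNGversion().
RNGversion("3.5.0")     # Mersenne-Twister + "Rounding" sample()
set.seed(1234)
x <- sample(10)         # same permutation everywhere under this pinning

# Alternatively, pin the *current* defaults explicitly:
RNGkind("Mersenne-Twister", "Inversion", "Rejection")
set.seed(1234)
y <- sample(10)
```

Note this only controls R's own stream; I believe topicmodels::LDA also takes its own seed via control = list(seed = ...), and the quanteda-side LDA may seed differently, so check each package's documentation rather than relying on set.seed() alone.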
Also, it is a bad idea to make your randomized analyses dependent on bit-for-bit reproducibility: first, different computer architectures handle floating-point intermediate calculations differently; and second, the whole point of a randomized analysis is that it converges to some quantifiable mean result regardless of the path taken to get there.

On October 3, 2025 2:57:46 AM PDT, Jeanne Moreau <moreaujeanne02 at gmail.com> wrote:
> [quoted message elided]
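That second point is easy to illustrate with a toy Monte Carlo example (plain base R, not LDA): two different seeds give entirely different draws, yet essentially the same estimate:

```r
# Two different seeds: the individual draws differ, but the Monte Carlo
# mean of a standard normal converges to 0 either way.
set.seed(1); m1 <- mean(rnorm(1e6))
set.seed(2); m2 <- mean(rnorm(1e6))

identical(m1, m2)                   # FALSE: not bit-for-bit reproducible
abs(m1) < 0.01 && abs(m2) < 0.01    # TRUE: both within Monte Carlo error of 0
```

The analogue for LDA is that topic solutions from different seeds should be substantively similar (up to label switching) even when the numbers are not identical; if they are not, the instability itself is the finding worth investigating.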
To add a little bit of detail to what others have said: if you are using the same version of R, on the same operating system, with the same processor (e.g., there will be differences between Intel and M1/M2 Macs), then as far as I know the only remaining source of non-determinism, which can even affect successive runs on the same machine, is parallel operations in BLAS/LAPACK. These can perform mathematically equivalent operations in a different order, and floating-point arithmetic is not associative, so (a+b)+c != a+(b+c) in general.

On 10/3/25 05:57, Jeanne Moreau wrote:
> [quoted message elided]

--
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Associate chair (graduate), Mathematics & Statistics
Director, School of Computational Science and Engineering
* E-mail is sent at my convenience; I don't expect replies outside of working hours.
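The non-associativity is easy to demonstrate in a couple of lines of base R:

```r
# Floating-point addition is not associative: summing the same three
# IEEE-754 doubles in a different order gives different bit patterns.
a <- 0.1; b <- 0.2; c <- 0.3

(a + b) + c == a + (b + c)    # FALSE
(a + b) + c - (a + (b + c))   # a one-ulp discrepancy, about 1.1e-16
```

So a threaded BLAS that splits a dot product across cores and combines partial sums in a nondeterministic order can legitimately return slightly different results on each run, with nothing wrong anywhere.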