Please confirm that when you manually load A, B, C and D and check that
f(v*) matches the result reported by the qsub job, the check succeeds for
cases #1 and #2 and fails only for #3.
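A minimal sketch of that round-trip check follows. It is written in Python only as a language-neutral illustration, since the thread concerns R code. Everything in it is hypothetical: derive_matrices() and f() stand in for g(X) and f(v), and pickle plus SHA-256 stand in for whatever serialization the real job uses. The idea is to compute f(v*) inside the job, record a checksum of the saved file, and then reload the file, verify the checksum and recompute f(v*). A dense 40000x40000 double-precision matrix alone occupies about 12.8 GB, so a truncated or corrupted write is worth ruling out for #3.

```python
import hashlib
import math
import os
import pickle
import random
import tempfile


def derive_matrices(X):
    """Hypothetical stand-in for g(X): derive four matrices A, B, C, D from X."""
    n = len(X)
    A = [[X[i][j] for j in range(n)] for i in range(n)]           # copy
    B = [[X[j][i] for j in range(n)] for i in range(n)]           # transpose
    C = [[X[i][j] ** 2 for j in range(n)] for i in range(n)]      # squares
    D = [[abs(X[i][j]) + 1.0 for j in range(n)] for i in range(n)]
    return A, B, C, D


def f(v, A, B, C, D):
    """Hypothetical stand-in for f(v): any deterministic function of v and A..D."""
    total = 0.0
    for w, M in zip(v, (A, B, C, D)):
        total += w * sum(sum(row) for row in M)
    return total


def save_with_checksum(path, obj):
    """Serialize obj to path, force it to disk, and return the SHA-256 of the bytes written."""
    data = pickle.dumps(obj)
    with open(path, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())
    return hashlib.sha256(data).hexdigest()


def load_and_verify(path, expected_digest):
    """Reload an object from path, refusing to use it if the file no longer matches the digest."""
    with open(path, "rb") as fh:
        data = fh.read()
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError(f"checksum mismatch for {path}: file differs from what was saved")
    return pickle.loads(data)


def round_trip_check(X, v_star, workdir, rel_tol=1e-9):
    """Compare f(v*) computed in the job with f(v*) recomputed from the reloaded matrices."""
    A, B, C, D = derive_matrices(X)
    f_in_job = f(v_star, A, B, C, D)  # the value the batch job would report
    path = os.path.join(workdir, "ABCD.pkl")
    digest = save_with_checksum(path, (A, B, C, D))
    # In practice this half would run later, in a separate interactive session.
    A2, B2, C2, D2 = load_and_verify(path, digest)
    f_reloaded = f(v_star, A2, B2, C2, D2)
    return f_in_job, f_reloaded, math.isclose(f_in_job, f_reloaded, rel_tol=rel_tol)


if __name__ == "__main__":
    random.seed(1)
    X = [[random.gauss(0.0, 1.0) for _ in range(100)] for _ in range(100)]  # size of case #1
    with tempfile.TemporaryDirectory() as workdir:
        print(round_trip_check(X, [0.0, 1.0, 2.0, 3.0], workdir))
```

If the checksums match but the two f(v*) values still differ for #3, the saved data is probably not the culprit. Attention can then shift to how f(v*) is evaluated in the job versus the interactive session.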
On Fri, Mar 4, 2022 at 10:06 AM Arthur Fendrich <arthfen at gmail.com>
wrote:
> Dear all,
>
> I am currently having a weird problem with a large-scale optimization
> routine. It would be nice to know if any of you have already gone through
> something similar, and how you solved it.
>
> I apologize in advance for not providing an example, but I think the
> non-reproducibility of the error may be a key part of this problem.
>
> Simplest possible description of the problem: I have two functions, g(X)
> and f(v).
> g(X) does:
> i) takes a large matrix X as input;
> ii) derives four other matrices from X (I'll call them A, B, C and D) and
> saves them to disk for debugging purposes.
>
> Then, f(v) does:
> iii) loads A, B, C and D from disk;
> iv) calculates the log-likelihood, which varies according to a vector of
> parameters, v.
>
> My goal application is quite big (X is a 40000x40000 matrix), so I created
> the following versions to test the code, math and parallelization:
> #1) A simulated example with X being 100x100
> #2) A degraded version of the goal application, with X being 4000x4000
> #3) The goal application, with X being 40000x40000
>
> When I submit the jobs with qsub, using exactly the same code and the same
> processing cluster, #1 and #2 run flawlessly. These results tell me that
> the code, math and parallelization are fine.
>
> For application #3, the optimization converges to a vector v*. However,
> when I manually load A, B, C and D from disk and calculate f(v*), the value
> I get is completely different.
> For example:
> - the qsub job reports that v* = c(0, 1, 2, 3) is a minimum with f(v*) = 1.
> - when I manually load A, B, C, D from disk and calculate f(v*) on the
> exact same machine with the same libraries and environment variables, I get
> f(v*) = 1000.
>
> This behavior is very confusing. In theory the size of X should not
> affect my problem, but things seem to become unstable as the dimension
> grows. The main obstacle to debugging is that g(X) for simulation #3 takes
> two hours to run, and I am completely lost on how to find the causes of
> the problem. Do you have any general advice?
>
> Thank you very much in advance for literally any suggestions you might
> have!
>
> Best regards,
> Arthur
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.