thr3ads.net - R help - [R] Possible causes of unexpected behavior [Mar 2022]

Eric Berger

2022-Mar-04 08:47 UTC

[R] Possible causes of unexpected behavior

Please confirm that when you manually load the matrices and check whether
f(v*) matches the result from the qsub job, the check succeeds for cases #1
and #2 and fails only for #3.
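A minimal R sketch of the check requested above, under stated assumptions. The matrices are assumed to be stored as `.rds` files, and `f_toy()` is a hypothetical least-squares objective (a Gaussian negative log-likelihood up to constants) standing in for the real f(v), which was not posted. The idea is to re-evaluate f at v* from freshly loaded matrices and compare the result with the logged value using a relative tolerance rather than exact equality.

```r
# Hypothetical stand-in for f(v): least-squares objective built from
# matrices read back from disk. Not the original poster's code.
f_toy <- function(v, mats) {
  r <- mats$A %*% v - mats$B[, 1]
  0.5 * sum(r^2)
}

# Re-evaluate f at v* from freshly loaded matrices and compare it with the
# value the batch job logged, using a relative tolerance instead of ==.
check_logged_value <- function(v_star, logged, files, tol = 1e-6) {
  mats <- lapply(files, readRDS)            # load the matrices as f() would
  recomputed <- f_toy(v_star, mats)
  rel_diff <- abs(recomputed - logged) / max(abs(logged), .Machine$double.eps)
  list(recomputed = recomputed, rel_diff = rel_diff, ok = rel_diff <= tol)
}

# Usage with small simulated matrices written to a temporary directory
set.seed(1)
dir <- tempdir()
files <- c(A = file.path(dir, "A.rds"), B = file.path(dir, "B.rds"))
saveRDS(matrix(rnorm(20), 5, 4), files[["A"]])
saveRDS(matrix(rnorm(5), 5, 1), files[["B"]])

v_star <- c(0, 1, 2, 3)
logged <- f_toy(v_star, lapply(files, readRDS))   # what the job would print
res <- check_logged_value(v_star, logged, files)
print(res$ok)
```

With real data, `logged` would be the value printed by the qsub job. Because the log prints a rounded number, the tolerance cannot usefully be tighter than that rounding; 1e-6 here is a judgment call.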


On Fri, Mar 4, 2022 at 10:06 AM Arthur Fendrich <arthfen at gmail.com> wrote:
> Dear all,
>
> I am currently having a weird problem with a large-scale optimization
> routine. It would be nice to know if any of you have already gone through
> something similar, and how you solved it.
>
> I apologize in advance for not providing an example, but I think the
> non-reproducibility of the error may be a key aspect of this problem.
>
> Simplest possible description of the problem: I have two functions: g(X)
> and f(v).
> g(X) does:
>  i) takes a large matrix X as input;
>  ii) derives four other matrices from X (I'll call them A, B, C and D),
> then saves them to disk for debugging purposes.
>
> Then, f(v) does:
>  iii) loads A, B, C and D from disk;
>  iv) calculates the log-likelihood, which varies with a vector of
> parameters, v.
>
> My goal application is quite big (X is a 40000x40000 matrix), so I created
> the following versions to test the code, the math and the parallelization:
> #1) A simulated example with X being 100x100
> #2) A degraded version of the goal application, with X being 4000x4000
> #3) The goal application, with X being 40000x40000
>
> When I submit the jobs with qsub, using exactly the same code and processing
> cluster, #1 and #2 run flawlessly. These results tell me that the code, the
> math and the parallelization are fine.
>
> For application #3, it converges to a vector v*. However, when I manually
> load A, B, C and D from disk and calculate f(v*), then the value I get is
> completely different.
> For example:
> - qsub job says v* = c(0, 1, 2, 3) is a minimum with f(v*) = 1.
> - when I manually load A, B, C, D from disk and calculate f(v*) on the
> exact same machine with the same libraries and environment variables, I get
> f(v*) = 1000.
>
> This behavior is very confusing. In theory the size of X should not
> affect my problem, but things seem to become unstable as the dimension
> grows. The main obstacle to debugging is that g(X) for application #3 takes
> two hours to run, and I am completely lost on how to find the causes
> of the problem. Do you have any general advice?
>
> Thank you very much in advance for literally any suggestions you might
> have!
>
> Best regards,
> Arthur
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
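Because both the job's f(v) and the manual check read A, B, C and D from disk, one cheap thing to rule out is that the files changed between the two, for example by being overwritten by another run writing to the same paths. A minimal R sketch, assuming the matrices are stored with `saveRDS()` (an assumption; the post does not say which format is used): record an MD5 checksum with `tools::md5sum()` when g(X) writes the files, and compare it again before the manual evaluation. `save_with_checksums()`, `verify_checksums()` and `dense_gb()` are hypothetical helpers; the last one only shows the scale involved, about 12.8 GB for one dense 40000 x 40000 double matrix versus 0.128 GB at 4000 x 4000.

```r
# Hypothetical helper: write each matrix with saveRDS() and return the MD5
# checksum of every file written (a named vector: file path -> md5).
save_with_checksums <- function(mats, dir) {
  files <- file.path(dir, paste0(names(mats), ".rds"))
  for (k in seq_along(mats)) saveRDS(mats[[k]], files[k])
  tools::md5sum(files)
}

# TRUE for each file whose current checksum still matches the recorded one.
verify_checksums <- function(expected) {
  current <- tools::md5sum(names(expected))
  unname(current == expected)
}

# Size in GB (1e9 bytes) of one dense n x n matrix of doubles (8 bytes each).
dense_gb <- function(n) n^2 * 8 / 1e9

# Usage with small stand-ins for A, B, C and D
set.seed(42)
dir <- tempfile("mats_")
dir.create(dir)
mats <- list(A = diag(3), B = matrix(runif(9), 3), C = matrix(0, 3, 3),
             D = matrix(1, 3, 3))
sums <- save_with_checksums(mats, dir)
print(all(verify_checksums(sums)))   # TRUE while the files are untouched
print(dense_gb(c(100, 4000, 40000)))
```

Storing `sums` alongside the job's log would let a later manual session confirm it is reading byte-for-byte the same inputs.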

Arthur Fendrich

2022-Mar-04 09:21 UTC

[R] Possible causes of unexpected behavior

Dear Eric,

Thank you for the response. Yes, I can confirm that; the behavior is shown
below.
For #1, the results are identical. For #2, they are not identical but very
close. For #3, they are completely different.

Best regards,
Arthur

--

For #1,
- qsub execution:
[1] "ll: 565.7251"
[1] "norm gr @ minimum: 2.96967368608131e-08"

- manual check:
f(v*): 565.7251
gradient norm at v*: 2.969674e-08

#
For #2,

- qsub execution:
[1] "ll: 14380.8308"
[1] "norm gr @ minimum: 0.0140857561408041"

- manual check:
f(v*): 14380.84
gradient norm at v*: 0.01404779

#
For #3,

- qsub execution:
[1] "ll: 14310.6812"
[1] "norm gr @ minimum: 6232158.38877002"

- manual check:
f(v*): 97604.69
gradient norm at v*: 6266696595
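To put the two sets of numbers on a common scale, the R sketch below computes the relative discrepancy between the qsub value and the manual value for each case, using only the figures quoted above; `rel_diff()` is a hypothetical helper. The manual values are printed with fewer significant digits, so very small discrepancies partly reflect that rounding.

```r
# Values copied from the qsub log and from the manual re-evaluation above.
runs <- data.frame(
  case        = c("#1", "#2", "#3"),
  ll_qsub     = c(565.7251, 14380.8308, 14310.6812),
  ll_manual   = c(565.7251, 14380.84, 97604.69),
  grad_qsub   = c(2.96967368608131e-08, 0.0140857561408041, 6232158.38877002),
  grad_manual = c(2.969674e-08, 0.01404779, 6266696595)
)

# Relative discrepancy between two evaluations of the same quantity
rel_diff <- function(a, b) abs(a - b) / pmax(abs(a), abs(b))

runs$ll_rel   <- rel_diff(runs$ll_qsub, runs$ll_manual)
runs$grad_rel <- rel_diff(runs$grad_qsub, runs$grad_manual)
print(runs[, c("case", "ll_rel", "grad_rel")], digits = 3)
```

On these figures, #1 agrees to the printed precision and #2 to about six significant digits in the log-likelihood (its gradient norms differ by roughly 0.3%), while in #3 the log-likelihood differs by a factor of almost 7 and the gradient norm by a factor of about 1000. Note also that the job's own log for #3 reports a gradient norm of about 6.2e6 at v*, which by itself is far from what one would expect at a minimum.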


Eric Berger

2022-Mar-04 09:32 UTC

[R] Possible causes of unexpected behavior

Can you confirm you have a distributed calculation running in parallel?
Have you determined that it is thread safe? How?
Your check on the smaller examples may not have ruled out such
possibilities.
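One well-known reason why results can match on small problems and drift on large ones is that floating-point sums depend on the order of operations, and a parallel reduction may combine partial results in a different order than a serial run. The R sketch below uses the base `parallel` package; `chunked_sum()` is a hypothetical helper, not the original code, and `cores` above 1 relies on forking, which is not available on Windows. It only shows that regrouping changes low-order digits. Differences of that kind are usually small, as in case #2; a discrepancy as large as case #3 would more plausibly need something else, such as the thread-safety problem raised above or severe cancellation.

```r
library(parallel)

# IEEE double addition is not associative: regrouping changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))   # FALSE

# Hypothetical chunked reduction, the pattern a parallel log-likelihood often
# follows: each chunk is summed separately (possibly on another worker),
# then the partial sums are added together. Reduce(`+`, ...) is used so that
# every addition is a plain double-precision addition, left to right.
chunked_sum <- function(x, n_chunks, cores = 1) {
  chunk_id <- ceiling(seq_along(x) * n_chunks / length(x))
  idx <- split(seq_along(x), chunk_id)
  parts <- mclapply(idx, function(i) Reduce(`+`, x[i]), mc.cores = cores)
  Reduce(`+`, unlist(parts))
}

set.seed(123)
x <- rnorm(1e5, sd = 1e6)
totals <- sapply(c(1, 2, 8, 32), function(k) chunked_sum(x, k))
print(totals - totals[1])                            # low-order differences, if any
print(isTRUE(all.equal(totals, rep(totals[1], 4))))  # equal up to a small relative tolerance
```

Evaluating the real f(v*) several times, serially and with different numbers of workers, and comparing the outputs the same way is one concrete starting point for checking thread safety.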
