thr3ads.net - R devel - [Rd] BUG?: On Linux setTimeLimit() fails to propagate timeout error when it occurs (works on Windows) [Oct 2016]

luke-tierney at uiowa.edu

2016-Oct-27 16:26 UTC

[Rd] BUG?: On Linux setTimeLimit() fails to propagate timeout error when it occurs (works on Windows)

On unix, unless event polling is enabled, Sys.sleep just waits in a
select() call (with a SIGINT handler in place), so the elapsed time
isn't checked until after the select call is complete. RStudio uses
event polling, and in particular sets R_wait_usec to 10000, which
means event and interrupt checks happen during a Sys.sleep call.  The R
GUI on macOS doesn't seem to do this (but my lldb skills aren't up to
checking). Now that we have this elapsed time limit mechanism it might
be a good idea to set the default for R_wait_usec to something
reasonable on unix in general. 100000 might be a good value.
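
As an illustration of the behaviour described above, here is a minimal R sketch of a user-level workaround: sleeping in short slices so that the time limit can be checked each time a slice ends. The helper name sleep_in_chunks() and the 0.1 second default (chosen to mirror the suggested 100000 usec) are assumptions made for this example, not part of R, and the sketch assumes that the limit is checked when each Sys.sleep() call returns.

```r
# Hypothetical helper (not part of base R): sleep for `total` seconds in
# slices of at most `step` seconds, so that a pending elapsed-time limit
# can be noticed when each short Sys.sleep() call returns.
sleep_in_chunks <- function(total, step = 0.1) {
  deadline <- proc.time()[["elapsed"]] + total
  repeat {
    remaining <- deadline - proc.time()[["elapsed"]]
    if (remaining <= 0) break
    Sys.sleep(min(step, remaining))
  }
  invisible(NULL)
}

# Usage: ask for a 1-second limit, then try to sleep for 5 seconds.
timed <- system.time(
  outcome <- tryCatch({
    setTimeLimit(elapsed = 1, transient = TRUE)
    sleep_in_chunks(5)
    "slept the full 5 seconds"
  },
  error = function(e) conditionMessage(e),
  finally = setTimeLimit(elapsed = Inf))  # always clear the limit
)
print(outcome)
print(timed[["elapsed"]])
```

If the limit is honoured between slices, the reported elapsed time should be close to the 1-second limit rather than the full 5 seconds.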

A more worrying thing I noticed while looking at this is that blocking
reads on fifos and pipes and probably sockets are not interruptible --
that should probably be looked into.

Best,

luke

On Wed, 26 Oct 2016, peter dalgaard wrote:
> Spencer also had tools and rsconnect loaded (via a namespace) but it
> doesn't seem to make a difference for me if I load them. It also doesn't
> seem to matter for me whether it is CRAN R, locally built R, Terminal,
> R.app. However, RStudio differs
>
> > setTimeLimit(elapsed=1)
> Error: reached elapsed time limit
> > setTimeLimit(elapsed=1)
> Error: reached elapsed time limit
> > setTimeLimit(elapsed=1); system.time({Sys.sleep(10);message("done")})
> Error in Sys.sleep(10) : reached elapsed time limit
> Timing stopped at: 0.003 0.003 0.733
>
> -pd
>
>
>> On 26 Oct 2016, at 21:54 , Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
>>
>> Thank you for the feedback and confirmations.  Interesting to see that
>> it's also reproducible on macOS except for Spencer; that might
>> indicate a difference in builds.
>>
>> BTW, my original post suggested that the timeout error was for sure
>> detected while running Sys.sleep(10).  However, it could of course
>> also be that it is only detected after it finishes.
>>
>>
>> For troubleshooting, the help("setTimeLimit", package = "base") says that:
>>
>> * "Time limits are checked whenever a user interrupt could occur. This
>> will happen frequently in R code and during Sys.sleep, but only at
>> points in compiled C and Fortran code identified by the code author."
>>
>> The example here uses Sys.sleep(), which supports and detects user
>> interrupts.
>>
>>
>> The timeout error message is thrown by the R_ProcessEvents(void)
>> function as defined in:
>>
>> * src/unix/sys-unix.c
>>   (https://github.com/wch/r-source/blob/trunk/src/unix/sys-unix.c#L421-L453)
>> * src/gnuwin32/system.c
>>   (https://github.com/wch/r-source/blob/trunk/src/gnuwin32/system.c#L110-L140)
>>
>> So, they're clearly different implementations on Windows and Unix.
>> Also, for the Unix implementation, the code differs based on the
>> preprocessing directive HAVE_AQUA, which could explain why Spencer
>> observes a different behavior than Peter and Berend (all on macOS).
>>
>>
>> Whenever the R_CheckUserInterrupt() function is called it in turn
>> always calls R_ProcessEvents().  At the end, there is a code snippet -
>> if (R_interrupts_pending) onintr(); - which is Windows specific and
>> could be another important difference between Windows and Unix.  This
>> function is defined in:
>>
>> * src/main/errors.c
>>   (https://github.com/wch/r-source/blob/trunk/src/main/errors.c#L114-L134)
>>
>>
>> The do_setTimeLimit() function controls global variables cpuLimitValue
>> and elapsedLimitValue, which are checked in R_ProcessEvents(), but
>> other than setting the timeout limits I don't think it's involved in
>> the runtime checks. The do_setTimeLimit() is defined in:
>>
>> * src/main/sysutils.c
>>   (https://github.com/wch/r-source/blob/trunk/src/main/sysutils.c#L1692-L1736)
>>
>>
>> Unfortunately, right now, I've got little extra time to troubleshoot
>> this further.
>>
>> /Henrik
>>
>> On Wed, Oct 26, 2016 at 2:22 AM, Berend Hasselman <bhh at xs4all.nl> wrote:
>>>
>>>> On 26 Oct 2016, at 04:44, Henrik Bengtsson <henrik.bengtsson at gmail.com> wrote:
>>>> .......
>>>> This looks like a bug to me.  Can anyone on macOS confirm whether this
>>>> is also a problem there or not?
>>>>
>>>
>>>
>>> Tried it on macOS El Capitan and got this (running in R.app with R
>>> version 3.3.2 RC (2016-10-23 r71574)):
>>>
>>> > setTimeLimit(elapsed=1)
>>> > system.time({ Sys.sleep(10); message("done") })
>>> Error in Sys.sleep(10) : reached elapsed time limit
>>> Timing stopped at: 0.113 0.042 10.038
>>>
>>> Berend
>>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
-- 
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:   luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

Henrik Bengtsson

2016-Oct-31 00:35 UTC


Thank you for looking into this Luke.

On Thu, Oct 27, 2016 at 9:26 AM,  <luke-tierney at uiowa.edu> wrote:
> On unix, unless event polling is enabled Sys.sleep just waits in a
> select() call (with a SIGINT handler in place) so the elapsed time
> isn't checked until after the select call is complete. RStudio uses
> event polling, and in particular sets R_wait_usec to 10000, which
> means event and interrupt checks happen during a Sys.sleep call.  The R
> GUI on macOS doesn't seem to do this (but my lldb skills aren't up to
> checking). Now that we have this elapsed time limit mechanism it might
> be a good idea to set the default for R_wait_usec to something
> reasonable on unix in general. 100000 might be a good value.
>
> A more worrying thing I noticed while looking at this is that blocking
> reads on fifos and pipes and probably sockets are not interruptible --
> that should probably be looked into.

This is actually related to the use case where I want to use
setTimeLimit().  When using parallel:::newPSOCKnode(), there's a
30-day timeout associated with the socket connection.  This long
timeout is needed so that long-running tasks do not time out
the master-worker connection.  However, when it comes to the actual
setup of the connection, it would be useful to be able to detect
connection issues much earlier than that.  For example, if the socket
connection cannot be established within 60 seconds, then it is very
likely that the worker machine couldn't be reached, especially when
connecting to remote machines over SSH.

The current code of parallel:::newPSOCKnode() basically does:

system("ssh remote.server.org Rscript -e <launch worker and connect back>", wait = FALSE)
con <- socketConnection("localhost", port = 11000, server = TRUE,
                        blocking = TRUE, open = "a+b", timeout = 30*24*60*60)

If the remote SSH system call fails to reach or set up the worker, the
following call to socketConnection() will sit there and wait for 30
days.  Ideally one could solve this as:

system("ssh remote.server.org Rscript -e <launch worker and connect back>", wait = FALSE)
setTimeLimit(elapsed=60)
con <- socketConnection("localhost", port = 11000, server = TRUE,
                        blocking = TRUE, open = "a+b", timeout = 30*24*60*60)
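
The pattern above can be wrapped in a small helper that sets the limit, evaluates an expression, and always clears the limit again. This is only a sketch: with_time_limit() is a hypothetical name introduced for illustration, and it is demonstrated on an ordinary R loop. As noted earlier in the thread, blocking reads on sockets may not be interruptible, so the limit may not take effect inside socketConnection() itself.

```r
# Hypothetical helper (not part of base R or the parallel package):
# evaluate `expr` under an elapsed-time limit of `seconds`.
with_time_limit <- function(expr, seconds) {
  # transient = TRUE asks R to drop the limit once the current top-level
  # computation ends; on.exit() also clears it explicitly on any exit.
  setTimeLimit(elapsed = seconds, transient = TRUE)
  on.exit(setTimeLimit(cpu = Inf, elapsed = Inf, transient = FALSE))
  force(expr)  # the lazily evaluated argument runs here, under the limit
}

# Usage: an endless loop in interpreted R code, where limits are
# checked regularly, is stopped after about half a second.
result <- tryCatch(
  with_time_limit({
    i <- 0
    while (TRUE) i <- i + 1
  }, seconds = 0.5),
  error = function(e) conditionMessage(e)
)
print(result)
```

Clearing the limit in on.exit() keeps a leftover limit from affecting later, unrelated code.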

Thanks,

Henrik

luke-tierney at uiowa.edu

2016-Oct-31 16:36 UTC


On Mon, 31 Oct 2016, Henrik Bengtsson wrote:
> Thank you for looking into this Luke.
>
> On Thu, Oct 27, 2016 at 9:26 AM,  <luke-tierney at uiowa.edu> wrote:
>> On unix, unless event polling is enabled Sys.sleep just waits in a
>> select() call (with a SIGINT handler in place) so the elapsed time
>> isn't checked until after the select call is complete. RStudio uses
>> event polling, and in particular sets R_wait_usec to 10000, which
>> means event and interrupt checks happen during a Sys.sleep call.  The R
>> GUI on macOS doesn't seem to do this (but my lldb skills aren't up to
>> checking). Now that we have this elapsed time limit mechanism it might
>> be a good idea to set the default for R_wait_usec to something
>> reasonable on unix in general. 100000 might be a good value.
>>
>> A more worrying thing I noticed while looking at this is that blocking
>> reads on fifos and pipes and probably sockets are not interruptible --
>> that should probably be looked into.

I'll address the sleep issue sometime soon but I won't be able to look
into the blocking read issue for many months. Someone else might have
a chance to look earlier.

But for the situation you describe, using setTimeLimit doesn't
seem like the right approach. The parallel code is not written for
situations that need this kind of fault tolerance; it is not robust to
user interrupts and would not be to timer interrupts either. If you
are concerned that some potential workers might not be available then
you would be better off checking that with a ping or simple ssh command
first before starting a cluster on the available nodes.
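
A minimal sketch of such a pre-check in R follows. The helper names ssh_ok() and available_hosts() and the host names are hypothetical, introduced only for this example; the sketch assumes an OpenSSH client on the PATH (ConnectTimeout and BatchMode are OpenSSH client options) and a remote "true" command.

```r
# Hypothetical helper: TRUE if a trivial command can be run on `host`
# over ssh within `timeout` seconds. BatchMode=yes avoids hanging on a
# password prompt.
ssh_ok <- function(host, timeout = 10) {
  status <- suppressWarnings(system2(
    "ssh",
    c("-o", paste0("ConnectTimeout=", timeout),
      "-o", "BatchMode=yes", host, "true"),
    stdout = FALSE, stderr = FALSE
  ))
  identical(as.integer(status), 0L)
}

# Hypothetical helper: keep only the hosts for which `check` returns TRUE.
# `check` is a parameter so a different probe (e.g. ping) can be swapped in.
available_hosts <- function(hosts, check = ssh_ok) {
  hosts[vapply(hosts, check, logical(1))]
}

# Usage sketch (placeholder host names, not run here):
# hosts <- available_hosts(c("node1.example.org", "node2.example.org"))
# cl <- parallel::makePSOCKcluster(hosts)
```

Probing each host up front keeps the long socket timeout for the workers that do start, while unreachable machines are dropped before the cluster is created.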

Best,

luke
