On 8/22/20 8:26 PM, Tomas Kalibera wrote:
> On 8/22/20 7:58 PM, Jeroen Ooms wrote:
>> On Sat, Aug 22, 2020 at 8:39 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
>>> On 8/21/20 11:45 PM, m19tdn+9alxwj7d2bmk--- via R-devel wrote:
>>>> Ah yes, this is related. I reported v2010 below, but it looks like I was updated to this Insider Build overnight without my knowledge, and conflated it with the new installation of R v4 this morning.
>>>>
>>>> I will continue to look into the issue with the methods Tomas mentioned.
>>> It is interesting that a rare, five-year-old problem would reappear on current Insider builds. Which build of Windows are you running exactly? I've seen another report about a crash on 20190.1000. It'd be nice to know whether it is also present in newer builds, i.e. in 20197.
>> I installed the latest 20197 build in a VM, and I can indeed reproduce this problem.
>>
>> What seems to be happening is that R triggers an infinite recursion in the Windows unwinding mechanism and eventually dies with a stack overflow. Attached is a backtrace of the first 100 frames of the main thread (the pattern in the top ~30 frames continues forever).
>>
>> The Microsoft blog doesn't mention any changes related to exception handling in recent versions:
>> https://docs.microsoft.com/en-us/windows-insider/at-home/active-dev-branch
>>
> Thanks. Unfortunately, that does not ring any bells (except for the one below); I can't tell from this what the underlying cause of the problem is. There may be something wrong in how we use setjmp/longjmp, or in how setjmp/longjmp works on Windows.
>
> It reminds me of a problem I was debugging a few days ago, where the longjmp implementation segfaults on Windows 10 (a recent build, but not an Insider build), probably soon after unwinding the stack, but only with GCC 10 / MinGW 7, only in one of the no-segfault tests, and only with -O3 (not with -O2, and not with -O3 -fno-split-loops). Interestingly, the problem was sensitive to these optimization options at the call site of the long jump (do_abs), even though do_abs was not an immediate caller of longjmp. I've not tracked this down yet; it will require looking at the assembly level. I was suspecting a compiler bug causing the compiler to generate code that messes with the stack or registers in a way that affects the upcoming jump. But now that we have this other problem with setjmp/longjmp, the compiler may no longer be the top suspect.
>
> I may not be able to work on this in the next few days or a week, so if anyone gets there first, please let me know what you find out.

Btw, could you please try whether the UCRT build of R also crashes on the Insider Windows build?

https://www.r-project.org/nosvn/winutf8/R-devel-win.exe
(from https://developer.r-project.org/Blog/public/2020/07/30/windows/utf-8-build-of-r-and-cran-packages)

Thanks
Tomas
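Since this discussion hinges on setjmp/longjmp, here is a minimal, portable C sketch of the non-local exit pattern, broadly the kind R relies on for error recovery. It is an illustration under stated assumptions, not R's code: all names (`run_protected`, `raise_error`, `compute`, `compute_inner`) are hypothetical, and it does not exercise the Windows-specific stack unwinding discussed in this thread. It also shows one well-known way optimization can interact with setjmp/longjmp (a local changed after setjmp must be `volatile` to be reliable after the jump). This is not a claim about the cause of the crash above.

```c
#include <setjmp.h>

/* Hypothetical names throughout: a generic illustration of a setjmp/longjmp
   non-local exit, not R's implementation. */
static jmp_buf toplevel;     /* target of the non-local exit */
static int error_code = 0;   /* static storage, so safe to read after longjmp */

/* Simulates an error deep in the call stack: records a code, then jumps
   straight back to the setjmp() in run_protected(), skipping the
   intermediate frames. */
static void raise_error(int code)
{
    error_code = code;
    longjmp(toplevel, 1);
}

static int compute_inner(int x)
{
    if (x < 0)
        raise_error(42);
    return x * 2;
}

static int compute(int x)
{
    return compute_inner(x) + 1;
}

/* Returns compute(x), or -1 if an "error" was raised. */
static int run_protected(int x, int *status, int *attempts_out)
{
    /* Modified after setjmp() and read after longjmp(): without volatile its
       value would be indeterminate after the jump (C11 7.13.2.1), and what
       you observe could change with the optimization level. */
    volatile int attempts = 0;

    if (setjmp(toplevel) != 0) {
        /* Second return from setjmp(), arriving via longjmp(). */
        *status = error_code;
        *attempts_out = attempts;
        return -1;
    }

    /* First (direct) return from setjmp(): run the protected code. */
    attempts++;
    *status = 0;
    *attempts_out = attempts;
    return compute(x);
}
```

With this sketch, `run_protected(5, ...)` returns 11 with status 0, while `run_protected(-1, ...)` returns -1 with status 42. In the error case, control goes from `raise_error` directly back into `run_protected` without returning through `compute` or `compute_inner`.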
On Sat, Aug 22, 2020 at 9:10 PM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
> Btw could you please try out if the UCRT build of R crashes as well in the Insider Windows build?

Yes, it hangs in exactly the same way, except that the backtrace shows

    ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll

instead of msvcrt!_setjmpex (as expected, of course).
On 8/22/20 9:33 PM, Jeroen Ooms wrote:
> Yes, it hangs in exactly the same way, except that the backtrace shows
>
>     ucrtbase!.intrinsic_setjmpex () from C:\WINDOWS\System32\ucrtbase.dll
>
> instead of msvcrt!_setjmpex (as expected, of course).

Thanks. I found what is causing the problem I observed with GCC 10 on stock Windows 10, and I expect it is the same one as in the Insider build. I will investigate further.

Tomas