Christian Borntraeger
2020-Jun-24 18:32 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On 24.06.20 20:09, Christian Borntraeger wrote:> > > On 24.06.20 19:58, Christian Borntraeger wrote: >> >> >> On 24.06.20 18:09, Luis Chamberlain wrote: >>> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 24.06.20 16:43, Christoph Hellwig wrote: >>>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: >>>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the >>>>>> linux-bridge on s390? >>>>> >>>>> Are we even sure this is s390 specific and doesn't happen on other >>>>> architectures with the same bridge setup? >>>> >>>> Fair point. AFAIK nobody has tested this yet on x86. >>> >>> Regardless, can you enable dynamic debug prints, to see if the kernel >>> reveals anything on the bridge code which may be relevant: >>> >>> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control >>> >>> Luis >> >> When I start a guest the following happens with the patch: >> >> [ 47.420237] virbr0: port 2(vnet0) entered blocking state >> [ 47.420242] virbr0: port 2(vnet0) entered disabled state >> [ 47.420315] device vnet0 entered promiscuous mode >> [ 47.420365] virbr0: port 2(vnet0) event 16 >> [ 47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 47.420373] virbr0: toggle option: 12 state: 0 -> 0 >> [ 47.420536] virbr0: port 2(vnet0) entered blocking state >> [ 47.420538] virbr0: port 2(vnet0) event 16 >> [ 47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> >> and the nothing happens. >> >> >> without the patch >> [ 33.805410] virbr0: hello timer expired >> [ 35.805413] virbr0: hello timer expired >> [ 36.184349] virbr0: port 2(vnet0) entered blocking state >> [ 36.184353] virbr0: port 2(vnet0) entered disabled state >> [ 36.184427] device vnet0 entered promiscuous mode >> [ 36.184479] virbr0: port 2(vnet0) event 16 >> [ 36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 36.184487] virbr0: toggle option: 12 state: 0 -> 0 >> [ 36.184636] virbr0: port 2(vnet0) entered blocking state >> [ 36.184638] virbr0: port 2(vnet0) entered listening state >> [ 36.184639] virbr0: port 2(vnet0) event 16 >> [ 36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 36.184645] virbr0: port 2(vnet0) event 16 >> [ 36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 37.805478] virbr0: hello timer expired >> [ 38.205413] virbr0: port 2(vnet0) forward delay timer >> [ 38.205414] virbr0: port 2(vnet0) entered learning state >> [ 38.205427] virbr0: port 2(vnet0) event 16 >> [ 38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 38.765414] virbr0: port 2(vnet0) hold timer expired >> [ 39.805415] virbr0: hello timer expired >> [ 40.285410] virbr0: port 2(vnet0) forward delay timer >> [ 40.285411] virbr0: port 2(vnet0) entered forwarding state >> [ 40.285418] virbr0: topology change detected, propagating >> [ 40.285420] virbr0: decreasing ageing time to 400 >> [ 40.285427] virbr0: port 2(vnet0) event 16 >> [ 40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 40.765408] virbr0: port 2(vnet0) hold timer expired >> [ 41.805415] virbr0: hello timer expired >> [ 42.765426] virbr0: port 2(vnet0) hold timer expired >> [ 43.805425] virbr0: hello timer expired >> [ 44.765426] virbr0: port 2(vnet0) hold timer expired >> [ 45.805418] virbr0: hello timer expired >> >> and continuing.... > > Just reverting the umh.c parts like this makes the problem go away. > > diff --git a/kernel/umh.c b/kernel/umh.c > index f81e8698e36e..79f139a7ca03 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > * the real error code is already in sub_info->retval or > * sub_info->retval is 0 anyway, so don't mess with it then. > */ > - if (KWIFEXITED(ret)) > - sub_info->retval = KWEXITSTATUS(ret); > + if (ret) > + sub_info->retval = ret; > } > > /* Restore default kernel sig handler */I instrumented this: [ 5.118528] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 9.409235] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.114914] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.116308] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.117690] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.118732] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.119899] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.121365] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.128044] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.945814] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.962799] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.983755] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.992705] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.118877] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.124359] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.129364] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.142139] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.148385] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.153686] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.158861] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.164068] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.192445] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.200605] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.208493] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.216612] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.226467] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.525363] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.532758] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.535279] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.697660] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.701634] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.818652] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.836228] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 12.082266] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 40.965898] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 So the translations look correct. But your change is actually a sematic change if(ret) will only trigger if there is an error if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD and we did not do it before.
Christian Borntraeger
2020-Jun-24 18:38 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On 24.06.20 20:32, Christian Borntraeger wrote: [...]>> So the translations look correct. But your change is actually a sematic change > if(ret) will only trigger if there is an error > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > and we did not do it before. >So the right fix is diff --git a/kernel/umh.c b/kernel/umh.c index f81e8698e36e..a3a3196e84d1 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) * the real error code is already in sub_info->retval or * sub_info->retval is 0 anyway, so don't mess with it then. */ - if (KWIFEXITED(ret)) + if (KWEXITSTATUS(ret)) sub_info->retval = KWEXITSTATUS(ret); } I think.