Tetsuo Handa
2020-Jul-01 13:24 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On 2020/07/01 19:08, Christian Borntraeger wrote:
> 
> 
> On 30.06.20 19:57, Luis Chamberlain wrote:
>> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
>>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 24.06.20 20:32, Christian Borntraeger wrote:
>>>> [...]
>>>>> So the translations look correct. But your change is actually a sematic change
>>>>> if(ret) will only trigger if there is an error
>>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
>>>>> and we did not do it before.
>>>>>
>>>>
>>>> So the right fix is
>>>>
>>>> diff --git a/kernel/umh.c b/kernel/umh.c
>>>> index f81e8698e36e..a3a3196e84d1 100644
>>>> --- a/kernel/umh.c
>>>> +++ b/kernel/umh.c
>>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>>>>          * the real error code is already in sub_info->retval or
>>>>          * sub_info->retval is 0 anyway, so don't mess with it then.
>>>>          */
>>>> -       if (KWIFEXITED(ret))
>>>> +       if (KWEXITSTATUS(ret))
>>>>                 sub_info->retval = KWEXITSTATUS(ret);

Well, it is not br_stp_call_user() but br_stp_start() which is expecting
to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
processed error when UMH_WAIT_PROC is used" will be the correct behavior).

Luis Chamberlain
2020-Jul-01 13:53 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On Wed, Jul 01, 2020 at 10:24:29PM +0900, Tetsuo Handa wrote:
> On 2020/07/01 19:08, Christian Borntraeger wrote:
> > 
> > 
> > On 30.06.20 19:57, Luis Chamberlain wrote:
> >> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote:
> >>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote:
> >>>>
> >>>>
> >>>> On 24.06.20 20:32, Christian Borntraeger wrote:
> >>>> [...]
> >>>>> So the translations look correct. But your change is actually a sematic change
> >>>>> if(ret) will only trigger if there is an error
> >>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> >>>>> and we did not do it before.
> >>>>>
> >>>>
> >>>> So the right fix is
> >>>>
> >>>> diff --git a/kernel/umh.c b/kernel/umh.c
> >>>> index f81e8698e36e..a3a3196e84d1 100644
> >>>> --- a/kernel/umh.c
> >>>> +++ b/kernel/umh.c
> >>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
> >>>>          * the real error code is already in sub_info->retval or
> >>>>          * sub_info->retval is 0 anyway, so don't mess with it then.
> >>>>          */
> >>>> -       if (KWIFEXITED(ret))
> >>>> +       if (KWEXITSTATUS(ret))
> >>>>                 sub_info->retval = KWEXITSTATUS(ret);
> 
> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
> processed error when UMH_WAIT_PROC is used" will be the correct behavior).

br_stp_start() doesn't check for the raw value, it just checks for err
or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
used" propagates the correct error now.

Christian, can you try removing the binary temporarily and seeing if you
get your bridge working?

  Luis

Tetsuo Handa
2020-Jul-01 14:08 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On 2020/07/01 22:53, Luis Chamberlain wrote:
>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting
>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case.
>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix
>> processed error when UMH_WAIT_PROC is used" will be the correct behavior).
> 
> br_stp_start() doesn't check for the raw value, it just checks for err
> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is
> used" propagates the correct error now.

No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV
(for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line),
br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that
/sbin/bridge-stp is always terminated by exit() syscall (and hence we can't
ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()).
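
For readers following the thread, the three behaviours being compared can be reproduced in userspace. This is only a sketch under assumptions: it uses the POSIX WIFEXITED()/WEXITSTATUS() macros from <sys/wait.h> and assumes the kernel's KWIFEXITED()/KWEXITSTATUS() decode a wait status the same way. All function names below are hypothetical and are not taken from kernel/umh.c.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`; return the raw wait status, or -1. */
static int status_after_exit(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Fork a child that kills itself with `sig`; return the raw wait status, or -1. */
static int status_after_signal(int sig)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(sig, SIG_DFL);   /* make sure the default action applies */
        raise(sig);
        _exit(0);               /* only reached if the signal was not fatal */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}

/* Mirrors "if (KWIFEXITED(ret))": store the exit code whenever the child exited. */
static int retval_if_exited(int status, int retval)
{
    if (WIFEXITED(status))
        retval = WEXITSTATUS(status);
    return retval;
}

/* Mirrors "if (KWEXITSTATUS(ret))": store only a nonzero exit code. */
static int retval_if_exitstatus(int status, int retval)
{
    if (WEXITSTATUS(status))
        retval = WEXITSTATUS(status);
    return retval;
}

/* Mirrors "if (ret)": propagate the raw status, exit or signal alike. */
static int retval_raw(int status, int retval)
{
    if (status)
        retval = status;
    return retval;
}
```

On Linux with glibc, feeding these helpers the status of a child that exits with 0 shows the first variant replacing a preset error such as -ECHILD with 0, while the other two leave it alone. Feeding them the status of a child killed by a signal (for example SIGSEGV, as in the bridge-stp scenario above) shows both exit-code variants leaving retval at 0, because the exit-code bits of a signal status are zero there; only the raw variant reports a nonzero value to a caller that checks err or !err.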