Christian Borntraeger
2020-Jul-01 15:49 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On 01.07.20 17:38, Luis Chamberlain wrote:> On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote: >> On 2020/07/01 22:53, Luis Chamberlain wrote: >>>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting >>>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. >>>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix >>>> processed error when UMH_WAIT_PROC is used" will be the correct behavior). >>> >>> br_stp_start() doesn't check for the raw value, it just checks for err >>> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is >>> used" propagates the correct error now. >> >> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV >> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line), >> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that >> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't >> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()). > > Ah, well that would be a different fix required, becuase again, > br_stp_start() does not untangle the correct error today really. > I also I think it would be odd odd that SIGSEGV or another signal > is what was terminating Christian's bridge stp call, but let's > find out! > > Note we pass 0 to the options to wait so the mistake here could indeed > be that we did not need KWIFSIGNALED(). I was afraid of this prospect... > as it other implications. > > It means we either *open code* all callers, or we handle this in a > unified way on the umh. And if we do handle this in a unified way, it > then begs the question as to *what* do we pass for the signals case and > continued case. Below we just pass the signal, and treat continued as > OK, but treating continued as OK would also be a *new* change as well. > > For instance (this goes just boot tested, but Christian if you can > try this as well that would be appreciated):Does not help, the bridge stays in DOWN state.
Luis Chamberlain
2020-Jul-01 15:58 UTC
[Bridge] linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
On Wed, Jul 01, 2020 at 05:48:48PM +0200, Christian Borntraeger wrote:> > > On 01.07.20 17:38, Luis Chamberlain wrote: > > On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote: > >> On 2020/07/01 22:53, Luis Chamberlain wrote: > >>>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting > >>>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. > >>>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix > >>>> processed error when UMH_WAIT_PROC is used" will be the correct behavior). > >>> > >>> br_stp_start() doesn't check for the raw value, it just checks for err > >>> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is > >>> used" propagates the correct error now. > >> > >> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV > >> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line), > >> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that > >> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't > >> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()). > > > > Ah, well that would be a different fix required, becuase again, > > br_stp_start() does not untangle the correct error today really. > > I also I think it would be odd odd that SIGSEGV or another signal > > is what was terminating Christian's bridge stp call, but let's > > find out! > > > > Note we pass 0 to the options to wait so the mistake here could indeed > > be that we did not need KWIFSIGNALED(). I was afraid of this prospect... > > as it other implications. > > > > It means we either *open code* all callers, or we handle this in a > > unified way on the umh. And if we do handle this in a unified way, it > > then begs the question as to *what* do we pass for the signals case and > > continued case. Below we just pass the signal, and treat continued as > > OK, but treating continued as OK would also be a *new* change as well. > > > > For instance (this goes just boot tested, but Christian if you can > > try this as well that would be appreciated): > > > Does not help, the bridge stays in DOWN state.OK thanks for testing, that was fast! Does your code go through the STP kernel path or userpath? If it is taking the STP kernel path then this is not the real culprit to your issue then. Luis