xen.org
2012-Oct-17 05:24 UTC
[xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
branch xen-unstable xen branch xen-unstable job test-amd64-i386-qemuu-rhel6hvm-amd test xen-boot Tree: linux git://xenbits.xen.org/linux-pvops.git Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git Tree: xen http://xenbits.xen.org/hg/staging/xen-unstable.hg *** Found and reproduced problem changeset *** Bug is in tree: xen http://xenbits.xen.org/hg/staging/xen-unstable.hg Bug introduced: 4fc87c2f31a0 Bug not present: c1c549c4fe9e changeset: 26060:4fc87c2f31a0 tag: tip user: Huang Ying <ying.huang@intel.com> date: Tue Oct 16 17:26:36 2012 +0200 ACPI: fix APEI related table size checking On Huang Ying''s machine: erst_tab->header_length == sizeof(struct acpi_table_einj) but Yinghai reported that on his machine, erst_tab->header_length == sizeof(struct acpi_table_einj) - sizeof(struct acpi_table_header) To make erst table size checking code works on all systems, both testing are treated as PASS. Same situation applies to einj_tab->header_length, so corresponding table size checking is changed in similar way too. Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Huang Ying <ying.huang@intel.com> - use switch() for better readability - add comment explaining why a formally invalid size it also being accepted - check erst_tab->header.length before even looking at erst_tab->header_length - prefer sizeof(*erst_tab) over sizeof(struct acpi_table_erst) Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com> For bisection revision-tuple graph see: http://www.chiark.greenend.org.uk/~xensrcts/results/bisect.xen-unstable.test-amd64-i386-qemuu-rhel6hvm-amd.xen-boot.html Revision IDs in each graph node refer, respectively, to the Trees above. ---------------------------------------- Searching for failure / basis pass: 13979 fail [host=leaf-beetle] / 13967 [host=potato-beetle] 13963 [host=lace-bug] 13962 [host=moss-bug] 13960 ok. Failure / basis pass flights: 13979 / 13960 Tree: linux git://xenbits.xen.org/linux-pvops.git Tree: qemu git://xenbits.xen.org/staging/qemu-xen-unstable.git Tree: qemuu git://xenbits.xen.org/staging/qemu-upstream-unstable.git Tree: xen http://xenbits.xen.org/hg/staging/xen-unstable.hg Latest a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 Basis pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 e0e1350dfe9b Generating revisions with ./adhoc-revtuple-generator git://xenbits.xen.org/linux-pvops.git#a938a246d34912423c560f475ccf1ce0c71d9d00-a938a246d34912423c560f475ccf1ce0c71d9d00 git://xenbits.xen.org/staging/qemu-xen-unstable.git#bacc0d302445c75f18f4c826750fb5853b60e7ca-bacc0d302445c75f18f4c826750fb5853b60e7ca git://xenbits.xen.org/staging/qemu-upstream-unstable.git#cdf4d2bb4006805f209712fbb8ed1f83127e9984-cdf4d2bb4006805f209712fbb8ed1f83127e9984 http://xenbits.xen.org/hg/staging/xen-unstable.hg#e0e1350dfe9b-4fc87c2f31a0 pulling from ssh://xen@xenbits.xen.org/HG/staging/xen-unstable.hg searching for changes no changes found pulling from ssh://xen@xenbits.xen.org/HG/staging/xen-unstable.hg searching for changes no changes found Loaded 239 nodes in revision graph Searching for test results: 13958 [host=potato-beetle] 13960 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 e0e1350dfe9b 13967 [host=potato-beetle] 13978 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 c1c549c4fe9e 13962 [host=moss-bug] 13972 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 13963 [host=lace-bug] 13956 [host=lace-bug] 13980 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 13964 [] 13974 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 e0e1350dfe9b 13981 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 c1c549c4fe9e 13975 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 13976 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 177fdda0be56 13982 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 13973 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 13977 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 37bb894121c7 13979 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 13983 pass a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 c1c549c4fe9e 13984 fail a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 4fc87c2f31a0 Searching for interesting versions Result found: flight 13960 (pass), for basis pass Result found: flight 13972 (fail), for basis failure Repro found: flight 13974 (pass), for basis pass Repro found: flight 13975 (fail), for basis failure 0 revisions at a938a246d34912423c560f475ccf1ce0c71d9d00 bacc0d302445c75f18f4c826750fb5853b60e7ca cdf4d2bb4006805f209712fbb8ed1f83127e9984 c1c549c4fe9e No revisions left to test, checking graph state. Result found: flight 13978 (pass), for last pass Result found: flight 13979 (fail), for first failure Repro found: flight 13981 (pass), for last pass Repro found: flight 13982 (fail), for first failure Repro found: flight 13983 (pass), for last pass Repro found: flight 13984 (fail), for first failure *** Found and reproduced problem changeset *** Bug is in tree: xen http://xenbits.xen.org/hg/staging/xen-unstable.hg Bug introduced: 4fc87c2f31a0 Bug not present: c1c549c4fe9e pulling from ssh://xen@xenbits.xen.org/HG/staging/xen-unstable.hg searching for changes no changes found changeset: 26060:4fc87c2f31a0 tag: tip user: Huang Ying <ying.huang@intel.com> date: Tue Oct 16 17:26:36 2012 +0200 ACPI: fix APEI related table size checking On Huang Ying''s machine: erst_tab->header_length == sizeof(struct acpi_table_einj) but Yinghai reported that on his machine, erst_tab->header_length == sizeof(struct acpi_table_einj) - sizeof(struct acpi_table_header) To make erst table size checking code works on all systems, both testing are treated as PASS. Same situation applies to einj_tab->header_length, so corresponding table size checking is changed in similar way too. Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Huang Ying <ying.huang@intel.com> - use switch() for better readability - add comment explaining why a formally invalid size it also being accepted - check erst_tab->header.length before even looking at erst_tab->header_length - prefer sizeof(*erst_tab) over sizeof(struct acpi_table_erst) Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com> Revision graph left in /home/xc_osstest/results/bisect.xen-unstable.test-amd64-i386-qemuu-rhel6hvm-amd.xen-boot.{dot,ps,png,html}. ---------------------------------------- 13984: tolerable ALL FAIL flight 13984 xen-unstable real-bisect [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/13984/ Failures :-/ but no regressions. Tests which did not succeed, including tests which could not be run: test-amd64-i386-qemuu-rhel6hvm-amd 5 xen-boot fail baseline untested jobs: test-amd64-i386-qemuu-rhel6hvm-amd fail ------------------------------------------------------------ sg-report-flight on woking.cam.xci-test.com logs: /home/xc_osstest/logs images: /home/xc_osstest/images Logs, config files, etc. are available at http://www.chiark.greenend.org.uk/~xensrcts/logs Test harness code can be found at http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary
Jan Beulich
2012-Oct-17 07:41 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
>>> On 17.10.12 at 07:24, xen.org <ian.jackson@eu.citrix.com> wrote: > *** Found and reproduced problem changeset *** > > Bug is in tree: xen http://xenbits.xen.org/hg/staging/xen-unstable.hg > Bug introduced: 4fc87c2f31a0 > Bug not present: c1c549c4fe9e > > > changeset: 26060:4fc87c2f31a0 > tag: tip > user: Huang Ying <ying.huang@intel.com> > date: Tue Oct 16 17:26:36 2012 +0200 > > ACPI: fix APEI related table size checking > > On Huang Ying''s machine: > > erst_tab->header_length == sizeof(struct acpi_table_einj) > > but Yinghai reported that on his machine, > > erst_tab->header_length == sizeof(struct acpi_table_einj) - > sizeof(struct acpi_table_header) > > To make erst table size checking code works on all systems, both > testing are treated as PASS. > > Same situation applies to einj_tab->header_length, so corresponding > table size checking is changed in similar way too. > > Originally-by: Yinghai Lu <yinghai@kernel.org> > Signed-off-by: Huang Ying <ying.huang@intel.com> > > - use switch() for better readability > - add comment explaining why a formally invalid size it also being > accepted > - check erst_tab->header.length before even looking at > erst_tab->header_length > - prefer sizeof(*erst_tab) over sizeof(struct acpi_table_erst) > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > Acked-by: Keir Fraser <keir@xen.org> > Committed-by: Jan Beulich <jbeulich@suse.com>So I just now realized that a similar change was already done by 23736:31683aa4bfb3 and then reverted by 23760:ae10d7804168. Nothing was done subsequently to address the actual problem(s). It is quite obvious that the more relaxed check uncovers other bugs in the ERST code, yet looking at the Linux history of the corresponding file doesn''t reveal any fix the lack of which would explain an outright hang (rather than a crash, as I would expect to be the result of e.g. the missing ioremap()-s added by 0bbba38a61283a55f2061ab3e0910c572d19f462. Most of the other changes are cosmetic or pstore related, so I wonder whether instead of reverting again we should try pulling in this one extra fix. If reverting is preferred (or turns out necessary if that second fix doesn''t help), we should settle on the disposition of the whole APEI/ERST code, as my conclusion is that it is pretty much unmaintained since its original contribution over two years ago. Jan
Ian Campbell
2012-Oct-17 09:02 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
On Wed, 2012-10-17 at 08:41 +0100, Jan Beulich wrote:> So I just now realized that a similar change was already done > by 23736:31683aa4bfb3 and then reverted by > 23760:ae10d7804168. Nothing was done subsequently to > address the actual problem(s). It is quite obvious that the more > relaxed check uncovers other bugs in the ERST code, yet looking > at the Linux history of the corresponding file doesn''t reveal any > fix the lack of which would explain an outright hang (rather than > a crash, as I would expect to be the result of e.g. the missing > ioremap()-s added by 0bbba38a61283a55f2061ab3e0910c572d19f462. > > Most of the other changes are cosmetic or pstore related, so I > wonder whether instead of reverting again we should try pulling > in this one extra fix.Worth a go. I assume this issue isn''t related to the typo you found this morning?> If reverting is preferred (or turns out necessary if that second > fix doesn''t help), we should settle on the disposition of the whole > APEI/ERST code, as my conclusion is that it is pretty much > unmaintained since its original contribution over two years ago.If we revert we should leave a big fat comment pointing to this and/or the previous discussion to stop the next guy doing the same thing. The comment in 26060:4fc87c2f31a0 seems to suggest that the issue is (possibly) only a problem non-production or early machines, if that''s the case it doesn''t seem worth worrying about ERST not functioning on those Assuming the system works generally I think the world can live without the ERST facility being available on them. Ian.
Keir Fraser
2012-Oct-17 09:04 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
On 17/10/2012 08:41, "Jan Beulich" <JBeulich@suse.com> wrote:> So I just now realized that a similar change was already done > by 23736:31683aa4bfb3 and then reverted by > 23760:ae10d7804168. Nothing was done subsequently to > address the actual problem(s). It is quite obvious that the more > relaxed check uncovers other bugs in the ERST code, yet looking > at the Linux history of the corresponding file doesn''t reveal any > fix the lack of which would explain an outright hang (rather than > a crash, as I would expect to be the result of e.g. the missing > ioremap()-s added by 0bbba38a61283a55f2061ab3e0910c572d19f462. > > Most of the other changes are cosmetic or pstore related, so I > wonder whether instead of reverting again we should try pulling > in this one extra fix. > > If reverting is preferred (or turns out necessary if that second > fix doesn''t help), we should settle on the disposition of the whole > APEI/ERST code, as my conclusion is that it is pretty much > unmaintained since its original contribution over two years ago.Perhaps we reverted before because we decided that this advertised table size was a bug, and handling it just opened us to other bugs lurking in the table? -- Keir> Jan >
Jan Beulich
2012-Oct-17 09:18 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
>>> On 17.10.12 at 11:02, Ian Campbell <Ian.Campbell@citrix.com> wrote: > On Wed, 2012-10-17 at 08:41 +0100, Jan Beulich wrote: >> So I just now realized that a similar change was already done >> by 23736:31683aa4bfb3 and then reverted by >> 23760:ae10d7804168. Nothing was done subsequently to >> address the actual problem(s). It is quite obvious that the more >> relaxed check uncovers other bugs in the ERST code, yet looking >> at the Linux history of the corresponding file doesn''t reveal any >> fix the lack of which would explain an outright hang (rather than >> a crash, as I would expect to be the result of e.g. the missing >> ioremap()-s added by 0bbba38a61283a55f2061ab3e0910c572d19f462. >> >> Most of the other changes are cosmetic or pstore related, so I >> wonder whether instead of reverting again we should try pulling >> in this one extra fix. > > Worth a go. I assume this issue isn''t related to the typo you found this > morning?I didn''t find any typo; the original author of the Linux side patch pointed out a typo in the commit message (which was copied verbatim from the Linux side).>> If reverting is preferred (or turns out necessary if that second >> fix doesn''t help), we should settle on the disposition of the whole >> APEI/ERST code, as my conclusion is that it is pretty much >> unmaintained since its original contribution over two years ago. > > If we revert we should leave a big fat comment pointing to this and/or > the previous discussion to stop the next guy doing the same thing.We should rather decide whether this should get fixed, or the code ripped out (decision mainly depending on the original contributer - Intel - being willing to fix the code).> The comment in 26060:4fc87c2f31a0 seems to suggest that the issue is > (possibly) only a problem non-production or early machines, if that''s > the case it doesn''t seem worth worrying about ERST not functioning on > those Assuming the system works generally I think the world can live > without the ERST facility being available on them.It''s actually the other way around - prior to the patch, to code accepted only the value seen on some early systems, whereas the patch gets it in line with the spec, but still permits the wrong value to be used. Which implies that the AMD systems the tester uses use the correct value (and hangs because we now don''t reject that anymore). Jan
Jan Beulich
2012-Oct-17 09:21 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
>>> On 17.10.12 at 11:04, Keir Fraser <keir@xen.org> wrote: > On 17/10/2012 08:41, "Jan Beulich" <JBeulich@suse.com> wrote: > >> So I just now realized that a similar change was already done >> by 23736:31683aa4bfb3 and then reverted by >> 23760:ae10d7804168. Nothing was done subsequently to >> address the actual problem(s). It is quite obvious that the more >> relaxed check uncovers other bugs in the ERST code, yet looking >> at the Linux history of the corresponding file doesn''t reveal any >> fix the lack of which would explain an outright hang (rather than >> a crash, as I would expect to be the result of e.g. the missing >> ioremap()-s added by 0bbba38a61283a55f2061ab3e0910c572d19f462. >> >> Most of the other changes are cosmetic or pstore related, so I >> wonder whether instead of reverting again we should try pulling >> in this one extra fix. >> >> If reverting is preferred (or turns out necessary if that second >> fix doesn''t help), we should settle on the disposition of the whole >> APEI/ERST code, as my conclusion is that it is pretty much >> unmaintained since its original contribution over two years ago. > > Perhaps we reverted before because we decided that this advertised table > size was a bug, and handling it just opened us to other bugs lurking in the > table?As just said in the response to Ian - originally, only a spec violating vale was accepted here, whereas with the change we also accept the proper value. The patch itself was prompted by a bug report we got that on a spec conformant system the "ERST table is invalid" message was seen. Jan
Keir Fraser
2012-Oct-17 09:33 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
On 17/10/2012 10:21, "Jan Beulich" <JBeulich@suse.com> wrote:>> Perhaps we reverted before because we decided that this advertised table >> size was a bug, and handling it just opened us to other bugs lurking in the >> table? > > As just said in the response to Ian - originally, only a spec violating > vale was accepted here, whereas with the change we also accept > the proper value. The patch itself was prompted by a bug report > we got that on a spec conformant system the "ERST table is invalid" > message was seen.Oh dear! K.
Liu, Jinsong
2012-Oct-17 13:04 UTC
Re: [xen-unstable bisection] complete test-amd64-i386-qemuu-rhel6hvm-amd
Jan Beulich wrote:>>>> On 17.10.12 at 11:02, Ian Campbell <Ian.Campbell@citrix.com> wrote: >> On Wed, 2012-10-17 at 08:41 +0100, Jan Beulich wrote: >>> So I just now realized that a similar change was already done >>> by 23736:31683aa4bfb3 and then reverted by >>> 23760:ae10d7804168. Nothing was done subsequently to >>> address the actual problem(s). It is quite obvious that the more >>> relaxed check uncovers other bugs in the ERST code, yet looking >>> at the Linux history of the corresponding file doesn''t reveal any >>> fix the lack of which would explain an outright hang (rather than >>> a crash, as I would expect to be the result of e.g. the missing >>> ioremap()-s added by 0bbba38a61283a55f2061ab3e0910c572d19f462. >>> >>> Most of the other changes are cosmetic or pstore related, so I >>> wonder whether instead of reverting again we should try pulling >>> in this one extra fix. >> >> Worth a go. I assume this issue isn''t related to the typo you found >> this morning? > > I didn''t find any typo; the original author of the Linux side patch > pointed out a typo in the commit message (which was copied > verbatim from the Linux side). > >>> If reverting is preferred (or turns out necessary if that second >>> fix doesn''t help), we should settle on the disposition of the whole >>> APEI/ERST code, as my conclusion is that it is pretty much >>> unmaintained since its original contribution over two years ago. >> >> If we revert we should leave a big fat comment pointing to this >> and/or the previous discussion to stop the next guy doing the same >> thing. > > We should rather decide whether this should get fixed, or the > code ripped out (decision mainly depending on the original > contributer - Intel - being willing to fix the code).I just checked the history. 1. It firstly checked in as c/s 22052 --> at that time it tested at an old intel platform (I forget type); 2. then it updated with c/s 23736 --> since when our QA tested at a newer intel platform, it cannot pass erst_check_table, after debug we found the newer intel platform bios has different header length definiation at its ERST table, so we allow both size table work; 3. then it reverted with c/s 23760 --> since when test at an AMD platform by Critix (see attached, complete test-amd64-i386-xl), it will hang, so Keir revert it. c/s 22052 and 23736 were tested at 2 intel platforms, but we have no way to test at AMD platform. So the problem is, how to debug&fix it? Thanks, Jinsong _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel