I think I've narrowed down my random crashes to Xen 4.0.0 rc2 causing memory corruption in domUs.

Xen 3.4.2 is ok, and it doesn't seem to matter which domU or dom0 kernel I'm using.

I have two domUs, both PV:

* 2.6.32.7 from kernel.org
* 2.6.18-xen

Everything is 64-bit.

The random segfaults I was getting looked like bad RAM, but memtest86 didn't report any problems. So I ran memtester (http://pyropus.ca/software/memtester/), a userspace mem test app.

When I ran memtester on the dom0 (assigning all available RAM to the dom0) there was no problem. However, when I ran it against either of the domUs, it reported problems with RAM. See memtester output below. The corruption starts at around address 0x02404900.

Uninstalling 4.0 rc2 and installing 3.4.2 got rid of the problem, so it looks like 4.0.0 rc2 could be the problem, rather than any of the kernels or the hardware.

I'm still running pvops dom0 (2.6.31.6) so I don't think that's the problem either.

Thanks,
Yasir

-----Running on 2.6.18-xen domU----

# memtester 1525m
memtester version 4.0.8 (64-bit)
Copyright (C) 2007 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1525MB (1599078400 bytes)
got 1525MB (1599078400 bytes), trying mlock ...locked.
Loop 1:
Stuck Address : testing 3
FAILURE: possible bad address line at offset 0x08658810.
Skipping to next test...
Random Value : FAILURE: 0x7ebab5c6ebdfd4d4 != 0x00000288 at offset 0x0270890f.
FAILURE: 0x73f971b85f2fe49d != 0x12ead09de82 at offset 0x02708910.
FAILURE: 0xb3674c0877ffb947 != 0x6111300f17 at offset 0x02708911.
FAILURE: 0x95e7124bbf1d1c24 != 0xffadfffb19 at offset 0x02708912.
FAILURE: 0x91452c1930276f10 != 0xefff99dfdbf8b94c at offset 0x0270890f.
FAILURE: 0x9c06e86784d75f59 != 0xefff98f176f16546 at offset 0x02708910.
FAILURE: 0x5c98d5d7ac070283 != 0xefff99becac8b4d3 at offset 0x02708911.
FAILURE: 0x7a188b9464e5a7e0 != 0xefff9920760740dd at offset 0x02708912.
Compare XOR : FAILURE: 0x514dfa00b0300dd6 != 0xb00867c75c015812 at offset 0x0270890f.
FAILURE: 0x5c0fb64f04dffe1f != 0xb00866d8f6fa040c at offset 0x02708910.
FAILURE: 0x1ca1a3bf2c0fa149 != 0xb00867a64ad15399 at offset 0x02708911.
FAILURE: 0x3a21597be4ee46a6 != 0xb0086707f60fdfa3 at offset 0x02708912.
Compare SUB : FAILURE: 0x64944c45c77934d6 != 0xdd3a6f89990ab512 at offset 0x0270890f.
FAILURE: 0x889400266c2e739f != 0x4ca230955702420c at offset 0x02708910.
FAILURE: 0x4ad3b21f6fa76fc9 != 0xbfa0e02b1c4a6a19 at offset 0x02708911.
FAILURE: 0xa3605c44d864f5a6 != 0xcba3b7c32db57f23 at offset 0x02708912.
Compare MUL : FAILURE: 0x00000000 != 0x00000290 at offset 0x0270890f.
FAILURE: 0x00000000 != 0x1316a9d9d6a at offset 0x02708910.
FAILURE: 0x00000000 != 0x61ff9ea28c at offset 0x02708911.
FAILURE: 0x00000000 != 0xffadfffb19 at offset 0x02708912.
Compare DIV : FAILURE: 0x3eefb02777fe3231 != 0x3eefb02777fe32b1 at offset 0x0270890f.
FAILURE: 0x3eefb02777fe3231 != 0x3eefb1377fffbf7b at offset 0x02708910.
FAILURE: 0x3eefb02777fe3231 != 0x3eefb067fffeb2bd at offset 0x02708911.
FAILURE: 0x3eefb02777fe3231 != 0x3eefb0fffffffb39 at offset 0x02708912.
Compare OR : FAILURE: 0x3ecf0003073a3001 != 0x3ecf0003073a3081 at offset 0x0270890f.
FAILURE: 0x3ecf0003073a3001 != 0x3ecf0113073b3409 at offset 0x02708910.
FAILURE: 0x3ecf0003073a3001 != 0x3ecf0003873a3089 at offset 0x02708911.
FAILURE: 0x3ecf0003073a3001 != 0x3ecf0093873b3009 at offset 0x02708912.
Compare AND :
Sequential Increment: ok
Solid Bits : testing 15
FAILURE: 0xffffffffffffffff != 0x000002a8 at offset 0x02708917.
FAILURE: 0x00000000 != 0x139a35f9abc at offset 0x02708918.
FAILURE: 0xffffffffffffffff != 0x64caeca891 at offset 0x02708919.
FAILURE: 0x00000000 != 0xffadfffb19 at offset 0x0270891a.
Block Sequential : testing 12
FAILURE: 0xc0c0c0c0c0c0c0c != 0x000002b0 at offset 0x02708917.
FAILURE: 0xc0c0c0c0c0c0c0c != 0x13f1e8ef754 at offset 0x02708918.
FAILURE: 0xc0c0c0c0c0c0c0c != 0x66a7cc7c85 at offset 0x02708919.
FAILURE: 0xc0c0c0c0c0c0c0c != 0xffadfffb19 at offset 0x0270891a.
Checkerboard : testing 16
<<rest was cut>>

-----Running on 2.6.32.7 domU

# memtester 1525m
memtester version 4.1.2 (64-bit)
Copyright (C) 2009 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 1525MB (1599078400 bytes)
got 1525MB (1599078400 bytes), trying mlock ...locked.
Loop 1:
Stuck Address : testing 2
FAILURE: possible bad address line at offset 0x08354808.
Skipping to next test...
Random Value : FAILURE: 0xfe763568effdc436 != 0x00000692 at offset 0x02404907.
FAILURE: 0xb7e4008dffa750f0 != 0x205cffa1a92 at offset 0x02404908.
FAILURE: 0xbe7f20834ff96726 != 0xaa2e7aeeb5 at offset 0x02404909.
FAILURE: 0x555fa9bc6eb765e6 != 0xffadfffb19 at offset 0x0240490a.
FAILURE: 0xddff29736a3404ad != 0x00000416 at offset 0x0240490f.
FAILURE: 0x5ddf8516eff903a7 != 0x203c1cb9906 at offset 0x02404910.
FAILURE: 0x7eeef1fabff7b112 != 0xa97ba81a3d at offset 0x02404911.
FAILURE: 0x3ffe6c115b1cffeb != 0xffadfffb19 at offset 0x02404912.
FAILURE: 0x6fef4dc4dadf373a != 0x00000420 at offset 0x02404917.
FAILURE: 0x7ae29b8bddbf7873 != 0x2052094bf4a at offset 0x02404918.
FAILURE: 0x2b95a5b07ff9293c != 0xa9f2df2514 at offset 0x02404919.
FAILURE: 0x7baf407d3fd722cd != 0xffadfffb19 at offset 0x0240491a.
FAILURE: 0x1189acb734057ff2 != 0xefff99dfdbf8bd50 at offset 0x02404907.
FAILURE: 0x581b9952245feb34 != 0xefff9bd9a4a777ea at offset 0x02404908.
FAILURE: 0x5180b95c9401dce2 != 0xefff9975b1ee6e43 at offset 0x02404909.
FAILURE: 0xbaa03063b54fde22 != 0xefff9920760740dd at offset 0x0240490a.
FAILURE: 0x3200b0acb1ccbf69 != 0xefff99dfdbf8bfd2 at offset 0x0240490f.
FAILURE: 0xb2201cc93401b863 != 0xefff9bdc1a3322c2 at offset 0x02404910.
FAILURE: 0x91116825640f0ad6 != 0xefff9976a050a1f9 at offset 0x02404911.
FAILURE: 0xd001f5ce80e4442f != 0xefff9920760740dd at offset 0x02404912.
FAILURE: 0x8010d41b01278cfe != 0xefff99dfdbf8bfe4 at offset 0x02404917.
FAILURE: 0x951d02540647c3b7 != 0xefff9bdafb6c048e at offset 0x02404918.
FAILURE: 0xc46a3c6fa40192f8 != 0xefff997629279ed0 at offset 0x02404919.
FAILURE: 0x9450d9a2e42f9909 != 0xefff9920760740dd at offset 0x0240491a.
Compare XOR : FAILURE: 0xd1927a9eb40e1eb8 != 0xb00867c75c015c16 at offset 0x02404907.
FAILURE: 0x18246739a46889fa != 0xb00869c124b016b0 at offset 0x02404908.
FAILURE: 0x11898744140a7ba8 != 0xb008675d31f70d09 at offset 0x02404909.
FAILURE: 0x7aa8fe4b35587ce8 != 0xb0086707f60fdfa3 at offset 0x0240490a.
FAILURE: 0xf2097e9431d55e2f != 0xb00867c75c015e98 at offset 0x0240490f.
FAILURE: 0x7228eab0b40a5729 != 0xb00869c39a3bc188 at offset 0x02404910.
FAILURE: 0x511a360ce417a99c != 0xb008675e205940bf at offset 0x02404911.
FAILURE: 0x900ac3b600ece2f5 != 0xb0086707f60fdfa3 at offset 0x02404912.
FAILURE: 0x4019a20281302bc4 != 0xb00867c75c015eaa at offset 0x02404917.
FAILURE: 0x5525d03b8650627d != 0xb00869c27b74a354 at offset 0x02404918.
FAILURE: 0x84730a57240a31be != 0xb008675da9303d96 at offset 0x02404919.
FAILURE: 0x5459a78a643837cf != 0xb0086707f60fdfa3 at offset 0x0240491a.
Compare SUB : FAILURE: 0x81d8c2d8ecce2ab8 != 0xb591ec765a590d96 at offset 0x02404907.
FAILURE: 0xa5892ba2f4626afa != 0xe44a1e4db26d65ac at offset 0x02404908.
FAILURE: 0x861092b66dad5fa8 != 0x1def84765e9f06b2 at offset 0x02404909.
FAILURE: 0xe02697fa97cc80e8 != 0xe3a7975bedefd199 at offset 0x0240490a.
FAILURE: 0xcd3ba57827117baf != 0xabcc7e3273e61a98 at offset 0x0240490f.
FAILURE: 0x9ff88dc50efdd5a9 != 0xa42594f88d225588 at offset 0x02404910.
FAILURE: 0xe3d51711bfdf4f9c != 0x3bbfd09c2f65463f at offset 0x02404911.
FAILURE: 0xd5919e36ee05ff75 != 0xcba3b7c32db57f23 at offset 0x02404912.
FAILURE: 0x9e6d2ff1f4a6f5c4 != 0x8fac57b9a35377aa at offset 0x02404917.
FAILURE: 0x9b39e51123fb12fd != 0xc5b35b06aae7d554 at offset 0x02404918.
FAILURE: 0xff6c457e67d0dcbe != 0xdb13c471bb7ac496 at offset 0x02404919.
FAILURE: 0xfec3b8151193654f != 0xcba3b7c32db57f23 at offset 0x0240491a.
FAILURE: 0x00000000 != 0x2088d8ebe4a at offset 0x02404908.
FAILURE: 0x00000000 != 0xab1ce9d031 at offset 0x02404909.
FAILURE: 0x00000000 != 0xffadfffb19 at offset 0x0240490a.
FAILURE: 0x00000001 != 0x00000000 at offset 0x02404919.
FAILURE: 0x00000001 != 0x00000000 at offset 0x0240491a.
Compare DIV : FAILURE: 0x3eefb02777fe3231 != 0x3eefb02777fe36bb at offset 0x02404907.
FAILURE: 0x3eefb02777fe3231 != 0x3eefb22ffffebe7b at offset 0x02404908.
FAILURE: 0x3eefb02777fe3231 != 0x3eefb0af7ffff231 at offset 0x02404909.
FAILURE: 0x3eefb02777fe3231 != 0x3eefb0fffffffb39 at offset 0x0240490a.
<<rest was cut>>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
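In both domUs the memtester failures cluster in a narrow band of offsets, so it can help to extract that range mechanically from a saved log. The following is a hypothetical Python helper (not part of memtester or Xen), a minimal sketch assuming memtester's `FAILURE: X != Y at offset Z.` line format as it appears in the output above. It reports the lowest and highest offsets exactly as memtester printed them (it does not try to convert them to byte addresses), and `flipped_bits` lists which bits differ between the two compared values. For example, the first Compare DIV and Compare OR failures on the 2.6.18-xen domU differ only in bit 7.

```python
import re

# Matches memtester compare-failure lines of the form
#   "FAILURE: 0x3eefb02777fe3231 != 0x3eefb02777fe32b1 at offset 0x0270890f."
# Lines like "FAILURE: possible bad address line ..." are deliberately not matched.
FAILURE_RE = re.compile(
    r"FAILURE: (0x[0-9a-fA-F]+) != (0x[0-9a-fA-F]+) at offset (0x[0-9a-fA-F]+)"
)


def parse_failures(log: str) -> list[tuple[int, int, int]]:
    """Return (value_a, value_b, offset) for every compare-style FAILURE in the log."""
    return [
        (int(a, 16), int(b, 16), int(off, 16))
        for a, b, off in FAILURE_RE.findall(log)
    ]


def flipped_bits(a: int, b: int) -> list[int]:
    """Bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def summarize(log: str) -> dict:
    """Count failures and report the lowest/highest offset memtester printed."""
    failures = parse_failures(log)
    offsets = sorted({off for _, _, off in failures})
    return {
        "count": len(failures),
        "first_offset": offsets[0] if offsets else None,
        "last_offset": offsets[-1] if offsets else None,
        "distinct_offsets": len(offsets),
    }


if __name__ == "__main__":
    # Two failures copied from the 2.6.18-xen domU output above.
    sample = (
        "Compare DIV : FAILURE: 0x3eefb02777fe3231 != 0x3eefb02777fe32b1 "
        "at offset 0x0270890f. FAILURE: 0x3eefb02777fe3231 != "
        "0x3eefb1377fffbf7b at offset 0x02708910."
    )
    s = summarize(sample)
    print(s["count"], hex(s["first_offset"]), hex(s["last_offset"]))
    a, b, _ = parse_failures(sample)[0]
    print(flipped_bits(a, b))
```

To use it, save the memtester output to a file and pass its contents to `summarize()`. A regex over the raw text was chosen because the pasted logs may have lost their line breaks; `findall` still picks up several FAILURE entries run together on one line.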
Konrad Rzeszutek Wilk
2010-Feb-05 16:37 UTC
Re: [Xen-devel] Xen 4 rc2: pv domU memory corruption
On Fri, Feb 05, 2010 at 04:42:16PM +1100, Yasir Assam wrote:
> I think I've narrowed down my random crashes to Xen 4.0.0 rc2 causing
> memory corruption in domUs.
>
> Xen 3.4.2 is ok, and it doesn't seem to matter which domU or dom0 kernel
> I'm using.
>
> I have two domUs, both PV
>
> * 2.6.32.7 from kernel.org
> * 2.6.18-xen
>
> Everything is 64-bit.
>
> The random segfaults I was getting looked like bad RAM, but memtest86

These segfaults - were they in dom0 or domU? What CPU do you have?
AMD/Intel?

Did you try to swap the memory modules when you were trying to narrow
this down?

> didn't report any problems. So I ran memtester
> (http://pyropus.ca/software/memtester/), a userspace mem test app.
>
> When I ran memtester on the dom0 (assigning all available RAM to the
> dom0) there was no problem. However, when I ran it against either of the
> domUs, it reported problems with RAM. See memtester output below. The
> corruption starts at around address 0x02404900.
>
> Uninstalling 4.0 rc2 and installing 3.4.2 got rid of the problem, so it
> looks like 4.0.0 rc2 could be the problem, rather than any of the
> kernels or the hardware.

Did you compile 4.0.0-rc2 with debug=y?
The seg faults are only in domU - dom0 is fine (as I said, I ran memtester in dom0 with no problems).

Keir Fraser asked me to try passing "no-tmem" as a boot parameter to the hypervisor, but that didn't stop the memory corruption.

I only have one memory module (nothing to swap), and given that it's fine under Xen 3.4.2, it looks likely to be a software-only problem (esp. as repeated memtest86 runs yield no faults).

I have an Intel Core i3 530, which has "onboard" graphics (which I use). I also have the Asus P7H55M-Pro motherboard.

Thanks,
Yasir
Can you post your domU config?

 -- Keir

On 06/02/2010 04:25, "Yasir Assam" <mail@endlessvoid.com> wrote:

> The seg faults are only in domU - dom0 is fine (as I said, I ran memtester in
> dom0 with no problems).
>
> Keir Fraser asked me to try passing "no-tmem" as a boot parameter to the
> hypervisor but that didn't stop the memory corruption.
>
> I only have one memory module (nothing to swap) and given that it's fine under
> Xen 3.4.2 it looks likely to be a software-only problem (esp. as repeated
> memtest86 runs yield no faults).
>
> I have an Intel Core i3 530, which has "onboard" graphics (which I use). I
> also have the Asus P7H55M-Pro motherboard.
>
> Thanks,
> Yasir