Ram Yalamanchili
2006-Mar-11 01:46 UTC
[syslinux] mboot.c32, weird e820 map on HP blade machine, possible memory corruption
I'm seeing this on an HP blade and I'm not sure why it is happening, since the nature of the issue is so weird. I compiled mboot.c with DEBUG defined in the mboot.c file. In the function init_mmap(), it prints the e820 map, and on the HP blade the map values come out totally random: weird numbers that don't make any sense at all. However, if I add a while(1); or an exit(1); at the end of init_mmap(), I magically see the e820 values printed correctly! It's very weird that this happens, and I haven't changed anything else in mboot.c from the syslinux-2.11 code. Can someone think of anything?

Another thing: the e820 buffer should be zeroed out, since some BIOSes are buggy and don't overwrite the high doubleword of the Length field of the AddrRangeDesc. This is seen on the Dell PowerEdge 1800s. So it should look like:

    while(((void *)(e820 + 1)) < __com32.cs_bounce + __com32.cs_bounce_size)
    {
        memset(e820, 0, sizeof(*e820));
        e820->size = sizeof(*e820) - sizeof(e820->size);
        ....

Hope Tim, the maintainer, can take care of this.

-- Ram
Tim Deegan
2006-Mar-13 12:19 UTC
[syslinux] mboot.c32, weird e820 map on HP blade machine, possible memory corruption
Hi,

Thanks for pointing out the memset thing: I've rolled that change into the patch at http://www.cl.cam.ac.uk/~tjd21/tmp/shtab.patch

On Sat, Mar 11, 2006 at 12:02:39PM -0800, syslinux-request at zytor.com wrote:
> I compiled mboot.c with a DEBUG defined in the mboot.c file. In the funciton
> init_mmap(), it prints the e820 map and on the HP blade this map values come
> out to be totally random. Some weird numbers which dont make any sense at
> all.

Urgh. Does it then pass a correct mmap to the kernel or is that corrupt too? Does compiling with -O0 instead of -Os make any difference?

Tim.

-- 
Tim Deegan (My opinions, not the University's)
Systems Research Group
University of Cambridge Computer Laboratory
Ram Yalamanchili
2006-Mar-14 00:13 UTC
[syslinux] Re: mboot.c32, weird e820 map on HP blade machine, possible memory corruption
Hi Tim,

OK, cool. I don't see the memset change in the patch at your link.

I tried with -O0 and still see discrepancies. The data passed to the kernel seems to be entirely different from what the printfs show. I put in a memset as I suggested, and the printfs in mboot's debugging output now give 0x0 - 0x0 for every memory region. I have some printfs inside my kernel code, which show that the data actually being passed is different from mboot's printfs:

0x0 - 0x9f40000000000
0xffffffffffffffff - 0xfffffffffffffffe
0xffffffffffffffff - 0xfffffffffffffffe
0xffffffffffffffff - 0xfffffffffffffffe
0xffffffffffffffff - 0xfffffffffffffffe
0xffffffffffffffff - 0xfffffffffffffffe

From grub, the correct data is supposed to be:

0x0 - 0x9e
0x9f400 - 0xa0000
0xf0000 - 0x100000
0x100 - 0x1fff9
0x1fffa000 - 0x20000000
0xfec00000 - 0xfec10000
0xfee00000 - 0xfee10000
0xffc00000 - 0x100000000

Thanks,
Ram
Ram Yalamanchili
2006-Mar-14 22:32 UTC
[syslinux] Re: mboot.c32, weird e820 map on HP blade machine, possible memory corruption
On further investigation, I find this might be related to __intcall or some other part of the code, because registers aren't properly restored. I tried compiling with gcc 4.0.2 with -O0 and with -Os, and neither works: both show random addresses on the blade. However, when I use -O1, it shows the right memory map, but it won't boot. It just hangs at

Booting: MBI=0x...

I tried with gcc 2.96, and -O0, -O1, and -Os are all useless when I have DEBUG undefined. However, when DEBUG is defined, -O0 works and the rest don't.

I'm guessing something in init_mmap isn't safely setting up the registers, but I'm not totally sure.

-- Ram
Tim Deegan
2006-Mar-15 10:01 UTC
[syslinux] Re: mboot.c32, weird e820 map on HP blade machine, possible memory corruption
> Ok cool. I dont see the memset change in your link for the patch.

Oops! Forgot to push it to the right place; apologies. Fixed.

The rest of this bug is confusing me. I'm pretty busy right now but I'll hopefully have some time next week to look at it properly. Did you say the memory map is wrong only on the debug build or always?

> From grub, the correct data is supposed to be:
>
> 0x0 - 0x9e
> 0x9f400 - 0xa0000
> 0xf0000 - 0x100000
> 0x100 - 0x1fff9
> 0x1fffa000 - 0x20000000
> 0xfec00000 - 0xfec10000
> 0xfee00000 - 0xfee10000
> 0xffc00000 - 0x100000000

That's a pretty strange memory map. What are the memory types for those ranges? I've never worked with blades so maybe I'm missing something, but if there's no memory at 1MB, how does the com32 module work at all?

If Linux boots on this machine, can you grab the BIOS-e820 lines from dmesg()?

Tim.

-- 
Tim Deegan (My opinions, not the University's)
Systems Research Group
University of Cambridge Computer Laboratory
Ram Yalamanchili
2006-Mar-15 20:47 UTC
[syslinux] Re: mboot.c32, weird e820 map on HP blade machine, possible memory corruption
Tim,

I guess my suspicion was right :) This seems to be a BIOS bug. The memset of regs_in fixed the problem, and the blade now boots. Thanks for your suggestion. Please add this to the patch as well.

One more thing: would you prefer displaying some numbers instead of dots while loading a file from the network? I modified the code so I can see the size of the files instead; however, this shows the zcat-extracted (uncompressed) size and not what we got from the network (the compressed size). I'd just like to see something more informative than dots :)

--- ../../../syslinux-3.11/com32/modules/mboot.c 2006-03-10 16:14:27.000040000 -0800
+++ mboot.c 2006-03-15 11:39:53.000197000 -0800
@@ -476,8 +478,8 @@
     }
 
     while (next_load_addr + LOAD_CHUNK <= section_addr) {
+        printf("\rLoading %s: ", filename);
         bsize = gzread(fp, next_load_addr, LOAD_CHUNK);
-        printf("%s",".");
 
         if (bsize < 0) {
             printf("\nFatal: read error in %s\n", filename);
@@ -489,16 +491,16 @@
         sizep[0] += bsize;
 
         if (bsize < LOAD_CHUNK) {
-            printf("%s","\n");
+            printf("%8d KB\n", (int)(next_load_addr - start)/1024);
             gzclose(fp);
             return;
         }
+        printf("%8d KB", (int)(next_load_addr - start)/1024);
     }
 
     /* Running out of memory. Try and use up the last bit */
     if (section_addr > next_load_addr) {
         bsize = gzread(fp, next_load_addr, section_addr - next_load_addr);
-        printf("%s",".");
     } else {
         bsize = 0;
     }
@@ -512,6 +514,8 @@
     next_load_addr += bsize;
     sizep[0] += bsize;
 
+    printf("\rLoading %s: %8d KB", filename, (int)(next_load_addr - start)/1024);
+
     if (!gzeof(fp)) {
         gzclose(fp);
         printf("\nFatal: out of memory reading %s\n", filename);

> Date: Wed, 15 Mar 2006 10:24:14 +0000
> From: Tim Deegan <Tim.Deegan at cl.cam.ac.uk>
> Subject: Re: [syslinux] Re: mboot.c32, weird e820 map on HP blade
> machine, possible memory corruption
> Message-ID: <20060315102414.GC32024 at tyer.cl.cam.ac.uk>
>
> Hi Ram,
>
> Just saw your latest message about optimization levels. Does explicitly
> clearing the registers help? Shouldn't make a difference, since es:edi
> is the only address we pass, but...
>
> --- old/mboot.c 2006-03-15 10:12:43.000000000 +0000
> +++ ./mboot.c 2006-03-15 10:13:21.000000000 +0000
> @@ -359,6 +359,7 @@
>      while(((void *)(e820 + 1)) < __com32.cs_bounce + __com32.cs_bounce_size)
>      {
>          memset(e820, 0, sizeof (*e820));
> +        memset(&regs_in, 0, sizeof regs_in);
>          e820->size = sizeof(*e820) - sizeof(e820->size);
>
>          /* Ask the BIOS to fill in this descriptor */