Patrick J. LoPresti
2004-Jan-26 19:21 UTC
[syslinux] Problem with FreeDOS + himem64 + PXELINUX + memdisk
(FreeDOS developers, I apologize for the redundant parts of this message. But I want to bring the SYSLINUX folks into the discussion, and the SourceForge mailing list archives are broken.)

Background: I have a little Sourceforge project (http://unattended.sourceforge.net/) for which I use SYSLINUX to provide CD-ROM and PXE boot support for my boot disk. And it works great with MS-DOS.

However, I want to use FreeDOS for the boot disk. This is very close to working with the latest releases of the FreeDOS kernel and utilities. I just have one blocking problem.

I am trying to use himem64.exe, part of FreeDOS EMM386:

  http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/dos/emm386/

I am comparing it with fdxxms.sys, part of FDXMS:

  http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/dos/fdxms/

When I use PXELINUX+memdisk (version 2.08) to boot a trivial FreeDOS boot disk on my IBM Thinkpad T20, using "DEVICE=himem64.exe" in config.sys, the system spontaneously reboots after reading config.sys and printing this message:

  Kernel allocated 42 Diskbuffers = 22344 Bytes in HMA

This problem only happens on the T20; it does not happen on my other test system (Dell Optiplex GX200). It does not happen if I use "DEVICE=fdxxms.sys ps" instead of himem64.exe. And it does not happen if I boot from a physical floppy instead of PXELINUX+memdisk.

I cannot control which hardware my users have (unfortunately). I cannot just use "fdxxms.sys ps" because my testers report that it does not always work for them. So I would like to use himem64.exe. But there is apparently some incompatibility between himem64.exe and memdisk on this laptop.

I would like to help fix this. My knowledge of x86 internals is limited, but I am pretty good at testing binaries and reporting back results...

Eric Auer, a FreeDOS developer, commented:

  Note that MEMDISK access might fail while A20 is off - the handler
  itself is in 40:[13] allocated low memory, but the disk itself is
  in int 15 high memory as far as I remember. You should check the
  MEMDISK sources to see how disk data is accessed and what is done
  for the A20.

Unfortunately, I do not know enough x86 assembly to understand the memdisk sources.

Help?

 - Pat
Blaauw,Bernd B.
2004-Jan-26 19:58 UTC
[syslinux] Problem with FreeDOS + himem64 + PXELINUX + memdisk
Patrick,

I've mentioned this also, but could not rule out that it was a VMware problem. Now I can, thanks to you (the Bochs emulator is OK with MEMDISK). I'm using ISOLINUX instead of PXELINUX, by the way; that should not matter. Glad to see there's a real-life system. (Are you using the UNDI driver for PXE, by the way, instead of DOS network drivers?)

Could you leave everything identical on the boot disk but use an MS-DOS kernel on that machine (DOS 7.10, for example)?

All HIMEM versions fail for me. FDXMS is OK.

Again, only someone more knowledgeable can explain what's going on between:

* the FreeDOS kernel (http://freedos.sourceforge.net/kernel/kernel.tgz)
* HIMEM (http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/dos/emm386/)
* FDXMS (http://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/dos/fdxms/)
* SYSLINUX (http://www.kernel.org/pub/linux/utils/boot/syslinux/)

As HIMEM (and HIMEM64) are a spin-off from FDXMS, the source code should not be too difficult to compare (but which versions?). I prefer HIMEM over FDXMS because it cooperates with EMM386 (included in the same package as HIMEM at the emm386 link above).

Sorry, I can't offer any help except testing possible patches once I receive a new binary. The point of hang-up is exactly the same as you mentioned. Something with the HMA/A20.

Bernd
Patrick J. LoPresti
2004-Jan-26 21:30 UTC
[syslinux] Re: Problem with FreeDOS + himem64 + PXELINUX + memdisk
Michael Devore <FreeDosStuff at devoresoftware.com> writes:

> MEMDISK appears to give a great deal of printed feedback, based on
> the printf()'s in the code. Is there a lot of information on the
> screen from MEMDISK prior to the reboot? If so, posting that
> information here would be handy.

Yes, MEMDISK prints a bunch of information before booting from the "virtual floppy". Here it is:

=====================================================================
MEMDISK 2.08 2003-12-12  Copyright 2001-2003 H. Peter Anvin
e820: 0000000000000000 000000000009f800 1
e820: 000000000009f800 0000000000000800 2
e820: 00000000000e0000 0000000000020000 2
e820: 0000000000100000 000000000fef0000 1
e820: 000000000fff0000 000000000000ec00 3
e820: 000000000fffec00 0000000000001400 4
e820: 00000000fff80000 0000000000080000 2
Ramdisk at 0xfe60000, length 0x00168000
Command line: initrd=test.img keeppxe BOOT_IMAGE=memdisk
Disk is floppy, 1440 K, C/H/S = 80/2/18
Total size needed = 1460 bytes, allocating 2K
Old dos memory at 0x8fc00 (map says 0x9f800), loading at 0x8f400
1588: 0xffff  15E801: 0x3c00 0x0ee6
INT13 08: Success, count=1, BPT=f0000:9d36
old: int13=e1f74d1a  int15=f000f859
new: int13=8f400008  int15=8f400272
Loading boot sector... booting...
FreeDOS FAT Kernel [etc.]
=====================================================================

Then I see the FreeDOS intro text, the option to press F5/F8, and ultimately the crash and reboot.

> Also, if DOS is loaded high and the XMS swapper shell is present,
> try turning them off to eliminate potential side-effects and
> reporting on the results.

If I remove "DOS=HIGH", it works!

Well, mostly. The keyboard eventually locks up. But it appears to work fine as long as I do not use the keyboard; the boot disk loads network drivers, maps a Windows share, runs cwsdpmi + DJGPP Perl...

Unfortunately, I need DOS=HIGH because otherwise there is insufficient conventional memory to run winnt.exe (my ultimate goal).

> A very preliminary look at MEMDISK seems to indicate that it
> directly queries, accesses, and uses extended memory, operations
> which could conflict with an extended memory manager. MEMDISK may
> be fundamentally incompatible with HIMEM.EXE (the [64] part of HIMEM
> is not an issue here). Of course it's hard to tell exactly what the
> thing is doing just by quickly scanning the source code.

So I am just getting lucky with fdxms and MS-DOS + himem.sys? Possible, I suppose.

> However, if you could track down and communicate with the MEMDISK
> author Peter Anvin, and he was amenable, he may be able to answer
> critical questions in a matter of minutes and save a lot of work on
> the part of others.

HPA monitors syslinux at zytor.com, and at least one other reader has an interest in this. So I believe this thread is on-topic for both lists.

Thank you for the reply! Let me know what else I can do.

 - Pat
Patrick J. LoPresti
2004-Jan-26 22:18 UTC
[syslinux] Re: Problem with FreeDOS + himem64 + PXELINUX + memdisk
The following reply was sent only to freedos-devel. I do not know the answers to his questions (although I suspect the "keeppxe" flag has something to do with it).

Any help would be appreciated. Feel free to CC freedos-devel at lists.sourceforge.net.

 - Pat

From: Eric Auer <eric at CoLi.Uni-SB.DE>
Message-Id: <200401262148.WAA03293 at gnome.at.coli.uni-sb.de>
To: freedos-devel at lists.sourceforge.net
Subject: [Freedos-devel] re: Problem with FreeDOS + himem64 + PXELINUX + memdisk
Date: Mon, 26 Jan 2004 22:48:36 +0100 (MET)
Lines: 48

Hi Pat, are you sure that you do not for example have a virus on that system? Well, maybe it is just the PXE software but:

> Old dos memory at 0x8fc00 (map says 0x9f800), loading at 0x8f400
> old: int13=e1f74d1a  int15=f000f859

**** Something in the UMB / ROM area hooks BIOS disk int

Further, the "old DOS memory" information tells us that according to int 15, 638k of 640k (rest is BIOS EBDA I guess) should be avail, but that actually 63k less than that are available in low memory. MEMDISK then allocates 2k more there.

> Ramdisk at 0xfe60000, length 0x00168000
> e820: 0000000000100000 000000000fef0000 1
> e820: 000000000fff0000 000000000000ec00 3
> e820: 000000000fffec00 0000000000001400 4
> e820: 00000000fff80000 0000000000080000 2

Not sure how to read this, but looks like for example you have some 59k used at the end of the first 16 MB - maybe an overlay of some sort, special disk driver, ROM, BIOS data area for suspend...? Then there are 5k more to round up to 16 MB. Finally you have 512k of something much later, maybe a framebuffer or something. The MEMDISK (again, if I do understand things right) uses 1.5 MB at the end of the first big chunk in the first 16 MB. This seems to overlap other things, but I do not hope that MEMDISK would have such an obvious bug...

As I do not know whether I have to subscribe to SYSLINUX for mailing there and as I do not know whether it has address hiding in a possible web archive, I am not CCing SYSLINUX at zytor - but feel free to forward.

Hope this is not a too unqualified posting... Maybe PXELINUX or some BIOS or driver thing is an easy explanation for the extra memory areas, and maybe I misinterpreted the int 15.e820 information completely and there is no overlap, but you never know 8-].

Eric
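The low-memory figures in the message above (638k reported, 63k already taken, 2k allocated by MEMDISK) can be rechecked with a few lines of arithmetic. This is a minimal Python sketch, not part of the original thread: the variable names are illustrative, the addresses are copied from the MEMDISK banner, and the segment:offset reading of the "new: int13=" value is an assumption.

```python
# Values copied from the MEMDISK banner quoted in this thread.
MAP_TOP = 0x9F800        # top of low memory according to the e820 map
OLD_DOS_TOP = 0x8FC00    # "Old dos memory at ..."
LOAD_ADDR = 0x8F400      # "... loading at ..."
NEW_INT13 = 0x8F400008   # "new: int13=..."

KIB = 1024

map_top_kib = MAP_TOP // KIB                    # low memory the map reports
taken_kib = (MAP_TOP - OLD_DOS_TOP) // KIB      # taken before MEMDISK ran
memdisk_kib = (OLD_DOS_TOP - LOAD_ADDR) // KIB  # MEMDISK's own allocation

# Assumption: the banner prints each vector as segment (high word) and
# offset (low word), so the linear address is segment * 16 + offset.
segment, offset = NEW_INT13 >> 16, NEW_INT13 & 0xFFFF
handler_linear = segment * 16 + offset

print(f"map reports         {map_top_kib} KiB of low memory")
print(f"already taken       {taken_kib} KiB")
print(f"MEMDISK allocation  {memdisk_kib} KiB")
print(f"new INT 13h handler at linear {handler_linear:#x}")
```

Under that reading, the new INT 13h vector lands inside the 2 KiB block MEMDISK reserved at the "loading at" address, which is consistent with the banner.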
H. Peter Anvin
2004-Jan-27 03:56 UTC
[syslinux] Problem with FreeDOS + himem64 + PXELINUX + memdisk
Patrick J. LoPresti wrote:
>
> Eric Auer, a FreeDOS developer, commented:
>
>   Note that MEMDISK access might fail while A20 is off - the handler
>   itself is in 40:[13] allocated low memory, but the disk itself is
>   in int 15 high memory as far as I remember. You should check the
>   MEMDISK sources to see how disk data is accessed and what is done
>   for the A20.
>
> Unfortunately, I do not know enough x86 assembly to understand the
> memdisk sources.
>

MEMDISK uses the INT 15h AH=87h mover function, using 386-style selectors.

	-hpa
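For readers unfamiliar with the INT 15h AH=87h block move: the caller passes ES:SI pointing at a 48-byte descriptor table and CX holding the word count. The following Python sketch only builds that table in memory to show the layout, following the description in Ralf Brown's Interrupt List. It is not MEMDISK's code, and the function names and the destination buffer address are hypothetical.

```python
import struct

def descriptor(base, limit=0xFFFF, access=0x93):
    """Build one 8-byte 386-style segment descriptor."""
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,         # limit bits 0-15
        base & 0xFFFF,          # base bits 0-15
        (base >> 16) & 0xFF,    # base bits 16-23
        access,                 # 93h = present, read/write data segment
        (limit >> 16) & 0x0F,   # limit bits 16-19, flag bits left at zero
        (base >> 24) & 0xFF,    # base bits 24-31 (zero on a 286)
    )

def build_move_table(src, dst):
    """Return the 48-byte table that ES:SI points to for INT 15h AH=87h."""
    empty = bytes(8)
    return b"".join([
        empty,             # 00h: dummy descriptor
        empty,             # 08h: descriptor for this table, filled in by the BIOS
        descriptor(src),   # 10h: source segment
        descriptor(dst),   # 18h: destination segment
        empty,             # 20h: BIOS code segment, filled in by the BIOS
        empty,             # 28h: BIOS stack segment, filled in by the BIOS
    ])

RAMDISK_BASE = 0x0FE60000   # "Ramdisk at" value from the MEMDISK banner
LOW_BUFFER = 0x00020000     # hypothetical destination buffer in low memory

table = build_move_table(RAMDISK_BASE, LOW_BUFFER)
words_per_sector = 512 // 2  # the value CX would carry for one 512-byte sector
print(len(table), table[0x10:0x18].hex())
```

The top byte of the source base (byte 17h of the table) is what the "386-style" part buys: without it, only 24-bit addresses are reachable, and the ramdisk in this thread sits above 16 MB.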
H. Peter Anvin
2004-Jan-27 04:02 UTC
[syslinux] Re: Problem with FreeDOS + himem64 + PXELINUX + memdisk
Patrick J. LoPresti wrote:
> The following reply was sent only to freedos-devel. I do not know the
> answers to his questions (although I suspect the "keeppxe" flag has
> something to do with it).
>
> Any help would be appreciated. Feel free to CC
> freedos-devel at lists.sourceforge.net.
>
>  - Pat
>
>
> From: Eric Auer <eric at CoLi.Uni-SB.DE>
> Message-Id: <200401262148.WAA03293 at gnome.at.coli.uni-sb.de>
> To: freedos-devel at lists.sourceforge.net
> Subject: [Freedos-devel] re: Problem with FreeDOS + himem64 + PXELINUX + memdisk
> Date: Mon, 26 Jan 2004 22:48:36 +0100 (MET)
> Lines: 48
>
>
> Hi Pat, are you sure that you do not for example have a virus on
> that system? Well, maybe it is just the PXE software but:
>
>
>>Old dos memory at 0x8fc00 (map says 0x9f800), loading at 0x8f400
>>old: int13=e1f74d1a  int15=f000f859
>
> **** Something in the UMB / ROM area hooks BIOS disk int

Indeed, that is odd.

> Further, the "old DOS memory" information tells us that according to
> int 15, 638k of 640k (rest is BIOS EBDA I guess) should be avail, but
> that actually 63k less than that are available in low memory. MEMDISK
> then allocates 2k more there.

This is presumably explainable by Patrick using "keeppxe" -- this is the memory occupied by the PXE stack.

>>Ramdisk at 0xfe60000, length 0x00168000
>>e820: 0000000000100000 000000000fef0000 1
>>e820: 000000000fff0000 000000000000ec00 3
>>e820: 000000000fffec00 0000000000001400 4
>>e820: 00000000fff80000 0000000000080000 2
>

These are: address, length, type for each INT 15h, AX=0E820h memory area. The type codes are 1 = memory, 2 = reserved, 3 and 4 = ACPI.

> Not sure how to read this, but looks like for example you have some
> 59k used at the end of the first 16 MB - maybe an overlay of some sort,
> special disk driver, ROM, BIOS data area for suspend...? Then there are
> 5k more to round up to 16 MB. Finally you have 512k of something much
> later, maybe a framebuffer or something. The MEMDISK (again, if I do
> understand things right) uses 1.5 MB at the end of the first big chunk
> in the first 16 MB. This seems to overlap other things, but I do not hope
> that MEMDISK would have such an obvious bug...
>
> As I do not know whether I have to subscribe to SYSLINUX for mailing there
> and as I do not know whether it has address hiding in a possible web archive,
> I am not CCing SYSLINUX at zytor - but feel free to forward.

http://www.zytor.com/mailman/listinfo/syslinux

Right now there is no must-subscribe policy for this list.

> Hope this is not a too unqualified posting... Maybe PXELINUX or some BIOS
> or driver thing is an easy explanation for the extra memory areas, and
> maybe I misinterpreted the int 15.e820 information completely and there is
> no overlap, but you never know 8-].

	-hpa
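Given the address/length/type reading of the e820 lines, the overlap question raised in the quoted message can be checked mechanically. A minimal Python sketch, assuming the banner lines exactly as posted in this thread; the helper names are illustrative and this is not how MEMDISK itself validates its placement.

```python
# e820 lines copied from the MEMDISK banner posted in this thread.
BANNER = """\
e820: 0000000000000000 000000000009f800 1
e820: 000000000009f800 0000000000000800 2
e820: 00000000000e0000 0000000000020000 2
e820: 0000000000100000 000000000fef0000 1
e820: 000000000fff0000 000000000000ec00 3
e820: 000000000fffec00 0000000000001400 4
e820: 00000000fff80000 0000000000080000 2
"""
RAMDISK_BASE, RAMDISK_LEN = 0x0FE60000, 0x00168000

def parse_e820(text):
    """Return (base, length, type) tuples for every 'e820:' line."""
    entries = []
    for line in text.splitlines():
        if line.startswith("e820:"):
            _, base, length, kind = line.split()
            entries.append((int(base, 16), int(length, 16), int(kind)))
    return entries

def containing_region(entries, start, length):
    """Return the single map entry that fully contains the range, or None."""
    end = start + length
    for base, size, kind in entries:
        if base <= start and end <= base + size:
            return base, size, kind
    return None

entries = parse_e820(BANNER)
region = containing_region(entries, RAMDISK_BASE, RAMDISK_LEN)
top_of_ram_mib = (region[0] + region[1]) / 2**20
slack_kib = (region[0] + region[1] - (RAMDISK_BASE + RAMDISK_LEN)) // 1024
print(region, top_of_ram_mib, slack_kib)
```

With these numbers the ramdisk lies entirely inside the type 1 (usable memory) region that starts at 1 MB, so nothing in the map overlaps it. The same arithmetic puts the top of that region just under 256 MB rather than 16 MB, with the 59k and 5k ACPI areas directly above it.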
Patrick J. LoPresti
2004-Jan-27 22:01 UTC
[syslinux] Re: [Freedos-devel] Problem with FreeDOS + himem64 + PXELINUX + memdisk
I believe I have solved my own problem.

(I am CC'ing syslinux at zytor.com for complete archives, but please direct additional followups to freedos-devel since this is definitely not a SYSLINUX problem.)

I have been experimenting with my own himem sources. I applied the following patch to himem64.asm:

=====================================================================
--- himem64.asm	2004/01/27 20:42:43	1.1
+++ himem64.asm	2004/01/27 21:25:00	1.2
@@ -276,6 +276,8 @@
                         ; has to be requested
 a20_locks       dw      0       ; internal A20 lock count
 
+a20state        db      ?       ; keeps A20 state across INT15h call
+
 xms_handle_start dw     normal_driver_end
 
@@ -361,14 +363,27 @@
 proc    enable_a20
         push    ax
         mov     ah,2
-        jmp     short disable_enable_a20
+        call    disable_enable_a20
+        pop     ax
+        ret
 
 disable_a20:
         push    ax
         mov     ah,0
+        call    disable_enable_a20
+        push    cx
+        mov     cx,32
+@@delayloop:
+        call    test_a20
+        jz      @@disabled
+        loop    @@delayloop
+@@disabled:
+        pop     cx
+        pop     ax
+        ret
 
 disable_enable_a20:
-
+        push    ax
         mov     al,0d1h
         out     64h,al
         call    delay
@@ -491,10 +506,35 @@
 ;
 
 proc    int15_handler
+        cmp     ah,87h
+        je      do_move
         cmp     ah,88h          ; is it a ext. mem size req.?
         je      ext_mem_size
         jmp     [cs:old_int15]  ; jump to old handler
 
+do_move:
+        call    test_a20        ; check if A20 is on or off
+        jz      @@a20disabled
+        mov     [cs:a20state],1 ; preserve state
+        jmp     @@call_old_mover
+@@a20disabled:
+        mov     [cs:a20state],0
+@@call_old_mover:
+        pushf                   ; simulate INT call
+        call    [cs:old_int15]
+        pushf                   ; save flags for return
+        push    ax
+        cmp     [cs:a20state],0 ; see if A20 has to be switched
+        jz      @@disable_it
+        call    enable_a20
+        jmp     @@move_done
+@@disable_it:
+        call    disable_a20
+@@move_done:
+        pop     ax
+        popf
+        iret
+
 ext_mem_size:
         xor     ax,ax           ; no memory available
         clc                     ; no error
=====================================================================

This patch does two things.

First, I stole the "save/restore A20 on INT15/AH=87" logic from FDXMS. Judging by the comments at the top of himem64.asm, this logic used to exist in himem64.exe as well; it is not clear why it was removed.

With this change, my "instant" crashes on the Thinkpad T20 went away! But it still crashed somewhat later while running commands from autoexec.bat.

Second, I stole the "delay on A20 disable" logic from memdisk. This made all of the crashes stop! That is, it resulted in a himem64.exe which works the same when invoked from PXELINUX+memdisk (with DOS=HIGH) as it does from a floppy (or without DOS=HIGH).

Unfortunately, on my Thinkpad T20, the keyboard still locks up pretty quickly in all of these scenarios. The following additional patch fixes everything on my T20:

=====================================================================
--- himem64.asm	2004/01/27 21:25:00	1.2
+++ himem64.asm	2004/01/27 21:37:33
@@ -362,15 +362,24 @@
 
 proc    enable_a20
         push    ax
-        mov     ah,2
-        call    disable_enable_a20
+        mov     ax,2401h
+        pushf
+        int     15h
+        popf
+
+;       mov     ah,2
+;       call    disable_enable_a20
         pop     ax
         ret
 
 disable_a20:
         push    ax
-        mov     ah,0
-        call    disable_enable_a20
+        mov     ax,2400h
+        pushf
+        int     15h
+        popf
+;       mov     ah,0
+;       call    disable_enable_a20
         push    cx
         mov     cx,32
 @@delayloop:
=====================================================================

This patch uses the BIOS (INT15/AX=2400 and 2401) interface to enable/disable A20. With this additional patch, everything works flawlessly on my T20.

...but it breaks on my Optiplex GX200, presumably because its BIOS does not support INT15/AX=2400.

The ultimate solution, in my opinion, is to steal all of the logic from memdisk (init32.asm) for doing A20 switching. The logic goes:

  1) See if there is no A20 gate; if so, use NOOP to disable/enable A20
  2) See if the BIOS interface works (INT15/AX=2401); if so, use that.
  3) See if the keyboard controller mechanism works (this is the mechanism himem64.exe currently uses always); if so, use that.
  4) See if the "fast A20 gate" mechanism works; if so, use that.
  5) Retry steps 1-4 255 times...
  6) ...if that does not work, bomb out.

I believe himem64.exe would support the widest variety of systems if it incorporated all of this logic. But steps (2) and (3) are mandatory for me. I believe the BIOS interface is superior to using PS/2 switching, since according to Ralf Brown's Interrupt List later PS/2s support the BIOS interface anyway.

I am willing to write the code to support all this if the himem maintainer(s) are amenable to accepting patches. Although you will probably want to clean up my assembly first :-).

Comments?

 - Pat
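The six-step A20 selection order described in this message can be modelled in a few lines. This Python sketch is only an illustration of the control flow, not memdisk's init32.asm and not a proposed himem64 implementation: `FakeMachine`, `select_a20_method` and the method names are hypothetical, and the two simulated machines merely mirror what the thread reports about the T20 and the GX200.

```python
ORDER = ("bios", "kbc", "fast")   # steps 2, 3 and 4, tried in that order

class FakeMachine:
    """Simulated hardware: only the listed methods actually switch A20."""
    def __init__(self, working, a20_on=False):
        self.working = set(working)
        self.a20_on = a20_on

    def test_a20(self):
        return self.a20_on

    def try_enable(self, method):
        if method in self.working:
            self.a20_on = True

def select_a20_method(machine, retries=255):
    """Return the first method that verifiably turns A20 on."""
    for _ in range(retries):                  # step 5: retry the whole sequence
        if machine.test_a20():                # step 1: nothing to switch
            return "none"
        for method in ORDER:                  # steps 2-4
            machine.try_enable(method)
            if machine.test_a20():            # confirm that it really switched
                return method
    raise RuntimeError("A20 gate not responding")   # step 6: bomb out

t20_like = FakeMachine(working={"bios", "kbc"})
gx200_like = FakeMachine(working={"kbc"})
print(select_a20_method(t20_like), select_a20_method(gx200_like))
```

Because the result is confirmed with a test after every attempt, a machine whose BIOS call silently does nothing falls through to the keyboard controller method instead of breaking.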
Eric Auer
2004-Jan-27 22:36 UTC
[syslinux] re: Problem with FreeDOS + himem64 + PXELINUX + memdisk
Hi Pat, your patches and/or MEMDISK have the problem that they do not DETECT which A20 setting styles work and which not!

- PS/2: port 92h -> or 2 to enable, and ~2 to disable A20
- 8042: command d1 / port 60 ... here, too, ONLY bit 1 should be messed with (or 2 / and ~2). It is important to do "wait until 8042 is ready" and "wait until A20 actually switched". 8042 is slow!
- BIOS: your BIOS call may or may not function, depending on the BIOS.

So you should try PS/2 first. If this does not work, try BIOS. Finally, try 8042. Then keep using only the "tested and working" access method. Some hardware has the A20 stuck to enabled, should not be a problem. Just display a warning.

Sometimes BIOS CMOS setup has a menu entry to switch between PS/2 and 8042 style support. You could tell the user about that if you detect that only slow 8042 access works.

Finally, MEMDISK hooks int 15.87? HIMEM does, too. But the HIMEM hook only helps programs which are loaded AFTER HIMEM. Interesting. So the MEMDISK hook more or less has to help HIMEM ;-).

And I do think that this is a SYSLINUX problem, so CCing them. Both MEMDISK (part of SYSLINUX package) and FreeDOS XMS drivers should be careful and select a fast and working A20 switching method. Retrying 255 times is not that good. Better try only once and allow for some waiting until it switches. Even after initial checks, the driver should always WAIT until the 8042 (if used) is really ready and WAIT until the A20 state really changed.

I read in RBIL 61 ports.txt that on some systems the A20 gate is gate enabled = PS/2 gate enabled "or" 8042 gate enabled, but this depends on your CMOS setup and on your hardware. Some setup values can mean "only use PS/2 gate setting" or "only use 8042 gate setting" or even "keep A20 stuck to enabled" or even more weird things.

It might be an idea to check if the logical connection is an OR if you find both PS/2 and 8042 working: In that case you would want to DISABLE through 8042 and SWITCH through PS/2 later. If the connection is an AND... well, imagine yourself. If connection is AND and 8042 is on "disable A20", you can do anything with PS/2 without success. This is very UNLIKELY, but still. If the connection is OR and 8042 is left on "enable A20", you cannot turn A20 off through PS/2... does not really hurt (A20 stuck to enabled does not really hurt at all in most cases).

But of course there is this stupid MS EXEPACK software which needs A20 off if you try to load a file without LOADFIX (which moves the load segment beyond the first 64k)... I think EXEPACK is the only software which ever needed a "disable-able" A20 :-(.

Confusing topic maybe, but definitely needs to be handled with care, so better do more checks than less. Luckily most of the checking code does not have to stay in RAM after initialization.

Eric.
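The OR/AND reasoning in this message is easier to follow as a small truth-table model. The Python sketch below is purely illustrative: it touches no hardware, the function names are hypothetical, and the only facts taken from the thread are that bit 1 (value 2) of port 92h controls the PS/2 gate and that the two gates may be combined with OR or, less likely, AND.

```python
A20_BIT = 0x02   # bit 1: the only bit that should be changed

def fast_gate_enable(port_value):
    """Value to write back to port 92h to enable A20 ("or 2")."""
    return port_value | A20_BIT

def fast_gate_disable(port_value):
    """Value to write back to port 92h to disable A20 ("and ~2")."""
    return port_value & ~A20_BIT & 0xFF

def a20_line(ps2_on, kbc_on, combine):
    """Effective A20 state for a given gate wiring, 'or' or 'and'."""
    return (ps2_on or kbc_on) if combine == "or" else (ps2_on and kbc_on)

def can_switch_via_ps2(kbc_on, combine):
    """True if toggling the PS/2 gate alone changes the A20 line."""
    return a20_line(True, kbc_on, combine) != a20_line(False, kbc_on, combine)

# Print which 8042 gate setting leaves the PS/2 gate in control.
for combine in ("or", "and"):
    for kbc_on in (False, True):
        print(combine, "8042 on" if kbc_on else "8042 off",
              "-> PS/2 switchable:", can_switch_via_ps2(kbc_on, combine))
```

The table matches the advice in the message: with OR wiring, leave the 8042 gate disabled and switch through port 92h; with AND wiring, the 8042 gate has to stay enabled for the PS/2 gate to have any effect.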