Dmitry Morozovsky
2005-Feb-16 06:08 UTC
stable sata patch: panic at kernel boot (can't dump)
Dear Doug,

trying to boot a RELENG_4 kernel with your patches (sata_7) on our FTP I got a
kernel panic (page fault in kernel mode, pid 2, no dump possible). Hardware
involved:

root@kucha:~# grep ata /var/run/dmesg.boot
atapci0: <Promise ATA66 controller> port 0xa000-0xa03f,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 mem 0xed100000-0xed11ffff irq 11 at device 8.0 on pci0
ata2: at 0x9000 on atapci0
ata3: at 0x9800 on atapci0
atapci1: <CMD 649 ATA100 controller> port 0xb400-0xb40f,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa407 irq 10 at device 9.0 on pci0
ata4: at 0xa400 on atapci1
ata5: at 0xac00 on atapci1
atapci2: <VIA 8233 ATA133 controller> port 0xbc00-0xbc0f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci2
ata1: at 0x170 irq 15 on atapci2
ad0: 238475MB <WDC WD2500JB-00EVA0> [484521/16/63] at ata0-master UDMA100
ad2: 114473MB <WDC WD1200JB-00CRA1> [232581/16/63] at ata1-master UDMA100
ad4: 76319MB <WDC WD800JB-00CRA1> [155061/16/63] at ata2-master UDMA66
ad6: 76319MB <WDC WD800BB-00CJA1> [155061/16/63] at ata3-master UDMA66
ad8: 57241MB <WDC WD600BB-00CCB0> [116301/16/63] at ata4-master UDMA100

Kernel panicked just after sio0/sio1, where basic RELENG_4 starts ata channel
probes. No serial console at the moment, alas.

Unfortunately I can't bring this machine out of service for a long time; however,
we can survive occasional reboots/crashes. What other info can I provide to
debug this?

Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
------------------------------------------------------------------------
*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- marck@rinet.ru ***
------------------------------------------------------------------------
Dmitry Morozovsky writes:
| Dear Doug,
|
| trying to boot a RELENG_4 kernel with your patches (sata_7) on our FTP I got a
| kernel panic (page fault in kernel mode, pid 2, no dump possible). Hardware
| involved:
|
| root@kucha:~# grep ata /var/run/dmesg.boot
| atapci0: <Promise ATA66 controller> port 0xa000-0xa03f,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 mem 0xed100000-0xed11ffff irq 11 at device 8.0 on pci0
| ata2: at 0x9000 on atapci0
| ata3: at 0x9800 on atapci0
| atapci1: <CMD 649 ATA100 controller> port 0xb400-0xb40f,0xb000-0xb003,0xac00-0xac07,0xa800-0xa803,0xa400-0xa407 irq 10 at device 9.0 on pci0
| ata4: at 0xa400 on atapci1
| ata5: at 0xac00 on atapci1
| atapci2: <VIA 8233 ATA133 controller> port 0xbc00-0xbc0f at device 17.1 on pci0
| ata0: at 0x1f0 irq 14 on atapci2
| ata1: at 0x170 irq 15 on atapci2
| ad0: 238475MB <WDC WD2500JB-00EVA0> [484521/16/63] at ata0-master UDMA100
| ad2: 114473MB <WDC WD1200JB-00CRA1> [232581/16/63] at ata1-master UDMA100
| ad4: 76319MB <WDC WD800JB-00CRA1> [155061/16/63] at ata2-master UDMA66
| ad6: 76319MB <WDC WD800BB-00CJA1> [155061/16/63] at ata3-master UDMA66
| ad8: 57241MB <WDC WD600BB-00CCB0> [116301/16/63] at ata4-master UDMA100
|
| Kernel panicked just after sio0/sio1, where basic RELENG_4 starts ata channel
| probes. No serial console at the moment, alas.
|
| Unfortunately I can't bring this machine out of service for a long time; however,
| we can survive occasional reboots/crashes. What other info can I provide to
| debug this?

I'd like some clarification. Does the system boot sometimes and fail to
boot at other times? Once the system is up, does it stay up for a while?
It doesn't seem like you are using RAID.

I have a couple more ata bug fixes that I need to roll into another
patchset. One fixes a bug in which DMA transfers were not cancelled
when the controller is reset. In version 8 I also fixed another panic
that happens on boot if you have a bad sector at the beginning of the
drive.
I'd wait for version 9. I should be able to get that out later today.

Another thing that you might want to do is monitor dmesg for any ata/ad
errors while the system is running. Most panics happen some time after
the first error message. Also, you could try looking at /var/log/messages.

Thanks,

Doug A.
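
The monitoring suggested above could be sketched roughly as follows. This is only an illustration, not something from the patchset: the function name `ata_errors` and the keyword list (error, timeout, fail, reset) are assumptions, and the actual wording of the driver's messages may differ, so adjust the pattern to whatever shows up on the machine.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print only the lines
# that mention an ata/atapci/ad device and contain an error-like keyword.
# The keyword list is a guess and is not taken from the driver source.
ata_errors() {
    grep -Ei '(^|[^[:alnum:]])(ata|atapci|ad)[0-9]+:.*(error|timeout|fail|reset)'
}

# Usage, run on the affected machine:
#   dmesg | ata_errors
#   ata_errors < /var/log/messages
```

The pattern is not anchored to the start of the line, so it should also match entries in /var/log/messages, where syslog puts a timestamp and host name in front of the kernel message.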