Raphael H. Becker
2006-Jan-11 04:10 UTC
[5.4-p6] Trouble with swap_pager: indefinite wait buffer on LSI(PERC4)-RAID on Dell PE6650
Hi *,

one of our Dell PE6650s (4x Xeon, HTT, 2GB RAM) crashes from time to time with kernel messages like:

swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77

Any access to the RAID is impossible (e.g. login on console, shutdown, ...), have to powercycle it.

What is the meaning of this message?
What is the causation for this error?
Does swap_pager crash the RAID? Maybe under load? Maybe any locking/SMP?

swap seems to work:
Swap: 2048M Total, 144K Used, 2048M Free

Some technical details:

* This filesystem is pretty loaded/stressed by the webserver/CMS and periodic rsync-jobs.

Filesystem      Size  Used  Avail  Capacity  iused    ifree     %iused  Mounted on
/dev/amrd1s1d   265G  77G   167G   32%       1415076  34454618  4%      /data

* There is a 2GB swap on amrd0s2b (or what is the problem with swap_pager?)

* From dmesg:
amr0: <LSILogic MegaRAID 1.51> mem 0xfce00000-0xfce0ffff irq 21 at device 1.0 on pci3
amr0: <LSILogic PERC 4/DC> Firmware 351S, BIOS 1.10, 128MB RAM
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 69880MB (143114240 sectors) RAID 1 (optimal)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 279800MB (573030400 sectors) RAID 5 (optimal)

* from pciconf:
amr0@pci3:1:0: class=0x010400 card=0x05181028 chip=0x19601000 rev=0x01 hdr=0x00
    vendor = 'LSI Logic (Was: Symbios Logic, NCR)'
    class = mass storage
    subclass = RAID

* a typical load average is "1.02, 1.05, 1.05" (actually 52 httpd processes, have seen up to 120 httpd)

* Kernel is 5.4-RELEASE-p6 with GENERIC plus SMP
include GENERIC
ident PE6650
options SMP

Is there anything I can do?
Any switches? sysctl?
Is 6.0-RELEASE or will 6.1-RELEASE be a solution for that?
Any patches in 5-STABLE?

Need more info? Need more testing?
I have a second machine of this which acts as a standby/fallback-system. I may test some things here (without workload).

TIA

Regards
Raphael Becker
Jim Pingle
2006-Jan-11 15:36 UTC
[5.4-p6] Trouble with swap_pager: indefinite wait buffer on LSI(PERC4)-RAID on Dell PE6650
Raphael H. Becker wrote:
> Hi *,
>
> swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77
>
> Any access to the RAID is impossible (e.g. login on console, shutdown,
> ... ), have to powercycle it.
>
> What is the meaning of this message?

I have encountered this error once before, and it meant that it timed out trying to access the disk/partition where swap was. It also coincided with a network card (fxp0) timeout, but I'm not sure that was related. The system was unresponsive from the console or remotely until it was completely power cycled, the reset button wasn't enough.

> What is the causation for this error?

Probably a disk/controller/SCSI timeout of some sort.

> amr0: <LSILogic MegaRAID 1.51> mem 0xfce00000-0xfce0ffff irq 21 at device 1.0 on pci3

Mine also happened to be on an LSI/amr based card, but not a Dell. It's an older dual CPU PIII-800.

> Is there anything I can do?
> Any switches? sysctl?
> Is 6.0-RELEASE or will 6.1-RELEASE be a solution for that?
> Any patches in 5-STABLE?

In my case, after some hair pulling, it turned out to be a bad SCSI cable. You might check your cabling and termination, and perhaps swap the cable even if it looks good -- mine looked better than the cable I replaced it with.

Jim
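
Not part of the original posts: since the open question is which device is timing out and how often, a small script can tally the reported devices from saved kernel messages. This is only a minimal sketch, assuming the messages have exactly the form quoted in this thread; the function name `count_swap_timeouts` and the sample lines are hypothetical, and where the messages are logged on a given system is not something this thread states.

```python
import re
from collections import Counter

# Matches the message form quoted in the thread, e.g.
# "swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77"
PATTERN = re.compile(
    r"swap_pager: indefinite wait buffer: device ([^,\s]+), blkno (\d+)"
)


def count_swap_timeouts(lines):
    """Return a Counter mapping device name -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample input; in practice, pass the lines of a saved log.
    sample = [
        "swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77",
        "amrd0: <LSILogic MegaRAID logical drive> on amr0",
    ]
    for device, hits in count_swap_timeouts(sample).items():
        print(device, hits)
```

If the tally points at a single logical drive, that fits the disk/controller/cable timeout explanation above and suggests which channel's cabling and termination to check first.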