Raphael H. Becker
2006-Jan-11  04:10 UTC
[5.4-p6] Trouble with swap_pager: indefinite wait buffer on LSI(PERC4)-RAID on Dell PE6650
Hi *,
one of our Dell PE6650 (4x Xeon, HTT, 2GB RAM) crashes from time to time
with kernel messages like:
swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77 
Any access to the RAID is impossible (e.g. login on console, shutdown,
... ), have to powercycle it.
What is the meaning of this message? 
What is the causation for this error? 
Does swap_pager crash the RAID? Maybe under load? Maybe any locking/SMP?
swap seems to work: Swap: 2048M Total, 144K Used, 2048M Free
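For anyone who wants to watch the logs for this message, here is a minimal sketch that extracts the device and block number from it. The pattern is inferred only from the single example line quoted above, not from the kernel source, and the names `SWAP_WAIT_RE` and `parse_swap_wait` are made up for illustration.

```python
import re

# Pattern inferred from the one example message in this post (assumption:
# other occurrences follow the same "device <name>, blkno <n>" layout).
SWAP_WAIT_RE = re.compile(
    r"swap_pager: indefinite wait buffer: "
    r"device (?P<device>[^,\s]+), blkno (?P<blkno>\d+)"
)


def parse_swap_wait(line):
    """Return (device, blkno) for a swap_pager wait message, else None."""
    match = SWAP_WAIT_RE.search(line)
    if match is None:
        return None
    return match.group("device"), int(match.group("blkno"))


sample = "swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77"
print(parse_swap_wait(sample))
```

Feeding it lines from a log file makes it easy to see which device each report names and how often it occurs.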
Some technical details:
* This filesystem is pretty loaded/stressed by the webserver/CMS and
  periodic rsync-jobs.
Filesystem       Size    Used   Avail Capacity iused    ifree %iused Mounted on
/dev/amrd1s1d    265G     77G    167G    32% 1415076 34454618    4% /data
* There is a 2GB swap on amrd0s2b (or what is the problem with swap_pager?)
* From dmesg:
amr0: <LSILogic MegaRAID 1.51> mem 0xfce00000-0xfce0ffff irq 21 at device 1.0 on pci3
amr0: <LSILogic PERC 4/DC> Firmware 351S, BIOS 1.10, 128MB RAM
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 69880MB (143114240 sectors) RAID 1 (optimal)
amrd1: <LSILogic MegaRAID logical drive> on amr0
amrd1: 279800MB (573030400 sectors) RAID 5 (optimal)
* from pciconf:
amr0@pci3:1:0:  class=0x010400 card=0x05181028 chip=0x19601000 rev=0x01 hdr=0x00
    vendor   = 'LSI Logic (Was: Symbios Logic, NCR)'
    class    = mass storage
    subclass = RAID
* a typical load average is "1.02,  1.05,  1.05" (actually 52 httpd
  processes, have seen up to 120 httpd)
* Kernel is 5.4-RELEASE-p6 with GENERIC plus SMP
include GENERIC
ident PE6650
options SMP
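As a sanity check on the two amrd sizes in the dmesg output above, the sector counts can be converted to MB. This is only a sketch: it assumes 512-byte sectors and MB meaning 2^20 bytes, neither of which is stated in the dmesg lines, and `sectors_to_mb` is a made-up helper name.

```python
SECTOR_BYTES = 512  # assumption: 512-byte sectors, not stated in the dmesg output


def sectors_to_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert a sector count to whole MB (1 MB = 1024 * 1024 bytes)."""
    return sectors * sector_bytes // (1024 * 1024)


# Sector counts copied from the amrd0/amrd1 dmesg lines.
for name, sectors in (("amrd0", 143114240), ("amrd1", 573030400)):
    print(name, sectors_to_mb(sectors), "MB")
```

Under those assumptions the results are 69880 MB and 279800 MB, matching what the driver reports for both logical drives.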
Is there anything I can do?
Any switches? sysctl?
Is 6.0-RELEASE or will 6.1-RELEASE be a solution for that?
Any patches in 5-STABLE?
Need more info?
Need more testing? I have a second machine of this type which acts as a
standby/fallback system. I can test some things there (without workload).
TIA
Regards
Raphael Becker
Jim Pingle
2006-Jan-11  15:36 UTC
[5.4-p6] Trouble with swap_pager: indefinite wait buffer on LSI(PERC4)-RAID on Dell PE6650
Raphael H. Becker wrote:
> Hi *,
>
> swap_pager: indefinite wait buffer: device amrd1s1d, blkno 77
>
> Any access to the RAID is impossible (e.g. login on console, shutdown,
> ... ), have to powercycle it.
>
> What is the meaning of this message?
I have encountered this error once before, and it meant that it timed
out trying to access the disk/partition where swap was. It also
coincided with a network card (fxp0) timeout, but I'm not sure that was
related. The system was unresponsive from the console or remotely until
it was completely power cycled, the reset button wasn't enough.
> What is the causation for this error?
Probably a disk/controller/SCSI timeout of some sort
> amr0: <LSILogic MegaRAID 1.51> mem 0xfce00000-0xfce0ffff irq 21 at device 1.0 on pci3
Mine also happened to be on an LSI/amr based card, but not a Dell. It's
an older dual CPU PIII-800.
> Is there anything I can do?
> Any switches? sysctl?
> Is 6.0-RELEASE or will 6.1-RELEASE be a solution for that?
> Any patches in 5-STABLE?
In my case, after some hair pulling, it turned out to be a bad SCSI
cable. You might check your cabling and termination, and perhaps swap
the cable even if it looks good -- mine looked better than the cable I
replaced it with.
Jim