Chao-Rui Chang
2010-Jul-20 07:27 UTC
[Xen-users] xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device
Dear folks,

I have a problem with xen-4 + the megasas driver (Dell PERC 6/i RAID card). The RAID controller goes offline randomly.

This may be a Xen hypervisor problem.

The following are the log messages from Xen 4.0 and 4.1-unstable:

xen 4.1-unstable + kernel 2.6.32.16:
  -> System can boot, but megasas fails after about 3-4 hours

    megasas: [ 0]waiting for 25 commands to complete
    megasas: [ 5]waiting for 25 commands to complete
    megasas: [10]waiting for 25 commands to complete
    megasas: [15]waiting for 25 commands to complete
    megasas: [20]waiting for 25 commands to complete
    megasas: [25]waiting for 25 commands to complete
    ...
    megasas: cannot recover from previous reset failures
    end_request: I/O error, dev sda, sector 46755821
    Buffer I/O error on device dm-0, logical block 5818372
    lost page write due to I/O error on dm-0
    Buffer I/O error on device dm-0, logical block 5818373
    ...
    end_request: I/O error, dev sda, sector 7024621
    Aborting journal on device dm-0.
    end_request: I/O error, dev sda, sector 7024653
    ...
    ext4-fs error (device dm-0) in ext4_dirty_inode: Journal has aborted
    sd 0:2:0:0: rejecting I/O to offline device
    ext4_abort called.
    ext4-fs error (device dm-0): ext4_journal_start_sb: Detected aborted journal
    Remounting filesystem read-only
    end_request: I/O error, dev sda, sector 7024861
    end_request: I/O error, dev sda, sector 7024893
    ext4-fs error (device dm-0) in ext4_reserve_inode_write: Journal has aborted
    ext4-fs error (device dm-0) in ext4_reserve_inode_write: Journal has aborted
    end_request: I/O error, dev sda, sector 7024965
    ...
    Buffer I/O error on device dm-0, logical block 1
    lost page write due to I/O error on dm-0
    sd 0:2:0:0: rejecting I/O to offline device
    sd 0:2:0:0: rejecting I/O to offline device
    ...

xen 4.0-stable + kernel 2.6.32.16:
  -> System cannot boot, megasas fails at boot time

    xen: registering gsi 33 triggering 0 polarity 1
    xen: --> irq=33
    megaraid_sas 0000:03:00.0: PCI INT A -> GSI 33 (level, low) -> IRQ 33
    megaraid_sas 0000:03:00.0: setting latency timer to 64
    megasas: FW now in Ready state
    scsi0 : LSI SAS based MegaRAID driver
    scsi 0:0:0:0: Direct-Access ATA WDC WD1002FBYS-1 0C10 PQ: 0 ANSI: 5
    scsi 0:0:1:0: Direct-Access ATA WDC WD1002FBYS-1 0C10 PQ: 0 ANSI: 5
    scsi 0:0:2:0: Direct-Access ATA WDC WD1002FBYS-1 0C10 PQ: 0 ANSI: 5
    scsi 0:0:32:0: Enclosure DP BACKPLANE 1.07 PQ: 0 ANSI: 5
    scsi 0:2:0:0: Direct-Access DELL PERC 6/i 1.22 PQ: 0 ANSI: 5
    megasas: [ 0]waiting for 25 commands to complete
    megasas: [ 5]waiting for 25 commands to complete
    ....

Any idea about this problem?

Thanks.
-----------------------------------------------------
Best regards,
Chao-Rui

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
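For anyone checking whether a machine shows the same symptoms, the two telltale messages in the logs above can be counted with a small filter. This is only a sketch: the function name is made up for illustration, and the patterns are taken from the log lines quoted in this message, so other driver versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: count kernel-log lines on stdin that match the two
# symptoms quoted above (the megasas "waiting for N commands" countdown
# and the SCSI layer rejecting I/O to an offline device).
count_megasas_symptoms() {
    grep -c -E 'rejecting I/O to offline device|megasas: \[ *[0-9]+\]waiting for [0-9]+ commands to complete'
}
```

It reads from standard input, so it can be fed from the kernel ring buffer or a saved log, for example `dmesg | count_megasas_symptoms`.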
Pasi Kärkkäinen
2010-Jul-20 12:26 UTC
Re: [Xen-users] xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device
On Tue, Jul 20, 2010 at 03:27:19PM +0800, Chao-Rui Chang wrote:
> I have a problem with xen-4 + the megasas driver (Dell PERC 6/i RAID card).
> The RAID controller goes offline randomly.
>
> This may be a Xen hypervisor problem.

I don't think it's a problem with the Xen hypervisor. It's more probably a problem with the dom0 kernel.

Does the same kernel work OK when it's booted natively on bare metal (without Xen)?

-- Pasi
Pasi Kärkkäinen
2010-Jul-20 17:17 UTC
Re: [Xen-users] xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device
On Wed, Jul 21, 2010 at 01:11:59AM +0800, Chao-Rui Chang wrote:
> Pasi,
>
> Yes, it works fine without xen.

Ok. Did you try Xen 4.0.1-rc4?

-- Pasi
Chao-Rui Chang
2010-Jul-20 18:05 UTC
Re: [Xen-users] xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device
Pasi,

I have tried xen-4.0, 4.0.1-rc1, rc2, and the latest hg version:

xen-4.0: failed at boot time
xen-4.0.1*: failed after several hours

I will try 4.0.1-rc4 now.

Thanks.
Stephan Austermühle
2010-Oct-10 20:38 UTC
Re: [Xen-users] xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device
Hi all,

On 20.07.2010 09:27, Chao-Rui Chang wrote:
> I have a problem with xen-4 + the megasas driver (Dell PERC 6/i RAID card).
> The RAID controller goes offline randomly.
>
> ext4-fs error (device dm-0) in ext4_dirty_inode: Journal has aborted
> sd 0:2:0:0: rejecting I/O to offline device

I faced the very same problem today (Intel RS2BL040 a.k.a. LSI MegaSAS 9260, Xen 4.0.1, Debian Linux kernel 2.6.32-5-xen-amd64, which includes megaraid_sas driver 4.01), and I remembered that I solved it a while ago for another server with an Intel SRCSAS18E, Xen 3.3 and some other kernel.

Last time, it was the megaraid_sas driver that caused the filesystems to go offline randomly when running in a dom0 under Xen. The system ran stable when the kernel was running on bare metal.

I don't know yet if this is a permanent solution, but I tried to fix it by repeating the steps from last time (the server is still running after 7 hours...):

1. Grabbed the latest driver (4.31) from LSI. I took the Ubuntu package (Ubuntu_10_04_LTS_4_31.zip) because it contains sources adjusted to kernel 2.6.32. There's also a Debian 5.0.5 driver available, containing the sources and binaries for 2.6.26.

2. Unpacked the driver sources, compiled with

    cd megaraid_sas-v00.00.04.31-ieee
    make -C /lib/modules/$(uname -r)/build M=$PWD modules

   and installed it (you may want to save the old module first) with

    cp -p megaraid_sas.ko /lib/modules/$(uname -r)/kernel/drivers/scsi/megaraid/megaraid_sas.ko

3. Updated the initrd with

    update-initramfs -u

   and rebooted. The new driver introduces itself with

    megasas: 00.00.04.31-ieee Thur June 17 14:13:02 EST 2010
    megasas: 0x1000:0x0079:0x8086:0x9260: bus 5:slot 0:func 0
    xen: registering gsi 16 triggering 0 polarity 1
    xen_allocate_pirq: returning irq 16 for gsi 16
    xen: --> irq=16
    Already setup the GSI :16
    megaraid_sas 0000:05:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
    megaraid_sas 0000:05:00.0: setting latency timer to 64
    megasas: FW now in Ready state
    megasas_init_mfi: fw_support_ieee=0
    scsi0 : LSI SAS based MegaRAID driver
    [...]

4. Updated the RAID controller firmware to the latest available release.

Stephan
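The build-and-install steps above can be wrapped in a small script so that the old module is always saved before it is overwritten. This is a rough sketch, not a tested procedure: the function names and the `.orig` backup suffix are invented for illustration, while the commands and paths are the ones given in the steps above.

```shell
#!/bin/sh
# Hypothetical helpers; names and the ".orig" suffix are assumptions.

# Print the in-tree location of megaraid_sas.ko for kernel release $1.
megaraid_module_path() {
    printf '/lib/modules/%s/kernel/drivers/scsi/megaraid/megaraid_sas.ko\n' "$1"
}

# Build the driver sources in directory $1 against the running kernel,
# keep a copy of the old module, install the new one, rebuild the initrd.
install_megaraid_driver() {
    src_dir=$(cd "$1" && pwd) || return 1   # kbuild wants an absolute M= path
    kver=$(uname -r)
    target=$(megaraid_module_path "$kver")
    make -C "/lib/modules/$kver/build" M="$src_dir" modules || return 1
    cp -p "$target" "$target.orig" || return 1          # save the old module
    cp -p "$src_dir/megaraid_sas.ko" "$target" || return 1
    update-initramfs -u
}
```

It would be run as root, for example `install_megaraid_driver ./megaraid_sas-v00.00.04.31-ieee`, followed by a reboot. After the reboot, `modinfo -F version megaraid_sas` should report the version of the module that is now installed on disk.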
Stephan Austermühle
2010-Oct-12 12:43 UTC
[Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hello all,

On 10.10.2010 22:38, Stephan Austermühle wrote:
> I don't know yet if this is a permanent solution but I tried to fix it by
> repeating the steps from the last time (server is still running after 7
> hours...):

Looks like I have not found a permanent solution. Any comments about the attached messages?

Here's some more information:

# xm info
host                   : n0008
release                : 2.6.32-5-xen-amd64
version                : #1 SMP Fri Sep 17 22:00:48 UTC 2010
machine                : x86_64
nr_cpus                : 4
nr_nodes               : 1
cores_per_socket       : 4
threads_per_core       : 1
cpu_mhz                : 2400
hw_caps                : bfebfbff:28100800:00000000:00001b40:0098e3fd:00000000:00000001:00000000
virt_caps              : hvm hvm_directio
total_memory           : 4058
free_memory            : 2221
node_to_cpu            : node0:0-3
node_to_memory         : node0:2221
node_to_dma32_mem      : node0:2221
max_node_id            : 0
xen_major              : 4
xen_minor              : 0
xen_extra              : .1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xffff800000000000
xen_changeset          : Wed Aug 25 09:23:31 2010 +0100 21326:07ac5459b250
xen_commandline        : placeholder console=vga dom0_mem=512M cpufreq=xen hpetbroadcast xencons=tty watchdog noreboot iommu=pv
cc_compiler            : gcc version 4.4.5 20100728 (prerelease) (Debian 4.4.4-8)
cc_compile_by          : root
cc_compile_domain      : (none)
cc_compile_date        : Sun Aug 29 14:45:50 CEST 2010
xend_config_format     : 4

# cat /proc/cmdline
placeholder root=/dev/mapper/vg00-lv_root ro quiet pciback.hide=(0a:00.0) xen-pciback.hide=(0a:00.0)

Best regards,
Stephan
Stephan Austermühle
2010-Oct-17 10:02 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hello again,

On 12.10.2010 14:43, Stephan Austermühle wrote:
> Looks like I have not found a permanent solution. Any comments about the
> attached messages?

Latest news on this issue:

I updated to Debian kernel 2.6.32-24, which fixes two issues: domU network activity stopping randomly ("netfront smartpoll bugfix") and a lost interrupt issue ("Fix lost interrupt race in Xen event channels"):

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=596635
http://kerneltrap.org/mailarchive/linux-kernel/2010/8/24/4610615

By the way, the kernel update reverted the megaraid_sas driver back to 4.01.

Anyway, this doesn't seem to help. Today the server froze again, this time without any visible error message, so I don't know if it was a problem with the megaraid_sas driver. In contrast to the previous hangs, it's not just the I/O that went offline but the entire system.

Let me know if you have any ideas on this issue. Otherwise I will continue my monologue just in case anybody else faces the same issue.

Regards,
Stephan
Stephan Austermühle
2010-Oct-18 06:21 UTC
[Xen-users] System freezes after xen_end_context_switch
Hi again,

it's driving me nuts... My server froze again (see attached logs). Does anybody have any hints?

Setup: Xen 4.0.1, Debian kernel 2.6.32-24, megaraid_sas driver 4.31-ieee.

Best regards,
Stephan
Pasi Kärkkäinen
2010-Nov-07 21:38 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
On Sun, Oct 17, 2010 at 12:02:35PM +0200, Stephan Austermühle wrote:
> Anyway, this doesn't seem to help. Today the server freezed again. This
> time without any visible error message---so I don't know if it was a
> problem with the megaraid_sas driver.
>
> Let me know if you have any ideas on this issue. Otherwise I will
> continue my monologue just in case anybody else faces the same issue.

I just noticed today that the default megaraid_sas driver in upstream Linux 2.6.32.25 fails to boot (it doesn't detect the disks).

I fixed the problem by updating the driver to version 4.33. Maybe that helps?

It's available at least here: https://bugzilla.redhat.com/show_bug.cgi?id=493093 (attached to comment #16).

-- Pasi
Stephan Austermühle
2010-Nov-09 07:12 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hello everybody,

On 07.11.2010 22:38, Pasi Kärkkäinen wrote:
> I just noticed today the default megaraid_sas driver in upstream Linux 2.6.32.25
> fails to boot (doesn't detect disks).
>
> I fixed the problem by updating the driver to version 4.33.
> Maybe that helps?

Thanks for your hint. I've just updated the megaraid_sas driver to 4.33 and the Debian kernel to 2.6.32-5-27, now that the system froze again last night (after about 10 days of operation).

Stephan
Mark Adams
2010-Nov-09 09:36 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
All,

I've had a very similar issue twice with Xen 4.0.1, but using an Areca card (arcmsr): the sda device (RAID 1) goes offline, but after just a reboot everything comes back up and the RAID device shows as normal.

Could this actually be a Xen issue?

Regards,
Mark
Stephan Austermühle
2010-Nov-10 08:02 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Mark!

On 09.11.2010 10:36, Mark Adams wrote:
> All, I've had a very similar issue twice with Xen 4.0.1 but using an
> Areca card (arcmsr) - The sda device (raid1) goes offline, but with just
> a reboot everything comes back up and the raid device shows as normal.
>
> Could this possibly actually be a xen issue?

At least once there was a fix for a bug in the megaraid_sas driver that caused a Xen system to hang. It is not easy to say whether Xen or a driver is responsible if there's no visible error...

Stephan
Stephan Austermühle
2010-Nov-14 15:46 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi all,

On 07.11.2010 22:38, Pasi Kärkkäinen wrote:
> I fixed the problem by updating the driver to version 4.33.
> Maybe that helps?

Now I know for sure that updating the driver to 4.33 didn't help. Something still makes it go offline, resulting in tons of

    sd 0:2:0:0: rejecting I/O to offline device

messages. Xen is 4.0.1, Linux kernel is 2.6.32-5-xen-amd64 (Debian 2.6.32-27).

Does anybody have any ideas?

Stephan
Mark Adams
2010-Nov-14 17:06 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Stephan,

are you doing any PCI passthrough? I have the same issue, but with an Areca card, and it seems to be caused by a possible IRQ conflict. I haven't got to the bottom of it yet, so I'm not using passthrough at the moment. The array doesn't go offline when passthrough is not in use.

Regards,
Mark
Stephan Austermühle
2010-Nov-14 17:26 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Mark,

On 14.11.2010 18:06, Mark Adams wrote:
> Hi Stephen, are you doing any pci passthrough? I have the same issue,
> but with an Areca card, which seems to be caused by a possible irq
> conflict. I've not got to the bottom of it yet so not using the
> passthrough at the moment. The array doesn't go offline when
> passthrough is not in use.

Yes and no. ;-)

I was using PCI pass-through for an ISDN card but I stopped using it due to the system hangs (removed the 'pci' line from the domU config). At the moment no PCI pass-through is active and the RAID still goes offline from time to time.

Stephan
Mark Adams
2010-Nov-14 19:29 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
On 14 Nov 2010, at 17:26, Stephan Austermühle <au@hcsd.de> wrote:
> I was using PCI pass-through for an ISDN card but I stopped using it due
> to the system hangs (removed the 'pci' line from the domU config). At
> the moment no PCI pass-through is active and the RAID still goes offline
> from time to time.

Hi,

I was doing a very similar thing, except it was just NICs being passed through, to use a red-fone box for VoIP...

Have you also disabled it completely on your dom0 kernel line (so that nothing shows up anymore when you run xm pci-list-assignable-devices)?

I haven't done this myself, as I haven't seen a hang since I shut down the VMs that were using the passthrough, but I wonder if it might correct it. I think something is seriously wrong with passthrough in this kernel/Xen version...

Regards,
Mark
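One quick way to check whether the dom0 kernel line still hides devices for passthrough is to look for the pciback.hide options that appeared in the /proc/cmdline output posted earlier in this thread. The sketch below assumes those two option spellings are the only ones in use; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every pciback.hide= or
# xen-pciback.hide= option found in the kernel command line given as $1.
# Prints nothing when no device is hidden for passthrough.
list_pciback_hidden() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n -E 's/^(xen-)?pciback\.hide=//p'
}
```

On a running dom0 this could be called as `list_pciback_hidden "$(cat /proc/cmdline)"`. Empty output would suggest the options have been removed from the boot line, although it says nothing about devices bound to pciback at runtime.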
Mark Adams
2010-Nov-15 17:58 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Stephan,

Do you have remote syslog set up? In the other thread, where the similar issue I am having is discussed, Konrad is asking for logs from when this problem occurs. If you have them, it might help get to the bottom of it.

Regards,
Mark

On Sun, Nov 14, 2010 at 06:26:54PM +0100, Stephan Austermühle wrote:
> Hi Mark,
>
> On 14.11.2010 18:06, Mark Adams wrote:
>
> > Hi Stephan, are you doing any pci passthrough? I have the same issue, but with an Areca card, which seems to be caused by a possible irq conflict. I've not got to the bottom of it yet so not using the passthrough at the moment. The array doesn't go offline when passthrough is not in use.
>
> Yes and no. ;-)
>
> I was using PCI pass-through for an ISDN card but I stopped using it due
> to the system hangs (removed the 'pci' line from the domU config). At
> the moment no PCI pass-through is active and the RAID still goes offline
> from time to time.
>
> Stephan
Stephan Austermühle
2010-Nov-15 19:38 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Mark!

On 15.11.2010 18:58, Mark Adams wrote:

> Do you have remote syslog set up? In the other thread where the similar
> issue I am having is discussed, konrad is asking for logs when this
> problem occurs. If you have this it might help get to the bottom of it.

Yes, remote syslog was one of the first things I set up. Unfortunately, no errors were logged. The only thing I can tell from the logs is when the system froze, because the timestamps stop at that point.

Stephan
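The approach Stephan describes, spotting a freeze by the stretch of missing timestamps in the remote syslog, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the log lines are assumed to start with the classic 15-character syslog stamp (for example "Nov 15 19:38:16"), the year is supplied by hand because that format omits it, month names are parsed with an English/C locale, and the five-minute threshold and the sample lines are made up for the example, not taken from the affected system.

```python
from datetime import datetime, timedelta


def find_gaps(lines, max_gap=timedelta(minutes=5), year=2010):
    """Return (last_seen, next_seen) pairs where two consecutive
    timestamped log lines are further apart than max_gap."""
    gaps = []
    previous = None
    for line in lines:
        # Assumption: the line starts with a classic syslog stamp,
        # e.g. "Nov 15 19:38:16". The year is not part of that format.
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if previous is not None and stamp - previous > max_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


# Made-up sample lines, for illustration only.
sample = [
    "Nov 15 19:30:01 dom0 cron: heartbeat",
    "Nov 15 19:31:01 dom0 cron: heartbeat",
    "Nov 15 20:10:44 dom0 kernel: booting",
]

for last_seen, next_seen in find_gaps(sample):
    print(f"no log lines between {last_seen} and {next_seen}")
```

A gap only says that nothing was logged in that interval; on a quiet machine a periodic heartbeat entry (such as a cron job that logs once a minute) makes a real freeze much easier to tell apart from an idle period.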
Mark Adams
2010-Nov-16 10:42 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
On Mon, Nov 15, 2010 at 08:38:16PM +0100, Stephan Austermühle wrote:
> Hi Mark!
>
> On 15.11.2010 18:58, Mark Adams wrote:
>
> > Do you have remote syslog set up? In the other thread where the similar
> > issue I am having is discussed, konrad is asking for logs when this
> > problem occurs. If you have this it might help get to the bottom of it.
>
> Yes, remote syslog was one of the first things I set up. Unfortunately
> no errors were logged. The only thing I can find out from the logs is
> when the system froze, because of the missing timestamps.
>
> Stephan

Can you add some additional logging to your kernel command line, as detailed by Pasi in the other thread about this issue?

--------
http://wiki.xen.org/xenwiki/XenHypervisorBootOptions

So I think it's "console_timestamps"
--------

Although, if I read your message above correctly, you might already have this option on?
Stephan Austermühle
2010-Nov-18 08:39 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Mark!

On 16.11.2010 11:42, Mark Adams wrote:

> Can you add some additional logging to your kernel command line as
> detailed by Pasi in the other thread running about this issue?
>
> http://wiki.xen.org/xenwiki/XenHypervisorBootOptions
>
> So I think it's "console_timestamps"

I've added console_timestamps to the Xen command line and will reboot tonight. Let's see what happens over the next few days...

Stephan
Pasi Kärkkäinen
2010-Nov-18 20:29 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
On Thu, Nov 18, 2010 at 09:39:19AM +0100, Stephan Austermühle wrote:
> Hi Mark!
>
> On 16.11.2010 11:42, Mark Adams wrote:
>
> > Can you add some additional logging to your kernel command line as
> > detailed by Pasi in the other thread running about this issue?
> >
> > http://wiki.xen.org/xenwiki/XenHypervisorBootOptions
> >
> > So I think it's "console_timestamps"
>
> I've added console_timestamps to the Xen command line and will reboot
> tonight. Let's see what happens in the next days...
>

Make sure you have "loglvl=all guest_loglvl=all" on the xen.gz line.

Also: do you have the latest firmware on the RAID card?

-- Pasi
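As a quick way to double-check the options discussed in this thread before rebooting, the Xen command line can be tested for them with a short Python sketch. The three option strings are the ones named above (loglvl=all, guest_loglvl=all, console_timestamps); the function name and the example command line are hypothetical, and the string to check is assumed to be whatever follows xen.gz in the boot loader entry.

```python
# Options named in this thread for more verbose Xen logging.
REQUIRED = ("loglvl=all", "guest_loglvl=all", "console_timestamps")


def missing_xen_options(cmdline, required=REQUIRED):
    """Return the required options that do not appear as
    whitespace-separated tokens in the given Xen command line."""
    present = set(cmdline.split())
    return [opt for opt in required if opt not in present]


# Hypothetical command line, for illustration only.
example = "dom0_mem=1024M console_timestamps"
print(missing_xen_options(example))
```

The check is a plain token comparison, so it treats loglvl=all and loglvl=info as different options; that is deliberate here, since the advice above asks for the "all" level specifically.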
Mark Adams
2010-Dec-08 13:51 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi Stephan,

I've been testing this a lot, as you might have seen in the other thread. I'm not sure it will help, but I managed to stop the crashing by setting msitranslate=0 in the domU's pci config and then adding pci=nomsi to the domU's kernel command line in GRUB.

Regards,
Mark

On Thu, Nov 18, 2010 at 09:39:19AM +0100, Stephan Austermühle wrote:
> Hi Mark!
>
> On 16.11.2010 11:42, Mark Adams wrote:
>
> > Can you add some additional logging to your kernel command line as
> > detailed by Pasi in the other thread running about this issue?
> >
> > http://wiki.xen.org/xenwiki/XenHypervisorBootOptions
> >
> > So I think it's "console_timestamps"
>
> I've added console_timestamps to the Xen command line and will reboot
> tonight. Let's see what happens in the next days...
>
> Stephan
Stephan Austermühle
2010-Dec-28 18:29 UTC
Re: [Xen-users] Xen 4.0.1 and megaraid_sas causes I/O to hang (xen-4 + megasas(DELL 6i RAID) = Reject I/O to offline device)
Hi,

>> I've added console_timestamps to the Xen command line and will reboot
>> tonight. Let's see what happens in the next days...
>>
> Make sure you have "loglvl=all guest_loglvl=all" on the xen.gz line.

Surprisingly, the system has now reached an uptime of 38 days, with no hangs or freezes since my last post.

Regards,
Stephan