Michael Kress
2006-Sep-24 23:29 UTC
[Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Hi,

I'm having problems with a 3ware 9550SX-4LP during high IO activity: the kernel produces a dump and halts the machine. This only happens with a xen kernel.

To be more precise: when I activate the write cache on the controller and then produce high IO traffic in Xen0, there's a kernel dump and the system halts completely. I can't even scroll back through the console history (with PgUp). Sorry, but there's also no trace in the syslog file I could give you, just the screen (see below).

To produce "high IO traffic" it's enough to run
mkfs.ext3 /dev/VolGroup00/lvx01
on a logical volume of 100GB. I haven't got any guests running yet. The mkfs.ext3 gets to about 530 of 800 inode tables and then the system halts. If I run a dd in another process, the system crashes even earlier.

The dump I could gather from the screen:
[<d1052395>] scsi_device_unbusy+0x45/0x80 [scsi_mod]
[<d10534fa>] scsi_softirq_done+0xaa/0x120 [scsi_mod]
[<c01d4a9b>] blk_done_softirq+0x9b/0xc0
[<c0125a73>] __do_softirq+0x93/0x130
[<c0125b95>] do_softirq+0x85/0xa0
[<c0106b7f>] do_IRQ+0x1f/0x30
[<c023dbb2>] evtchn_do_upcall+0x92/0x110
[<c0105148>] hypervisor_callback+0x2c/0x34
[<c01e5594>] __copy_from_user_ll+0x34/0x50
[<c0146c6b>] generic_file_buffered_write+0x22b/0x6c0
[<d1121814>] __ext3_journal_stop+0x24/0x50 [ext3]
[<c014740d>] __generic_file_aio_write_nolock+0x30d/0x580
[<c023dbb2>] evtchn_do_upcall+0x92/0x110
[<c0152fe6>] zap_pte_range+0x286/0x3f0
[<c0147978>] generic_file_aio_write+0x88/0x120
[<d1116814>] ext3_file_write+0x44/0xc5 [ext3]
[<c01693ba>] do_sync_write+0xca/0x130
[<c0153c57>] zeromap_pte_range+0x147/0x1f0
[<c0137170>] autoremove_wake_function+0x0/0x60
[<c021c8a4>] read_zero+0x1d4/0x230
[<c01a04d9>] dnotify_parent+0x39/0xa0
[<c01695e6>] vfs_write+0x1c6/0x1d0
[<c01696c1>] sys_write+0x51/0x80
[<c0104f85>] syscall_call+0x7/0xb

Can you help me with this problem? I'd like to activate the cache without these crashes, as writing is much faster with it!
The controller's "Queuing" setting makes no difference, i.e.:
caching: off, queuing: on,        crash: no
caching: off, queuing: off,       crash: no
caching: on,  queuing: off or on, crash: yes

With a bare CentOS 4.4 with all updates (currently kernel 2.6.9) but without xen, the system works flawlessly, even with caching on.

My setup:
3ware 9550SX-4LP + BBU
4x250GB Seagate ST3250820AS SATAII with NCQ
Board: Supermicro X6DH8-G2+
2 x Intel Xeon 3.6GHz
4GB RAM
Xen0: CentOS 4.4 with Xen 3.0.2 and a self-compiled 2.6.16 kernel (the one that "came" along with xen's setup routine).

Thank you
Regards

Michael

-- 
Michael Kress, kress@hal.saar.de
http://www.michael-kress.de / http://kress.net
P E N G U I N S   A R E   C O O L
_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
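The traces in this thread were copied by hand from the console, so a few digits are easy to misread. One way to sanity-check such a transcription uses the i386 oops convention that each frame reads `[<return address>] symbol+offset/size [module]`: subtracting the offset from the address gives the function's start, which must be identical for every frame of the same function in the same kernel, and the offset should not lie beyond the function size. The following is a minimal Python sketch of that check; the regular expression and the helper names `parse_frames` and `suspect_symbols` are hypothetical and not part of any kernel tool.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Set

# One frame as printed by the i386 kernel, e.g.
#   [<c0106b7f>] do_IRQ+0x1f/0x30
#   [<d1052395>] scsi_device_unbusy+0x45/0x80 [scsi_mod]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<sym>[^\s+]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>[\w-]+)\])?"
)


class Frame(NamedTuple):
    addr: int
    symbol: str
    offset: int
    size: int
    module: Optional[str]

    @property
    def func_start(self) -> int:
        # Return address minus the offset into the function = function start.
        return self.addr - self.offset


def parse_frames(text: str) -> List[Frame]:
    """Extract every well-formed frame; badly garbled lines simply don't match."""
    return [
        Frame(int(m["addr"], 16), m["sym"], int(m["off"], 16),
              int(m["size"], 16), m["mod"])
        for m in FRAME_RE.finditer(text)
    ]


def suspect_symbols(frames: List[Frame]) -> Set[str]:
    """Symbols whose frames disagree on the function start, or whose offset
    lies beyond the function size: likely transcription errors."""
    starts: Dict[str, int] = {}
    suspects: Set[str] = set()
    for f in frames:
        if f.offset > f.size:
            suspects.add(f.symbol)
        first = starts.setdefault(f.symbol, f.func_start)
        if first != f.func_start:
            suspects.add(f.symbol)
    return suspects


if __name__ == "__main__":
    trace = """[<c0106b7f>] do_IRQ+0x1f/0x30
[<c023dbb2>] evtchn_do_upcall+0x92/0x110
[<c023dbb2>] evtchn_do_upcall+0x92/0x110"""
    print(suspect_symbols(parse_frames(trace)))
```

Frames that share a symbol across both dumps in this thread (for example `vfs_write` or `generic_file_buffered_write`) can be fed in together, since both kernels place those functions at the same addresses.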
Jonas Björklund
2006-Sep-25 04:47 UTC
Re: [Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Hello,

On Mon, 25 Sep 2006, Michael Kress wrote:
> I'm having problems with a 3ware 9550SX-4LP during high IO activities:
> the kernel produces a dump and halts the machine. This only shows up
> using a xen kernel.

I have the same controller without any problems.

xenhost ~ # /usr/local/3ware/tw_cli info

Ctl   Model        Ports   Drives   Units   NotOpt   RRate   VRate   BBU
------------------------------------------------------------------------
c0    9550SX-4LP   4       4        1       0        4       4       -

xenhost ~ # /usr/local/3ware/tw_cli info c0

Unit  UnitType  Status  %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
------------------------------------------------------------------------------
u0    RAID-10   OK      -      64K     130.365   ON     OFF      OFF

Port   Status   Unit   Size       Blocks      Serial
---------------------------------------------------------------
p0     OK       u0     69.25 GB   145226112   WD-WMAKE22159
p1     OK       u0     69.25 GB   145226112   WD-WMAKE22152
p2     OK       u0     69.25 GB   145226112   WD-WMAKE22157
p3     OK       u0     69.25 GB   145226112   WD-WMAKE22144

xenhost ~ # uname -a
Linux xenhost 2.6.16.28-xen #3 SMP Thu Sep 21 00:02:54 CEST 2006 i686 AMD Athlon(tm) Processor GNU/Linux
xenhost ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) Processor
stepping        : 1
cpu MHz         : 2133.404
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts
bogomips        : 4268.44

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) Processor
stepping        : 1
cpu MHz         : 2133.404
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 mmx fxsr sse syscall mp mmxext 3dnowext 3dnow ts
bogomips        : 4268.44

xenhost ~ #
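Jonas's `tw_cli info c0` output shows the unit's Cache column set to ON, which is exactly the setting that crashes Michael's machine. When comparing systems it can help to pull that column out programmatically. A minimal Python sketch, assuming the whitespace-separated layout shown above (single-token values, a unit table whose header row starts with `Unit`, followed by a port table starting with `Port`); `unit_cache_settings` is a hypothetical helper, not part of the 3ware tools.

```python
from typing import Dict, List, Optional


def unit_cache_settings(tw_cli_output: str) -> Dict[str, str]:
    """Map each unit (u0, u1, ...) to its Cache column from `tw_cli info cN`."""
    header: Optional[List[str]] = None
    settings: Dict[str, str] = {}
    for line in tw_cli_output.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("-"):
            continue  # blank line or dashed separator
        if fields[0] == "Unit":
            header = fields  # start of the unit table
            continue
        if fields[0] == "Port":
            header = None  # port table follows; it has no Cache column
            continue
        # Only rows with exactly one value per header column are unit rows.
        if header is not None and len(fields) == len(header):
            row = dict(zip(header, fields))
            settings[row["Unit"]] = row["Cache"]
    return settings


if __name__ == "__main__":
    sample = """Unit  UnitType  Status  %Cmpl  Stripe  Size(GB)  Cache  AVerify  IgnECC
u0    RAID-10   OK      -      64K     130.365   ON     OFF      OFF
"""
    print(unit_cache_settings(sample))
```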
Michael Kress
2006-Sep-25 23:45 UTC
Re: [Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Hello,

I tried the "in-kernel" driver, i.e. I modified the config to build 3w-9xxx into the 2.6.16 kernel instead of as a module. The mkfs did go through this time, but as soon as I run an additional 'dd' or 'cp' it takes no longer than 10 seconds to get this crash (sorry, no more information than the screen: the machine crashes, so for obvious reasons it doesn't write any logs). Any ideas?

Thanks - Michael

The dump I received:
[<c0142e1a>] __do_IRQ+0x0a/0x110
[<c016bb20>] alloc_page_buffers+0x60/0xc0
[<c0106b7a>] do_IRQ+0x1a/0x30
[<c023dbb2>] evtchn_do_upcall+0x92/0x110
[<c0105148>] hypervisor_callback+0x2c/0x34
[<c023d8fa>] force_evtchn_callback+0xa/0x10
[<c01446cf>] add_to_page_cache+0xef/0x100
[<c0146bad>] generic_file_buffered_write+0x16d/0x6c0
[<c02c0101>] twa_post_command_packet+0x41/0x178
[<c0186d50>] file_update_time+0x50/0xd0
[<c014740d>] __generic_file_aio_write_nolock+0x30d/0x580
[<c01d270e>] blk_run_queue+0x4e/0x70
[<c01e131f>] kobject_put+0x1f/0x30
[<c01e12f0>] kobject_release+0x0/0x10
[<c02862e5>] scsi_end_request+0xe5/0x110
[<c01476d0>] generic_file_aio_write_nolock+0x50/0xd0
[<c01478c3>] generic_file_write_nolock+0xa3/0xd0
[<c0137170>] autoremove_wake_function+0x0/0x60
[<c0108e61>] monotonic_clock+0x51/0xa0
[<c03742b4>] schedule+0x3f4/0x770
[<c0172f18>] blkdev_file_write+0x38/0x40
[<c01695e6>] vfs_write+0x1c6/0x1d0
[<c01696c1>] sys_write+0x51/0x80
[<c0104f85>] syscall_call+0x7/0xb
Jonas Björklund
2006-Sep-26 07:43 UTC
Re: [Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Hello,

On Tue, 26 Sep 2006, Michael Kress wrote:
> I tried the "in-kernel" driver, i.e. I modified the config to include
> 3w-9xxx in the kernel 2.6.16, not as a module. I must admit that the
> mkfs went through this time, but as soon as I do some additional 'dd' or
> 'cp' it doesn't take me longer than 10 sec to receive this crash (sorry,
> no more information than the screen, because the machine crashes and
> because it doesn't write logs for obvious reasons).
> Have you got some idea?

Do you have a 64-bit PCI or a 32-bit PCI?
Michael Kress
2006-Sep-27 15:26 UTC
Re: [Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Jonas Björklund wrote:
> Do you have a 64-bit PCI or a 32-bit PCI?

Hi,

it's a 64-bit 133MHz PCI-X slot, see
http://www.supermicro.com/products/motherboard/Xeon800/E7520/X6DH8-G2+.cfm

There must be something different in the way the kernel and its components are put together, and unfortunately I don't have the knowledge to find it. With the kernel that came with CentOS (2.6.9) the controller works perfectly; only the 2.6.16 kernel that comes with xen causes trouble. Are there any more debug options I could enable to provide more details?

I don't want to move away from xen. I've already tried the openvz kernel, which works perfectly under high IO load, but I prefer xen. I hope this messy technical detail doesn't force me to switch to a different product.

Thanks for any more hints!

ciao - Michael
Michael Kress
2006-Sep-27 23:09 UTC
Re: [Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Michael Kress wrote:
> Thanks for any more hints!

Hi,

there's another thread which I accidentally started on the CentOS mailing list. It raised some good and interesting ideas, but unfortunately not the solution. See
http://lists.centos.org/pipermail/centos/2006-September/070714.html

Regards - Michael
Michael Kress
2006-Sep-28 05:34 UTC
Re: [Xen-users] Pb with 3ware 9550SX-4LP / high IO activities
Michael Kress wrote:
> there's another thread which I accidentally created in the centos
> mailing list. It brings out some good and interesting ideas, but
> unfortunately not the solution.
> see http://lists.centos.org/pipermail/centos/2006-September/070714.html

Hi!

My issue's solved, see
http://lists.centos.org/pipermail/centos/2006-September/070758.html

One suggestion: include the Supermicro X6DH8-G2+ in the docs for 'noirqbalance'.

Thanks - Michael
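The Xen 3.0 user manual lists `noirqbalance` as a boot option for the Xen hypervisor itself (it disables Xen's software IRQ balancing and affinity), so, assuming that is the option meant here, it goes on the GRUB line that loads the Xen image, not on the dom0 kernel's `module` line. The following is a minimal Python sketch of that edit for a GRUB legacy config. The helper name `add_xen_boot_option` and the file names and root device in the sample stanza are hypothetical; the linked CentOS post describes the actual fix.

```python
def add_xen_boot_option(grub_conf: str, option: str = "noirqbalance") -> str:
    """Append `option` to each GRUB legacy `kernel` line that loads a Xen image.

    Assumption: the hypervisor is loaded by a `kernel` line whose image path
    contains 'xen' and ends in '.gz'; the dom0 vmlinuz is on a `module` line
    and is left untouched. Lines that already carry the option are unchanged.
    """
    out = []
    for line in grub_conf.splitlines():
        parts = line.split()
        is_hypervisor = (
            len(parts) >= 2
            and parts[0] == "kernel"
            and "xen" in parts[1]
            and parts[1].endswith(".gz")
        )
        if is_hypervisor and option not in parts[2:]:
            line = line.rstrip() + " " + option
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if grub_conf.endswith("\n") else "")


if __name__ == "__main__":
    stanza = (
        "title CentOS 4.4 / Xen 3.0.2 (hypothetical entry)\n"
        "\troot (hd0,0)\n"
        "\tkernel /xen-3.0.2.gz\n"
        "\tmodule /vmlinuz-2.6.16-xen ro root=/dev/VolGroup00/LogVol00\n"
    )
    print(add_xen_boot_option(stanza), end="")
```

Running it twice leaves the config unchanged, so it can be applied without checking first whether the option is already present.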