I've just built some shiny xen 2.0.5 kernel packages for debian using the stuff in 'experimental', and cannot seem to create a new domain. The whole machine just reboots. I've caught the first line of a kernel oops but haven't got physical access to the machine at the moment.

The console of the domain looks like this:

Adding 262136k swap on /dev/hda2. Priority:-1 extents:1
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on hda1, internal journal
hwclock is unable to get I/O port access: the iopl(3) call failed.
System time was Tue Apr 5 15:31:42 UTC 2005.
Setting the System Clock using the Hardware Clock as reference...
hwclock is unable to get I/O port access: the iopl(3) call failed.
SysteSegmentation fault

Any ideas? It's an SMP machine if that makes any difference. The exact same setup was working a few stable versions ago. It's just decided not to reboot anymore so I'll have to get someone in the office to reboot it for me.

James
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Okay, it wasn't hung after all, it just took a while to reboot. I've managed to get an oops dump from the console:

Unable to handle kernel paging request at virtual address c7e70000
printing eip:
c88eadbb
*pde = ma 0141d067 pa 0001d067
*pte = ma 00000000 pa 55555000
[pg0+140185564/1003249664] journal_commit_transaction+0xc3c/0xf80 [jbd]
[autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
[find_get_page+39/80] find_get_page+0x27/0x50
[autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
[pg0+140194829/1003249664] kjournald+0xcd/0x1f0 [jbd]
[autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
[autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
[ret_from_fork+6/28] ret_from_fork+0x6/0x1c
[pg0+140194592/1003249664] commit_timeout+0x0/0x10 [jbd]
[pg0+140194624/1003249664] kjournald+0x0/0x1f0 [jbd]
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Oops: 0002 [#1]
Modules linked in: nfsd exportfs lockd sunrpc tlan 8021q loop ext3 jbd mbcache crc32c libcrc32c iscsi_sfnet scsi_transport_iscsi dm_mod sd_mod scsi_mod e1000 eepro100
CPU: 0
EIP: 0061:[pg0+140197307/1003249664] Not tainted VLI
EFLAGS: 00011206 (2.6.10-xen0)
EIP is at journal_get_descriptor_buffer+0x6b/0xb0 [jbd]
eax: 00000000 ebx: c757eb3c ecx: 00000400 edx: 00001000
esi: 00000000 edi: c7e70000 ebp: c79bbec0 esp: c797bdc0
ds: 007b es: 007b ss: 0069
Process kjournald (pid: 856, threadinfo=c797a000 task=c1288a60)
Stack: c05b81c0 00000624 00001000 00000624 c6eae92c c61ce920 c72baf8c 00000000
<I stopped cleaning it up at this point>
Apr 6 01:41:52 xen1 kernel: c88e7fdc c79bbec0 c61ce920 00000008 00000622 c10f6a60 c797a000 c797a000
Apr 6 01:41:52 xen1 kernel: 00000000 00000000 00000000 00000000 00000000 c6eaec8c 00000622 00000000
Apr 6 01:41:52 xen1 kernel: Call Trace:
Apr 6 01:41:52 xen1 kernel: [pg0+140185564/1003249664] journal_commit_transaction+0xc3c/0xf80 [jbd]
Apr 6 01:41:52 xen1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Apr 6 01:41:52 xen1 kernel: [find_get_page+39/80] find_get_page+0x27/0x50
Apr 6 01:41:52 xen1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Apr 6 01:41:52 xen1 kernel: [pg0+140194829/1003249664] kjournald+0xcd/0x1f0 [jbd]
Apr 6 01:41:52 xen1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Apr 6 01:41:52 xen1 kernel: [autoremove_wake_function+0/96] autoremove_wake_function+0x0/0x60
Apr 6 01:41:52 xen1 kernel: [ret_from_fork+6/28] ret_from_fork+0x6/0x1c
Apr 6 01:41:52 xen1 kernel: [pg0+140194592/1003249664] commit_timeout+0x0/0x10 [jbd]
Apr 6 01:41:52 xen1 kernel: [pg0+140194624/1003249664] kjournald+0x0/0x1f0 [jbd]
Apr 6 01:41:52 xen1 kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Apr 6 01:41:52 xen1 kernel: Code: 04 8b 85 88 00 00 00 89 04 24 e8 11 8e 86 f7 89 c3 0f ba 28 02 19 c0 85 c0 75 46 8b 95 8c 00 00 00 89 f0 8b 7b 18 89 d1 c1 e9 02 <f3> ab f6 c2 02 74 02 66 ab f6 c2 01 74 01 aa 0f ba 2b 00 89 d8
ipt_limit

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: Tuesday, 5 April 2005 14:38
> To: xen-devel@lists.xensource.com
> Subject: [Xen-devel] crash on starting new domain
>
> I've just built some shiny xen 2.0.5 kernel packages for debian using
> the stuff in 'experimental', and cannot seem to create a new domain. The
> whole machine just reboots. I've caught the first line of a kernel oops
> but haven't got physical access to the machine at the moment.
>
> The console of the domain looks like this:
>
> Adding 262136k swap on /dev/hda2. Priority:-1 extents:1
> EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
> EXT3 FS on hda1, internal journal
> hwclock is unable to get I/O port access: the iopl(3) call failed.
> System time was Tue Apr 5 15:31:42 UTC 2005.
> Setting the System Clock using the Hardware Clock as reference...
> hwclock is unable to get I/O port access: the iopl(3) call failed.
> SysteSegmentation fault
>
> Any ideas? It's an SMP machine if that makes any difference. The exact
> same setup was working a few stable versions ago. It's just decided not
> to reboot anymore so I'll have to get someone in the office to reboot it
> for me.
>
> James
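A note on reading the dump above: on i386 the number printed after "Oops:" is the page-fault error code, a small bit field shown in hex. The following is only a minimal sketch of decoding it; the function name is made up for illustration, and the bit meanings are the standard i386 ones (bit 0 present/protection, bit 1 write, bit 2 user mode).

```python
def decode_oops_code(code: int) -> dict:
    """Decode an i386 page-fault error code as printed after 'Oops:'."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "page_present": bool(code & 0x1),
        # bit 1: 0 = read access, 1 = write access
        "write": bool(code & 0x2),
        # bit 2: 0 = fault taken in kernel mode, 1 = in user mode
        "user_mode": bool(code & 0x4),
    }


# "Oops: 0002" from the dump above; the field is printed in hex.
info = decode_oops_code(int("0002", 16))
print(info)
```

Read this way, 0002 is a kernel-mode write to a page that is not present, which agrees with the zero `*pte` line. The faulting bytes `<f3> ab` are a `rep stos`, edi (c7e70000) equals the faulting address and ecx is still 0x400, so the kernel appears to have faulted on the first word while zero-filling a 4096-byte buffer.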
Can you reproduce this with a 2.0-testing kernel? Are you sure your iscsi modules were actually built against this kernel version?

Ian
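One way to approach the question of whether the iscsi modules were built against the running kernel is to compare each module's vermagic string with the kernel release. The sketch below is an assumption-laden illustration, not something from the thread: the helper names are hypothetical, and the only external call is `modinfo -F vermagic`, which prints a module's vermagic field.

```python
import subprocess


def kernel_release_of(vermagic: str) -> str:
    """Return the kernel release, which is the first field of a vermagic string."""
    fields = vermagic.split()
    return fields[0] if fields else ""


def built_for(vermagic: str, release: str) -> bool:
    """True if the module's vermagic names exactly this kernel release."""
    return kernel_release_of(vermagic) == release


def module_vermagic(module: str) -> str:
    """Ask modinfo for the vermagic of a module name or .ko path."""
    result = subprocess.run(
        ["modinfo", "-F", "vermagic", module],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


# Hypothetical vermagic strings, for illustration only.
print(built_for("2.6.10-xen0 SMP 686 gcc-3.3", "2.6.10-xen0"))
print(built_for("2.6.9 SMP 686 gcc-3.3", "2.6.10-xen0"))
```

On the machine itself one would pass `module_vermagic("iscsi_sfnet")` and the output of `uname -r` to `built_for`. This only checks the release string; it says nothing about other build settings, and the kernel normally refuses a mismatched module anyway unless the load is forced.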
> Can you reproduce this with a 2.0-testing kernel?

I'm building with the debian packages these days, so not in the really short term.

> Are you sure your iscsi modules were actually built against
> this kernel version?

Hmmm... they were built while running 2.6.9, but for 2.6.10. The iscsi Makefile uses a lot of calls to uname but I'm pretty sure I got all the places where it is used. It runs fine up until the point where the new domain starts.

Could there be some interaction between xen vbd support and iscsi? It's always worked in the past but I've jumped forward a few versions of everything. Maybe I'll try not using disk in xenu and make a barebones initrd..

James
> Hmmm... they were built while running 2.6.9, but for 2.6.10.
> The iscsi Makefile uses a lot of calls to uname but I'm
> pretty sure I got all the places where it is used. It runs
> fine up until the point where the new domain starts.

Are you sure they were built with ARCH=xen?

> Could there be some interaction between xen vbd support and
> iscsi? It's always worked in the past but I've jumped forward
> a few versions of everything. Maybe I'll try not using disk
> in xenu and make a barebones initrd..

There's been no significant changes to the vbd code in 2.0.5. 2.0-testing (proto 2.0.6) has some new stuff that I doubt has been tested on iSCSI, but is probably OK.

Ian
> -----Original Message-----
> From: Ian Pratt [mailto:m+Ian.Pratt@cl.cam.ac.uk]
> Sent: Tuesday, 5 April 2005 16:58
> To: James Harper; xen-devel@lists.xensource.com
> Cc: ian.pratt@cl.cam.ac.uk; ian.pratt@cl.cam.ac.uk
> Subject: RE: [Xen-devel] crash on starting new domain
>
> > Hmmm... they were built while running 2.6.9, but for 2.6.10.
> > The iscsi Makefile uses a lot of calls to uname but I'm
> > pretty sure I got all the places where it is used. It runs
> > fine up until the point where the new domain starts.
>
> Are you sure they were built with ARCH=xen ?

Ah... possibly not. That will only matter if they use privileged instructions won't it? I'll recompile just to be sure and try again.

> > Could there be some interaction between xen vbd support and
> > iscsi? It's always worked in the past but I've jumped forward
> > a few versions of everything. Maybe I'll try not using disk
> > in xenu and make a barebones initrd..
>
> There's been no significant changes to the vbd code in 2.0.5.
> 2.0-testing (proto 2.0.6) has some new stuff that I doubt has been
> tested on iSCSI, but is probably OK.

I just ran it again and got a slightly different oops, but again just when xenu is starting up its filesystem:

TCP: Hash tables configured (established 8192 bind 16384)
NET: Registered protocol family 1
NET: Registered protocol family 17
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
Segmentation fault
xen1:~#
Apr 5 17:08:56 xen1 kernel: br1: port 2(vif3.0) entering learning state
Apr 5 17:08:56 xen1 kernel: Unable to handle kernel paging request at virtual address c7c78000
Apr 5 17:08:56 xen1 kernel: printing eip:
Apr 5 17:08:56 xen1 kernel: c01423bf
Apr 5 17:08:56 xen1 kernel: *pde = ma 0141d067 pa 0001d067
Apr 5 17:08:56 xen1 kernel: *pte = ma 00000000 pa 55555000
Apr 5 17:08:56 xen1 kernel: [handle_mm_fault+448/480] handle_mm_fault+0x1c0/0x1e0
Apr 5 17:08:56 xen1 kernel: [do_page_fault+412/1683] do_page_fault+0x19c/0x693
Apr 5 17:08:56 xen1 kernel: [tty_write+527/624] tty_write+0x20f/0x270
Apr 5 17:08:56 xen1 kernel: [write_chan+0/544] write_chan+0x0/0x220
Apr 5 17:08:56 xen1 kernel: [sys_recv+51/64] sys_recv+0x33/0x40
Apr 5 17:08:56 xen1 kernel: [sys_socketcall+356/608] sys_socketcall+0x164/0x260
Apr 5 17:08:56 xen1 kernel: [sys_write+81/128] sys_write+0x51/0x80
Apr 5 17:08:56 xen1 kernel: [page_fault+59/64] page_fault+0x3b/0x40
Apr 5 17:08:56 xen1 kernel: Oops: 0002 [#1]
Apr 5 17:08:56 xen1 kernel: Modules linked in: nfsd exportfs lockd sunrpc tlan 8021q loop ext3 jbd mbcache crc32c libcrc32c iscsi_sfnet scsi_transport_iscsi dm_mod sd_mod scsi_mod e1000 eepro100
Apr 5 17:08:56 xen1 kernel: CPU: 0
Apr 5 17:08:56 xen1 kernel: EIP: 0061:[do_wp_page+207/1024] Not tainted VLI
Apr 5 17:08:56 xen1 kernel: EFLAGS: 00011287 (2.6.10-xen0)
Apr 5 17:08:56 xen1 kernel: EIP is at do_wp_page+0xcf/0x400
Apr 5 17:08:56 xen1 kernel: eax: c1002020 ebx: c10f8f00 ecx: 00000400 edx: c1000000
Apr 5 17:08:56 xen1 kernel: esi: c0c66000 edi: c7c78000 ebp: c1018cc0 esp: c28ffe94
Apr 5 17:08:56 xen1 kernel: ds: 0069 es: 0069 ss: 0069
Apr 5 17:08:56 xen1 kernel: Process python (pid: 3896, threadinfo=c28fe000 task=c4f8f5c0)
Apr 5 17:08:56 xen1 kernel: Stack: c4abf5b8 c5fa6840 c28fff34 00000400 00000000 c10f8f00 c4abf5b8 c63870e0
Apr 5 17:08:56 xen1 kernel: 403f3000 00000001 c0143580 c63870e0 c4abf5b8 403f3000 c07bffcc c2648400
Apr 5 17:08:56 xen1 kernel: 02066065 00000000 c63870e0 c638710c 00000007 c4abf5b8 c011340c c63870e0
Apr 5 17:08:56 xen1 kernel: Call Trace:
Apr 5 17:08:56 xen1 kernel: [handle_mm_fault+448/480] handle_mm_fault+0x1c0/0x1e0
Apr 5 17:08:56 xen1 kernel: [do_page_fault+412/1683] do_page_fault+0x19c/0x693
Apr 5 17:08:56 xen1 kernel: [tty_write+527/624] tty_write+0x20f/0x270

Thanks

James
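For comparing the two oopses side by side, it can help to reduce the syslog lines to bare call-trace frames. The sketch below is an illustration only (the regular expression and function name are not from any tool mentioned in the thread); it assumes the frame layout seen in the dumps, where a bracketed symbol+offset/size in decimal is followed by the same frame in hex and an optional module name.

```python
import re

# Matches e.g. "[do_page_fault+412/1683] do_page_fault+0x19c/0x693"
# and "[pg0+140194829/1003249664] kjournald+0xcd/0x1f0 [jbd]".
FRAME = re.compile(
    r"\[[\w.]+\+\d+/\d+\]\s+"
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(text: str):
    """Yield (symbol, offset, size, module) for each call-trace entry in text."""
    for m in FRAME.finditer(text):
        yield (m["sym"], int(m["off"], 16), int(m["size"], 16), m["module"])


sample = (
    "Apr 5 17:08:56 xen1 kernel: [do_page_fault+412/1683] do_page_fault+0x19c/0x693\n"
    "Apr 6 01:41:52 xen1 kernel: [pg0+140194829/1003249664] kjournald+0xcd/0x1f0 [jbd]\n"
)
for frame in parse_frames(sample):
    print(frame)
```

The syslog prefix is simply skipped because the pattern is searched for anywhere in the line. Frames that live in a module carry the module name as the last element; frames in the core kernel have `None` there.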
On Tue, 5 Apr 2005, James Harper wrote:

> > Can you reproduce this with a 2.0-testing kernel?
>
> I'm building with the debian packages these days, so not in the really
> short term.

I have 2.0-testing debs about 80% done. Got busy with work, so haven't finished them.

The biggest issue with producing these debs is the state of kernel source in debian. With our release pending, no one wants to upload newer source, so when xen requires a newer base kernel to patch against, it makes my job that much more difficult.
> I have 2.0-testing debs about 80% done. Got busy with work, so haven't
> finished them.
>
> The biggest issue with producing these debs is the state of kernel source
> in debian. With our release pending, no one wants to upload newer
> source, so when xen requires a newer base kernel to patch against, it
> makes my job that much more difficult.

Do you have any sort of an ETA? At the moment I'm torn between whether the problem is linux-iscsi, xen, or the actual kernel...

If it won't be ready in the next day or so I'll just build them the old fashioned way, which is a pain - your packaging makes the whole thing a breeze!!!

Thanks

James
Hmmm... this paragraph from the linux-iscsi README file is a bit of a giveaway:

"
- The linux-2.6.10 has a bug in the SCSI middle layer. This bug can
  cause an oops on the system. The details of reproducing the bug and
  the fix can be found at Sourceforge:
  [ 1115345 ] Segmentation fault on stopping the driver
  http://sourceforge.net/tracker/index.php?func=detail&aid=1115345&group_id=26396&atid=387023

  This bug is fixed in linux-2.6.11.
"

I'll apply the patch and see what happens...

James
It seemed to run a bit longer, but still crashes. Dammit.

> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of James Harper
> Sent: Wednesday, 6 April 2005 11:54
> To: Ian Pratt; xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] crash on starting new domain
>
> Hmmm... this paragraph from the linux-iscsi README file is a bit of a
> giveaway:
>
> "
> - The linux-2.6.10 has a bug in the SCSI middle layer. This bug can
>   cause an oops on the system. The details of reproducing the bug and
>   the fix can be found at Sourceforge:
>   [ 1115345 ] Segmentation fault on stopping the driver
>   http://sourceforge.net/tracker/index.php?func=detail&aid=1115345&group_id=26396&atid=387023
>
>   This bug is fixed in linux-2.6.11.
> "
>
> I'll apply the patch and see what happens...
>
> James
On Wed, 6 Apr 2005, James Harper wrote:

> > I have 2.0-testing debs about 80% done. Got busy with work, so haven't
> > finished them.
> >
> > The biggest issue with producing these debs is the state of kernel source
> > in debian. With our release pending, no one wants to upload newer
> > source, so when xen requires a newer base kernel to patch against, it
> > makes my job that much more difficult.
>
> Do you have any sort of an ETA? At the moment I'm torn between whether
> the problem is linux-iscsi, xen, or the actual kernel...
>
> If it won't be ready in the next day or so I'll just build them the old
> fashioned way, which is a pain - your packaging makes the whole thing a
> breeze!!!

Definitely won't be ready in the next few days. Way overloaded.
> Can you reproduce this with a 2.0-testing kernel? Are you sure your
> iscsi modules were actually built against this kernel version?

I have tried it against 2.0-testing and it still fails in the same way. The system appears to perform perfectly until the moment a domain starts accessing disk, then it all falls to bits.

Any ideas?

Thanks

james