Hello,

I have been using Xen for several months now, quite successfully, until attempting to use iSCSI within my domU's. My research has turned up many posts of people using iSCSI with no problems, and I have found no one else with the problem I seem to be experiencing. If this has been covered on this list in the past, please accept my humble apologies.

Configuration:
Currently, I have two Xen hypervisors. Both are AMD64-based machines, one running Ubuntu LTS, the other running Ubuntu Edgy. Both machines are running Xen compiled from source. A third machine is acting as an iSCSI target using IET. IET is presenting a 10GB LUN to the domU's. I have two separate domU's, one with core-iscsi and another with open-iscsi.

Problem:
Both initiators exhibit a common symptom. When I am running any I/O over the channel, the domU will stop responding and print an error on the console:

BUG: soft lockup detected on CPU#0!

This doesn't seem to have any predictable pattern. It may occur when running a mkfs, during a mount, or after writing data (dd if=/dev/zero of=/mnt/zero). I have had lockups after a couple of megs, and at times have successfully written 500M. I am hoping it's one of two things: a very poor configuration on my part, or possibly something I need to tweak in the Xen kernel. Below are my configurations; any help is appreciated.

Many thanks,
Ryan

(note: most of my configuration is based on the iscsi-xen howto at http://www.performancemagic.com/iscsi-xen-howto/iSCSI.html)

IET - ietd.conf:
Target iqn.iscsi-target.bogus.com:storage.kolab_ocfs
        Lun 5 Path=/dev/mapper/vg00-kolab_ocfs2lv,Type=fileio
        MaxConnections=2

core-iscsi - initiator:
CHANNEL="1 1 eth0 192.168.1.10 3260 0"

open-iscsi - /etc/iscsi/iscsid.conf:
node.active_cnx = 1
node.startup = manual
node.session.auth.username = jim
node.session.auth.password = othersecret
node.session.timeo.replacement_timeout = 120
node.session.err_timeo.abort_timeout = 10
node.session.err_timeo.reset_timeout = 30
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.session.iscsi.DefaultTime2Wait = 0
node.session.iscsi.DefaultTime2Retain = 0
node.session.iscsi.MaxConnections = 1
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.MaxRecvDataSegmentLength = 65536
discovery.sendtargets.auth.username = joe
discovery.sendtargets.auth.password = secret
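For reference, the domU-side sequence that triggers the lockup looks roughly like this, assuming the open-iscsi guest and the target details from the configs above; the /dev/sda device name, filesystem type, mount point, and dd sizes are illustrative assumptions, not taken from the post:

# inside the domU: discover and log in (credentials come from iscsid.conf above)
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.iscsi-target.bogus.com:storage.kolab_ocfs -p 192.168.1.10:3260 --login

# exercise the session; any of these steps can hit the soft lockup
mkfs.ext3 /dev/sda
mount /dev/sda /mnt
dd if=/dev/zero of=/mnt/zero bs=1M count=500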
Jerry Amundson
2007-Feb-21 20:53 UTC
Re: [Xen-users] Xen domU's and iscsi initiator problems.
On 2/21/07, Ryan Kish <rpkish@gmail.com> wrote:
> Hello, I have been using Xen now for several months and have been
> quite successful until attempting to utilize iscsi within my domU's.

Not surprising.... Didn't most of the iscsi usage you found involve dom0?

> My research has turned up many posts of people using iscsi with no
> problems, and I have found no others with the problem I seem to be
> experiencing. If this has been covered on this list in the past,
> please accept my humble apologies.

Accepted. :-)

> (note: most of my configuration is based on the iscsi-xen howto at
> http://www.performancemagic.com/iscsi-xen-howto/iSCSI.html)

Right - the dom0's are iscsi initiators...

Theorizing here, but it's possible that the CPU and network overhead
in dom0 is not conducive to putting iscsi, which itself is CPU and
network overhead, within the domU.

jerry

--
"Pay no attention to that man behind the curtain!"
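A rough sketch of the dom0-as-initiator approach Jerry is pointing at, using the target details from Ryan's post; the by-path device name, the guest config file name, and the xvda device are illustrative assumptions, not something taken from the thread:

# on dom0: discover and log in to the target with open-iscsi
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.iscsi-target.bogus.com:storage.kolab_ocfs -p 192.168.1.10:3260 --login

# then hand the resulting block device to the guest in its config,
# e.g. /etc/xen/kolab.cfg (device path is illustrative):
disk = [ 'phy:/dev/disk/by-path/ip-192.168.1.10:3260-iscsi-iqn.iscsi-target.bogus.com:storage.kolab_ocfs-lun-5,xvda,w' ]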
> Right - the dom0's are iscsi initiators...
> Theorizing here, but it's possible that the CPU and network overhead
> in dom0 is not conducive to putting iscsi, which itself is CPU and
> network overhead, within the domU.

I could not see the forest for the trees. Things work much better on dom0 :)

Many thanks Jerry.

Cheers,
Ryan
On Wed, 2007-02-21 at 10:49 -0700, Ryan Kish wrote:
> Hello, I have been using Xen now for several months and have been
> quite successful until attempting to utilize iscsi within my domU's.

This is a double-edged sword. If the world is perfect, your targets are always responsive, and nothing ever breaks, this can work well. If your targets for some reason remain unresponsive, it behaves just like a failed NFS hard mount and becomes quite a show stopper. If, of course, the root file system of the guest is on an iSCSI NAS, the show should stop. If you are just using iSCSI to bring in a shared cluster FS that isn't needed to boot, it is merely very annoying.

This behavior can be tweaked, but you really need to examine your iSCSI init scripts and do a bit of initrd tinkering to make it a well-oiled machine. For this reason, many people just have dom-0 log in to the targets and export the block devices as usual, doing all sanity checking on dom-0.

> My research has turned up many posts of people using iscsi with no
> problems, and I have found no others with the problem I seem to be
> experiencing. If this has been covered on this list in the past,
> please accept my humble apologies.

Keep in mind people often post "HEY IT WORKS! Yeah, no problems!", but upon their first loss of power and cold boot they may feel differently. You are reading dated information, and their views may no longer be the same.

> Configuration:
> Currently, I have two Xen hypervisors. Both are AMD64-based machines,
> one running Ubuntu LTS, the other running Ubuntu Edgy. Both machines
> are running Xen compiled from source. A third machine is acting as an
> iSCSI target using IET. IET is presenting a 10GB LUN to the domU's. I
> have two separate domU's, one with core-iscsi and another with
> open-iscsi.
>
> Problem:
> Both initiators exhibit a common symptom. When I am running any I/O
> over the channel, the domU will stop responding and print an error on
> the console:
>
> BUG: soft lockup detected on CPU#0!

iSCSI is very fickle, and you have two things going against you when using initiators in a guest. Both open-iscsi and core-iscsi are amazing accomplishments, but they may not be your best option here.

#1 - You are attempting to do it over a shared, ported ethernet device over a bridge. iSCSI uses TCP/IP, so there is QUITE a bit of packet overhead to weed through before you can even get to the "meat" of the requested disk operation.

#2 - The aforementioned problem of what happens if your initiator thinks the target timed out. It behaves just like a failed NFS hard mount (well, actually worse).

> This doesn't seem to have any predictable pattern. It may occur when
> running a mkfs, during a mount, or after writing data (dd
> if=/dev/zero of=/mnt/zero). I have had lockups after a couple of megs,
> and at times have successfully written 500M. I am hoping it's one of
> two things - a very poor configuration on my part, or possibly
> something I need to tweak in the xen kernel. Below are my
> configurations, any help is appreciated.

There are some things that you can try. #1 is to stop using network-bridge, or modify it and pass some tweaks along. In particular you need to set:

bridge_fd to 0
bridge_hello to 0
bridge_maxwait to 0

.. and more in depth, as needed. I recommend letting Ubuntu handle constructing the bridges during init (a rough illustration of this follows below).

I first recommend that you try ensuring dom-0 has ample memory to handle the initiators + Xen, and use initiators on dom-0 over a private gig-e network that isn't used for public access to the guest. Don't bridge this device.
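Following up on the bridge settings above, a minimal sketch of what letting Ubuntu build the bridge itself might look like in /etc/network/interfaces, assuming the bridge-utils package is installed; the bridge name, member NIC, and address are illustrative, and xend's own network-bridge script would then need to be disabled (commonly by pointing network-script at a no-op) so it reuses this bridge:

# /etc/network/interfaces (illustrative fragment)
auto xenbr0
iface xenbr0 inet static
        address 192.168.1.20
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        bridge_hello 0
        bridge_maxwait 0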
Or, use two gig-e NICs for that private network and bond them, but again, don't bridge them.

Secondly, you need quite a bit of sanity from the word go. If your targets fail to respond for any reason (such as Ubuntu starting iscsi before networking), you have a problem. Modify your initrd to bring the private NICs up, handle the bonding, and at least test the target's IP/port to make sure something is answering the door when you knock, then go to pivot root. If it does NOT answer the door, set an environment variable such as ISCSI_FUBAR=1 and teach xendomains to use alternate block devices or skip booting those guests. (A rough sketch of such a check follows at the end of this message.)

I deployed a rather massive Xen / iSCSI network of over 40 blades and 5 huge storage servers and ended up retrofitting it to use AoE instead. It got to the point where we would have needed to look at TCP offload cards just to take TCP/IP out of the equation and get the real performance we wanted ... and those were actually almost as expensive as going with FC storage. Kind of defeats the purpose.

Some people have tried "sharing" the private iSCSI network by bridging it and giving each dom-U access to the second NIC. Some very successful, some .. not so successful. I really recommend keeping it off the bridge entirely.

My strongest recommendation is just use AoE with a private gig network just for it. However, I don't want to seem gloomy and deter you from accomplishing your goal.

> Many thanks,
> Ryan

Best,
--Tim
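As a rough sketch of the initrd-level sanity check described above: the idea is to probe the target's portal before pivoting to the real root, and leave a flag for later init scripts if it fails. This assumes a netcat with -z is available inside the initramfs; the ISCSI_FUBAR name is Tim's own example, and the flag-file path is purely illustrative, not a real Ubuntu or xendomains interface.

#!/bin/sh
# illustrative initramfs hook: check that the iSCSI target answers
# before pivot_root, so boot doesn't hang on a dead target
TARGET_IP=192.168.1.10     # target portal from this thread
TARGET_PORT=3260

if nc -z -w 5 "$TARGET_IP" "$TARGET_PORT"; then
    echo "iSCSI target $TARGET_IP:$TARGET_PORT is answering"
else
    echo "iSCSI target $TARGET_IP:$TARGET_PORT did not answer" >&2
    # leave a flag that later scripts (e.g. a patched xendomains) can
    # check in order to skip guests that depend on this target; the
    # path would need to be somewhere visible after the pivot
    ISCSI_FUBAR=1
    export ISCSI_FUBAR
    echo "ISCSI_FUBAR=1" > /etc/default/iscsi-sanity
fi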
> My strongest recommendation is just use AoE with a private gig network
> just for it. However, I don't want to seem gloomy and deter you from
> accomplishing your goal.
>
> Best,
> --Tim

Tim,

Thanks for the well thought out and very complete response. Your reply is a plethora of valuable information that I (and I am sure many others) will be making use of.

Cheers,
Ryan