Hi all,

I will post a total of seven patches for the new pvSCSI driver in the
following e-mails.

New features of the driver are as follows.

- Supports assignment of individual SCSI devices (LUN: Logical Unit
  Number) to guest domains.
- The SCSI device can be specified in three ways (see below).
- Simplified ring mechanism for frontend/backend communication.
  (The previous version used two rings, one for frontend-to-backend
  and one for backend-to-frontend communication. This version uses a
  single ring, the same as VBD.)

[ How to use ]
a.) by "xm" command
    # xm scsi-attach <domain> <scsidevice>

b.) by config file
    vscsi=['scsidevice','scsidevice']

You can specify "scsidevice" in three ways in both cases:

1.) /dev/sdx or sdx, /dev/stx or stx, /dev/sgx or sgx
2.) scsi_id (result of "scsi_id -gu -s /block/sda")
    Example: 36000b5d0006a0000006a025700400000
3.) host:channel:target:lun
    Example: 4:0:0:10

Any comments are welcome.

Best regards,

-----
Jun Kamada
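For illustration, a guest config could mix the three naming forms like
this; the device names and IDs below are examples only, not devices
from the patches:

    # hypothetical example: any of the three naming forms may be used
    vscsi = [ '/dev/sdb',                           # device node
              '36000b5d0006a0000006a025700400000',  # scsi_id
              '4:0:0:10' ]                          # host:channel:target:lun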
> Hi all,
>
> I will post a total of seven patches for the new pvSCSI driver in
> the following e-mails.

Jun,

What version of Xen is this patch supposed to be against? I am keen to
develop a frontend for the Windows PV drivers I've been working on, so
I need to build it for my Dom0. I have built the 'tools' stuff into the
Debian package (i.e. just patched the original tree, applied the Debian
patches, and did a dpkg-buildpackage), and am now trying to build the
scsiback driver 'out of tree'. It all builds, but complains about
'bind_interdomain_evtchn_to_irqhandler', which I'm guessing is a symbol
that isn't in the Debian kernel...

Any suggestions?

Thanks

James
Hi James-san,

The pvSCSI driver requires Xen 3.2. I think the Debian kernel probably
uses Xen 3.1.x. Isn't it?

Thanks

On Mon, 18 Feb 2008 23:14:01 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> What version of Xen is this patch supposed to be against? I am keen
> to develop a frontend for the Windows PV drivers I've been working
> on, so I need to build it for my Dom0. [...]
>
> Any suggestions?

Jun Kamada
> Hi James-san,
>
> The pvSCSI driver requires Xen 3.2. I think the Debian kernel
> probably uses Xen 3.1.x. Isn't it?

I think it's worse than that... I think the Xen hypervisor is 3.1.2,
but the Debian Xen Linux kernel has patches that are even older.

What is it about 3.2 that the pvSCSI driver requires?

James
I think I've got it working under Debian Etch.

I'm now trying to develop a frontend driver for Windows, and triggered
a BUG() on or around line 328 of scsiback.c, because I wasn't setting
bus, target, and lun in the request. This effectively breaks Dom0
(hotplug scripts refused to work thereafter until a reboot), which
means a rogue DomU can crash Dom0. I think you should implement a more
graceful failure path.

Also, for what reason are the bus, target, and lun required in the
request? It looks like a leftover from an earlier version, and I don't
see that it is required now.

James
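A minimal sketch of the kind of graceful failure path being asked for,
in plain C. The structures and helpers here are made-up stand-ins, not
the actual scsiback.c code; the point is only that a bad address from
the frontend should fail that one request rather than trigger BUG():

    #include <errno.h>
    #include <stddef.h>

    /* Illustrative stand-ins for the backend's types and helpers. */
    struct vscsi_request { unsigned channel, id, lun, rqid; };
    struct vscsi_device;

    struct vscsi_device *backend_lookup(unsigned channel, unsigned id,
                                        unsigned lun);
    void backend_fail_request(unsigned rqid, int host_status);

    #define DID_NO_CONNECT 0x01   /* Linux host status: no such device */

    /* Validate the address supplied by the frontend; on failure,
     * complete just that request with an error and keep servicing the
     * ring, instead of taking Dom0 down with BUG(). */
    int backend_check_request(const struct vscsi_request *req)
    {
            if (backend_lookup(req->channel, req->id, req->lun) == NULL) {
                    backend_fail_request(req->rqid, DID_NO_CONNECT << 16);
                    return -ENODEV;
            }
            return 0;
    }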
Hi James-san,

Thank you for your comment.

On Wed, 20 Feb 2008 14:58:48 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> I'm now trying to develop a frontend driver for Windows, and
> triggered a BUG() on or around line 328 of scsiback.c [...] I think
> you should implement a more graceful failure path.

Yes, I agree with your opinion. Some modification or addition may be
needed around error handling, including the Reset/Abort SCSI commands.
We would like to post a new version ASAP. However, we would also like
to get a lot of comments on the *current* version to guide the
enhancement.

> Also, for what reason are the bus, target, and lun required in the
> request? It looks like a leftover from an earlier version, and I
> don't see that it is required now.

LUN assignment to guests lets an HBA be shared by multiple guests. We
consider that feature very useful for many usage scenarios. LUN
assignment also covers HBA assignment by using wildcards, for example
"xm scsi-attach <domain> 4:*:*:*". Needless to say, an expansion of
"xm" or "xend" is required in that case. :-)

Best regards,

-----
Jun Kamada
Another issue I've come across: you appear to have hardcoded the
timeout to 5 seconds. I'm trying to run the HP Library & Tape Tools
under Windows, and things like unload and erase go well beyond the 5
seconds you allow. I increased the timeout to 30 seconds and the
unload works fine, but the erase runs for longer than that.

I notice you have a timeout field in the request structure, but have
commented it out. What was the reason for this?

Also, I had a system crash when I removed the pvscsi backend driver
module. Something else to look at. The timeout is the thing that is
causing concern at the moment, though.

Thanks

James
> Another issue I've come across: you appear to have hardcoded the
> timeout to 5 seconds. [...]
>
> I notice you have a timeout field in the request structure, but have
> commented it out. What was the reason for this?

Just responding to myself: would I be guessing correctly that you
removed the timeout field to make the request structure smaller? The
top byte of the request_bufflen field could be used as a timeout, as
sensible timeout values don't need to be very exact. Even if we just
used the top 4 bits and made the timeout (1 << (timeout + 1)) * 5 * HZ,
that would give us a bit over 45 hours, and we'd make 15 mean infinite.

By my calculations, the largest that bufflen could be is 27 * PAGE_SIZE
= ~110K on x86, so we have plenty of headroom.

With the timeout increased, the HP Library & Tape Tools have
successfully completed an 'LTO Drive Assessment test' under Windows.

James
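A minimal sketch of the packing James describes, assuming the top four
bits of a 32-bit bufflen field are otherwise unused; the names are
illustrative, and HZ is passed in rather than taken from kernel
headers so the sketch stands alone:

    #include <stdint.h>

    #define VSCSI_TIMEOUT_SHIFT  28
    #define VSCSI_TIMEOUT_MASK   0xF0000000u
    #define VSCSI_BUFFLEN_MASK   0x0FFFFFFFu
    #define VSCSI_TIMEOUT_INF    15            /* sentinel: no timeout */

    /* Pack a 4-bit timeout code into the top nibble of bufflen. */
    static inline uint32_t vscsi_pack(uint32_t bufflen, unsigned code)
    {
            return (bufflen & VSCSI_BUFFLEN_MASK) |
                   ((uint32_t)code << VSCSI_TIMEOUT_SHIFT);
    }

    /* Decode to ticks: (1 << (code + 1)) * 5 * HZ; code 15 means
     * infinite.  Codes 0..14 span 10 seconds up to ~45.5 hours. */
    static inline unsigned long vscsi_timeout_ticks(uint32_t field,
                                                    unsigned long hz)
    {
            unsigned code = (field & VSCSI_TIMEOUT_MASK)
                            >> VSCSI_TIMEOUT_SHIFT;

            if (code == VSCSI_TIMEOUT_INF)
                    return 0;                  /* caller: 0 = no timeout */
            return (1ul << (code + 1)) * 5 * hz;
    }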
Hi James-san,

On Thu, 21 Feb 2008 14:39:47 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> Just responding to myself: would I be guessing correctly that you
> removed the timeout field to make the request structure smaller?

That is one reason. However, the main reason is as follows.

The time that guests and the host see is not real-world time in a
virtualized environment; it depends on the hypervisor's scheduling.
(Is this assumption right?)

For example, if the upper layer of the pvSCSI frontend specified 5
seconds as a timeout, should it be treated as real-world time or as
time within the guest domain's world?

We didn't have a clear answer when we implemented that part, so we
temporarily coded it as 5 seconds.

James-san, what do you think about this issue?

By the way, we understand that 5 seconds is too short to support tape
devices.

Best regards,

Jun Kamada
Linux Technology Development Div.
Server Systems Unit
Fujitsu Ltd.
kama@jp.fujitsu.com
> That is one reason. However, the main reason is as follows.
>
> The time that guests and the host see is not real-world time in a
> virtualized environment; it depends on the hypervisor's scheduling.
> (Is this assumption right?)
>
> For example, if the upper layer of the pvSCSI frontend specified 5
> seconds as a timeout, should it be treated as real-world time or as
> time within the guest domain's world?
>
> We didn't have a clear answer when we implemented that part, so we
> temporarily coded it as 5 seconds.
>
> James-san, what do you think about this issue?

I don't think the exact value of the timeout matters that much. At
worst, a 5 second timeout is going to be at least 5 seconds, and
probably not much more than that. It's the Linux SCSI subsystem itself
that handles the timeout anyway.

> By the way, we understand that 5 seconds is too short to support
> tape devices.

Yes, way too short. For running the HP LT&T (Library and Tape Tools),
even 60 seconds is too short for some operations.

FYI, I have the HP LT&T working nicely under Windows now. All tests
succeed; even a firmware update to the tape drive worked. A read/write
test on an HP LTO2 drive with LTO1 media gives me 13.3MB/s (approx
800MB/minute) for both read and write operations.

I assume that a CD or DVD burner would work also, although I don't
have one to test.

Are you planning on requesting that pvSCSI get merged into the Xen
tree once you have the timeout and unload issues sorted out?

James
Hi James-san,

Sorry for the late reply.

On Thu, 21 Feb 2008 16:30:09 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> Are you planning on requesting that pvSCSI get merged into the Xen
> tree once you have the timeout and unload issues sorted out?

Yes, I will re-post it ASAP. In addition to the issues mentioned
above, I would like to implement the Reset/Abort functions.

Best regards,

-----
Jun Kamada
> I will post a total of seven patches for the new pvSCSI driver in
> the following e-mails.

Thanks for doing this; being able to pass SCSI devices through to
guests is likely to be a useful facility.

I have a couple of comments on the design:

-- You've ended up re-implementing a lot of Linux SCSI stuff in the
   backend. I don't understand why this was necessary. Would you mind
   explaining, please?

-- The code seems to be a bit undecided about whether the exposed
   devices are supposed to represent SCSI adapters or SCSI targets.
   It looks like the frontend initially tries to treat them as a
   bunch of targets, and then conditionally gloms them back together
   into hosts depending on xenstore fields? Having a host per target
   would make sense, as would having a single host with all of the
   targets hanging off of it, but I don't understand why this split
   model is useful. Perhaps I'm just missing something.

-- I don't understand the distinction between comfront and scsifront.
   What was the reason for this split?

-- There don't seem to be many comments in these patches. Xen and
   Linux are both generally pretty comment-light, but an entire new
   device class without a single meaningful comment still kind of
   stands out.

I'll reply to the individual patches with more detailed comments. A
lot of my complaints will doubtless turn out to just be because I'm
not very used to Linux SCSI. I've not looked at the xend changes,
because I'm not really competent to evaluate them.

Steven.
Hi Steven-san,

I appreciate your sending a lot of helpful comments. I will answer the
design questions below now; for the other questions and comments about
the individual source files, I would like to reply in another mail
later.

On Wed, 27 Feb 2008 11:16:10 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> -- You've ended up re-implementing a lot of Linux SCSI stuff in the
>    backend. I don't understand why this was necessary. Would you
>    mind explaining, please?

# If I misunderstood your question, please let me know.

In order to provide LUN assignment to guest domains, the backend
driver was modified. The modification points are as follows.

- The ring, event channel, and grant table are allocated independently
  for each LUN assignment. The previous version allocated them per
  host (HBA). This is the main difference between the previous and new
  versions.

We also removed the code for the FC transport layer.

> -- The code seems to be a bit undecided about whether the exposed
>    devices are supposed to represent SCSI adapters or SCSI targets.
>    [...]

The frontend driver tries to attach each LUN according to the
information in xenstore. The new version supports only LUN assignment
to guest domains. If you want SCSI device/host assignment to a guest,
you have to specify all the LUNs under the device/host. I understand
that a wildcard notation, such as "1:*:*:*", is needed.

> -- I don't understand the distinction between comfront and
>    scsifront. What was the reason for this split?

I intended to separate two types of code: primitive code for
communication between frontend and backend, and SCSI-specific code.
However, the separation may be incomplete.

> -- There don't seem to be many comments in these patches. [...]

I agree with your comment completely. I should add more comments to
the source code.

Best regards,

-----
Jun Kamada
> > -- You've ended up re-implementing a lot of Linux SCSI stuff in
> >    the backend. I don't understand why this was necessary. Would
> >    you mind explaining, please?
>
> In order to provide LUN assignment to guest domains, the backend
> driver was modified. [...]
>
> We also removed the code for the FC transport layer.

I'm afraid I haven't looked closely at the previous revisions of this
patch, so I don't know about any differences between them.

I was referring more to bits like request_map_sg, which is an almost
direct copy and paste of drivers/scsi/scsi_lib.c::scsi_req_map_sg, and
scsiback_merge_bio(), which is identical to scsi_merge_bio() except
for some whitespace changes. Having to carry our own implementation of
core Linux SCSI support seems like it'll be a significant maintenance
burden, and I'd like to understand why it was necessary.

> The frontend driver tries to attach each LUN according to the
> information in xenstore. The new version supports only LUN
> assignment to guest domains. If you want SCSI device/host assignment
> to a guest, you have to specify all the LUNs under the device/host.
> I understand that a wildcard notation, such as "1:*:*:*", is needed.

Okay, so a device in xenstore corresponds to a LUN, and you map them
to particular hosts based on the device name? That's kind of ugly, but
it's probably the most direct way of doing it through xenstore.

What I don't understand is why you need this at all. It seems like it
would make more sense to either:

a) Hang every LUN off of the same SCSI host, or
b) Give each LUN its own SCSI host.

Is there some reason why you might want to do something like this:

Host A -------+----- LUN 1
              |
              +----- LUN 2

Host B ------------- LUN 3

i.e. partition the virtual LUNs between multiple hosts in the guest,
but keeping some of them together? Perhaps I'm just missing something,
but I can't think of any use cases which would benefit from that, and
trying to support it noticeably complicates the frontend.

> > -- I don't understand the distinction between comfront and
> >    scsifront. What was the reason for this split?
> I intended to separate two types of code: primitive code for
> communication between frontend and backend, and SCSI-specific code.
> However, the separation may be incomplete.

Okay.

> > -- There don't seem to be many comments in these patches. [...]
> I agree with your comment completely. I should add more comments to
> the source code.

Thanks.

Steven.
Steven-san,

On Thu, 28 Feb 2008 11:13:31 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> What I don't understand is why you need this at all. It seems like
> it would make more sense to either:
>
> a) Hang every LUN off of the same SCSI host, or
> b) Give each LUN its own SCSI host.
>
> [...] Perhaps I'm just missing something, but I can't think of any
> use cases which would benefit from that, and trying to support it
> noticeably complicates the frontend.

May I explain the numbering logic for assigning LUNs to guests?

Basically, each guest sees the same SCSI tree as the host, except for
the following two points.

1.) The "host" in the 4-tuple "host:channel:id:lun" on the guest may
    not be the same as that on the host.
2.) The tree on the guest may be sparse when some LUNs are not
    assigned to the guest.

Therefore, "a1:b:c:d" on the host becomes "a2:b:c:d" on the guest
(a1 != a2, generally).

I think the numbering logic is the same as b) you mentioned above. Is
that right?

Thanks,

-----
Jun Kamada
> > What I don't understand is why you need this at all. [...]
> May I explain the numbering logic for assigning LUNs to guests?

That was what I was hoping you'd do, yes. :)

> Basically, each guest sees the same SCSI tree as the host, except
> for the following two points.
>
> 1.) The "host" in the 4-tuple "host:channel:id:lun" on the guest may
>     not be the same as that on the host.
> 2.) The tree on the guest may be sparse when some LUNs are not
>     assigned to the guest.
>
> Therefore, "a1:b:c:d" on the host becomes "a2:b:c:d" on the guest
> (a1 != a2, generally).

Okay, why do you require that the device in the guest has the same
channel:id:lun as the device on the host? That seems like a somewhat
gratuitous restriction to me.

> I think the numbering logic is the same as b) you mentioned above.
> Is that right?

No, you've gone for option c:

c) The topology inside the guest reflects a subset of the host
   topology

which I hadn't previously considered.

Steven.
Hi Steven-san,

On Mon, 3 Mar 2008 11:38:57 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> Okay, why do you require that the device in the guest has the same
> channel:id:lun as the device on the host? That seems like a somewhat
> gratuitous restriction to me.
>
> [...]
>
> No, you've gone for option c:
>
> c) The topology inside the guest reflects a subset of the host
>    topology
>
> which I hadn't previously considered.

The reasons why we took option c are as follows.

- Some storage management software running on the guest may assume a
  physical topology. (However, I'm not sure whether such software
  actually exists or not.)
- The "host" number is Linux-specific, and a dummy Scsi_Host structure
  consumes a relatively large amount of memory. Therefore, we decided
  to compress the "host" numbers. (Not sparse; contiguous.)

An explicit declaration like the one below may be one solution. Of
course, some default setting would be needed.

    On Dom0          On Guest
    -------------------------
    "1:2:3:4"  --->  "5:6:7:8"

Best regards,

Jun Kamada
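For illustration, such a declaration might be written in the guest
config along these lines. The syntax is purely hypothetical; the
current patches do not implement it:

    # hypothetical: expose Dom0 device 1:2:3:4 at guest address 5:6:7:8
    vscsi = [ '1:2:3:4,5:6:7:8' ]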
> > What I don't understand is why you need this at all. It seems
> > like it would make more sense to either:
> >
> > a) Hang every LUN off of the same SCSI host, or
> > b) Give each LUN its own SCSI host.
...
> The reasons why we took option c are as follows.
>
> - Some storage management software running on the guest may assume a
>   physical topology. (However, I'm not sure whether such software
>   actually exists or not.)

There are three obvious ways for them to make that kind of assumption:

1) There's some SCSI command which applies to a collection of devices,
   and that collection depends on the topology. Bus resets are the
   obvious one here. All of these commands will need special handling
   anyway, to prevent VMs from interfering with each other (and I
   don't think you currently support any of them, anyway).

2) There are some magic LUNs/targets/whatevers which the application
   tries to access at a particular address. sam4r10 requires that
   either LUN 0 or the REPORT LUNS well-known LUN be present, so
   that's pretty plausible. I think your current implementation may
   already have problems here if a user decides to only connect a
   subset of a device's LUNs, yes?

3) There's some SCSI command which returns LUNs in its results.
   REPORT LUNS is the obvious one here. The frontend will currently
   report incorrect results for these commands if the user has only
   connected a subset of the LUNs.

This kind of suggests that we should be plumbing things through to the
guest with a granularity of whole targets, rather than individual
logical units. The alternative is a much more complicated SCSI
emulation which can fix up the LUN-sensitive commands.

> - The "host" number is Linux-specific, and a dummy Scsi_Host
>   structure consumes a relatively large amount of memory. Therefore,
>   we decided to compress the "host" numbers. (Not sparse;
>   contiguous.)

Are you implying that frontend host numbers won't always match up with
backend host numbers?

If hosts are expensive to construct then that's a good reason to avoid
model (b) (one host per LUN/target), although my desktop has a SCSI
host for each SATA port, so they can't be *that* expensive. It doesn't
rule out model (a) (one host shared by all LUNs).

> An explicit declaration like the one below may be one solution. Of
> course, some default setting would be needed.
>
>     On Dom0          On Guest
>     -------------------------
>     "1:2:3:4"  --->  "5:6:7:8"

Allowing this kind of mapping sounds reasonable to me.
It would also make it possible (hopefully) to add support for some of
the weirder SCSI logical unit addressing modes without changing the
frontends (e.g. hierarchical addressing with 64 bit LUNs). That might
involve a certain amount of munging of REPORT LUNS commands in the
backend, though.

Steven.
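A minimal sketch of what munging a REPORT LUNS response in the backend
could look like: the device's LUN list is filtered down to the LUNs
actually assigned to the guest, and the list length in the header is
rewritten. The assigned() callback is a hypothetical stand-in for the
backend's assignment table; this illustrates the idea and is not code
from the patches:

    #include <stdint.h>
    #include <string.h>

    /* REPORT LUNS parameter data (SPC format): a 4-byte LUN list
     * length, 4 reserved bytes, then one 8-byte entry per LUN. */
    static void filter_report_luns(uint8_t *data, size_t buf_len,
                                   int (*assigned)(const uint8_t lun[8]))
    {
            uint32_t list_len;
            size_t end, in, out = 8;

            if (buf_len < 8)
                    return;
            list_len = ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
                       ((uint32_t)data[2] << 8) | data[3];
            if (list_len > buf_len - 8)
                    list_len = buf_len - 8;        /* clamp to the buffer */
            end = 8 + list_len;

            for (in = 8; in + 8 <= end; in += 8) {
                    if (assigned(&data[in])) {     /* keep this LUN */
                            if (out != in)
                                    memmove(&data[out], &data[in], 8);
                            out += 8;
                    }
            }

            memset(&data[out], 0, end - out);      /* wipe dropped entries */

            list_len = out - 8;                    /* rewrite header length */
            data[0] = list_len >> 24;
            data[1] = list_len >> 16;
            data[2] = list_len >> 8;
            data[3] = list_len;
    }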
> This kind of suggests that we should be plumbing things through to
> the guest with a granularity of whole targets, rather than
> individual logical units. The alternative is a much more complicated
> SCSI emulation which can fix up the LUN-sensitive commands.

I think we should probably have the option of doing either.

> Allowing this kind of mapping sounds reasonable to me. It would also
> make it possible (hopefully) to add support for some of the weirder
> SCSI logical unit addressing modes without changing the frontends
> (e.g. hierarchical addressing with 64 bit LUNs). That might involve
> a certain amount of munging of REPORT LUNS commands in the backend,
> though.

Not sure how much it matters, but any 'munging' of SCSI commands would
be a real drag for Windows drivers. The Windows SCSI layer is very
strict about lots of things, and is a real pain if you are not talking
to a physical PCI SCSI device.

James
Hi James-san and Steven-san,

Thank you for your comments.

In order to avoid misunderstanding on my part, could you explain what
'munging' means here? Does it mean rejecting certain SCSI commands, or
modifying the contents of the command (CDB) and response (sense data)
in the backend?

Thanks,

Jun Kamada
> In order to avoid misunderstanding on my part, could you explain
> what 'munging' means here? Does it mean rejecting certain SCSI
> commands, or modifying the contents of the command (CDB) and
> response (sense data) in the backend?

:)

In this context it just means modifying the packets 'on the fly' in a
way that we'd probably rather not. I guess it's kind of a NAT for
SCSI... maybe we'd call it SAT, for SCSI Address Translation. :)

James
Hi James-san,

On Wed, 5 Mar 2008 20:56:32 +1100
"James Harper" <james.harper@bendigoit.com.au> wrote:
> In this context it just means modifying the packets 'on the fly' in
> a way that we'd probably rather not. I guess it's kind of a NAT for
> SCSI... maybe we'd call it SAT, for SCSI Address Translation. :)

OK, I understood. Thanks.

Jun Kamada
For a more precise definition of "munge", and for future reference,
see:

http://foldoc.org/index.cgi?query=munge&action=Search
http://foldoc.org/

Hope that helps!
Dan
Hi Dan-san,

On Thu, 6 Mar 2008 16:48:42 -0700
"Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
> For a more precise definition of "munge", and for future reference,
> see:
>
> http://foldoc.org/index.cgi?query=munge&action=Search
> http://foldoc.org/
>
> Hope that helps!
> Dan

It's very helpful. Thanks.

Jun Kamada
Hi,

The problems discussed in this thread - what portion of the whole SCSI
tree should be exposed to the guest, and how the numbering logic of
the guest's tree should work - are very fundamental and difficult, I
think.

In my current thinking, the following two options are both reasonable
solutions. What do you think about them? Could you please comment?

Option 1 (LUN assignment)
- Specify the assignment like below:
  "host1:channel1:id1:lun1" (Dom0) -> "host2:channel2:id2:lun2" (guest)
  lun1 must be the same as lun2.
- Munge :-) the REPORT LUNS command on Dom0 according to the set of
  LUNs actually attached to the guest.

Option 2 (Target assignment)
- Specify the assignment like below:
  "host1:channel1:id1" (Dom0) -> "host2:channel2:id2" (guest)
  All LUNs under id1 are assigned to one guest.
- Munging for LUNs is not needed.

For each option, how should the host/bus/device reset commands behave?

Best regards,

Jun Kamada
> In my current thinking, the following two options are both
> reasonable solutions. What do you think about them? Could you please
> comment?
>
> Option 1 (LUN assignment)
> [...]
> Option 2 (Target assignment)
> [...]

I think it would help to have some real-life examples of where each
option would and wouldn't make sense. It may be that you need to
implement both options. I'm not familiar enough with the variety of
SCSI devices out there to be able to judge.

> For each option, how should the host/bus/device reset commands
> behave?

I have thought about this some more. Normally a reset will be issued
because of some error, usually a timeout I assume. You could implement
something like this:

. if the reset requested is a device reset, and the DomU 'owns' all
  the LUNs attached to the device, then allow the device reset.
. if the reset requested is a device reset, and the DomU 'owns' only
  some of the LUNs attached to the device, then only allow the device
  reset once all the other 'owners' have requested a device reset too.
. the above two rules might work for host and bus resets too, as long
  as all 'owners' agree to the reset.

The problem might be if you had a device with three LUNs, and three
DomUs with a single LUN each. If the device had hung and required a
reset, then any DomU using it would notice the timeout and issue a
reset, but if one DomU wasn't using its LUN at the time it might not
notice. Maybe you need another communication channel where Dom0 can
ask each DomU for permission to do the reset.

This reset stuff seems like a lot of extra work for probably not much
benefit, though.

James
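A minimal sketch of the "all owners agree" rule for device resets,
assuming a hypothetical per-device owner count kept by the backend;
real code would also need locking and per-owner deduplication of
votes:

    /* One instance per shared physical device in the backend. */
    struct pv_device {
            unsigned int nr_owners;    /* DomUs with a LUN on this device */
            unsigned int reset_votes;  /* owners currently asking to reset */
    };

    /* Called when a DomU requests a device reset; returns 1 when the
     * reset should actually be sent to the hardware. */
    static int device_reset_requested(struct pv_device *dev)
    {
            dev->reset_votes++;
            if (dev->reset_votes < dev->nr_owners)
                    return 0;          /* wait for the remaining owners */
            dev->reset_votes = 0;      /* everyone agreed: do the reset */
            return 1;
    }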
> The problems discussed in this thread - what portion of the whole
> SCSI tree should be exposed to the guest, and how the numbering
> logic of the guest's tree should work - are very fundamental and
> difficult, I think.
>
> Option 1 (LUN assignment)
> - Specify the assignment like below:
>   "host1:channel1:id1:lun1" (Dom0) -> "host2:channel2:id2:lun2" (guest)
>   lun1 must be the same as lun2.
> - Munge :-) the REPORT LUNS command on Dom0 according to the set of
>   LUNs actually attached to the guest.

I think this is the most flexible approach.

One thing to watch out for here is that some old systems get quite
confused if LUN 0 is missing but some of the higher LUNs are present.
That's easy to handle if you allow an arbitrary mapping between dom0
and guest LUNs, but is hard if you require them to be identical. This
might not be an issue in the cases which we care about, though.

> Option 2 (Target assignment)
> - Specify the assignment like below:
>   "host1:channel1:id1" (Dom0) -> "host2:channel2:id2" (guest)
>   All LUNs under id1 are assigned to one guest.
> - Munging for LUNs is not needed.
>
> For each option, how should the host/bus/device reset commands
> behave?

It's possible that we'll be able to get away with just supporting
LOGICAL UNIT RESET commands, and completely ignoring lower granularity
resets. I'm not sure how widely supported they are on actual hardware,
but it might be good enough for a first implementation. You might even
be able to get away with not supporting any kind of reset at all, and
just accepting that error recovery is going to suck.

Steven.
Hi Steven-san and James-san,

Thank you for your comments.

We have had an internal discussion based on your comments and reached
the following conclusions. I believe they provide both flexibility and
ease of implementation.

We would like to start modifying the pvSCSI driver according to these
conclusions. What do you think - are they reasonable? If you have any
comments, please send them.

-----
1.) Allow specifying an arbitrary mapping between Dom0's SCSI tree and
    the guest's SCSI tree, including "lun".
    ( Dom0's IDs [host1:channel1:id1:lun1] --->
      Guest's IDs [host2:channel2:id2:lun2] )
2.) The guest is responsible for holding the mapping and translating
    between Dom0's IDs and the guest's IDs. Which level of the
    mapping/translation is supported (e.g. only "host", all of the
    4-tuple, or no translation at all) depends on the guest OS's
    implementation. If the guest decides to support lun translation
    and lun1 != lun2, the guest's frontend driver must maintain the
    LUN value in the CDB data structure. (A sketch of this translation
    follows after this mail.)
3.) As for the REPORT LUNS command, Dom0 performs the munging.
4.) Dom0 accepts only the LOGICAL UNIT RESET command.
5.) Of course, the backend driver performs a sanity check on the IDs
    that the guest has already translated.

And I would like to implement the pvSCSI frontend driver for Linux
with the following mapping/translation policy. (Please note that
another guest OS such as Windows can of course take another policy.)

- The guest sees a tree identical to the one Dom0 sees, except for
  "host". (This is because an arbitrary "host" mapping is difficult
  with the current Linux implementation.)
- Of course, the guest's tree is sparse if some LUNs were not attached
  to the guest. The Linux kernel allows the situation where lun=0 does
  not exist, so a sparse tree is not a problem.

Best regards,

On Mon, 10 Mar 2008 12:00:59 +0000
Steven Smith <steven.smith@eu.citrix.com> wrote:
> > Option 1 (LUN assignment)
> > [...]
> I think this is the most flexible approach.
> [...]
> It's possible that we'll be able to get away with just supporting
> LOGICAL UNIT RESET commands, and completely ignoring lower
> granularity resets. [...]

-----
Jun Kamada
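A minimal sketch of the frontend-side translation described in point
2.), using a hypothetical mapping table; the structures and names are
illustrative, not from the patches:

    #include <stddef.h>

    /* One entry per LUN handed to this guest: the address the guest OS
     * uses, and the Dom0 address the backend expects. */
    struct vscsi_map {
            unsigned g_chan, g_id, g_lun;   /* guest-visible address */
            unsigned d_chan, d_id, d_lun;   /* corresponding Dom0 address */
    };

    /* Translate a guest address before the request goes on the ring;
     * returns NULL if the guest addressed a LUN it was never given
     * (the backend re-checks this anyway, per point 5.). */
    static const struct vscsi_map *
    vscsi_translate(const struct vscsi_map *tbl, size_t n,
                    unsigned chan, unsigned id, unsigned lun)
    {
            size_t i;

            for (i = 0; i < n; i++)
                    if (tbl[i].g_chan == chan && tbl[i].g_id == id &&
                        tbl[i].g_lun == lun)
                            return &tbl[i];
            return NULL;
    }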
Looking through the SCSI spec, I don''t think we''re going to be able to get away with passing requests through from the frontend all the way to the physical disk without sanity checking the actual CDB in the backend. There are a couple of commands which look scary: -- CHANGE ALIAS/REPORT ALIAS -- the alias list is shared across everything in the I_T nexus. That will lead to interesting issues if you ever have multiple guests modifying it at the same time. -- EXTENDED COPY -- allows you to copy arbitrary data between logical units, sometimes even ones not in the same target device. That''s obviously going to need to be controlled in a VM setting. -- Some mode pages, as modified by MODE SELECT, can apply across multiple LUs. Even more exciting, the level of sharing can in principle vary between devices, even for the same page. -- WRITE BUFFER commands can be used to change the microcode on a device. I''ve no idea what the implications of letting an untrusted user push microcode into a device would be, but I doubt it''s a good idea. -- I''m not sure whether we want to allow untrusted guests to issue SET PRIORITY commands. -- We''ve already been over REPORT LUNS :) Plus whatever weird things the various device manufacturers decide to introduce. What this means is that the REPORT LUNS issue fundamentally isn''t restricted to just the REPORT LUNS command, but instead affects an unknown and potentially large set of other commands. The only way I can see to deal with this is to white-list commands individually once they''ve been confirmed to be safe, and have the backend block any commands which haven''t been checked yet. That''s going to be a fair amount of work, and it''ll screw up the whole ``transparent pass through'''' thing, but I can''t see any other way of solving this problem safely. (And even that assumes that the hardware people got everything right. Most devices will be designed on the assumption that only trusted system components can submit CDBs, so it wouldn''t surprise me if some of them can be made to do bad things if a malicious CDB comes in. There''s not really a great deal we can do about this, though.) Backtracking a little, the fundamental goal here is to make some logical units which are accessible to dom0 appear inside the guest. Guest operating systems are unlikely to be very happy about having logical units floating around not attached to scsi hosts, and so we need (somehow) to come up with a scsi host which has the right set of logical units attached to it. There are lots of valid use cases in which there don''t exist physical hosts with the right set of LUs, and so somebody needs to invent one, and then emulate it. That somebody will necessarily be either the frontend or the backend. Doing the emulation also gives you the option of filtering out things like TCQ support in INQUIRY commands, which might be supported by the physical device but certainly isn''t supported by the pvSCSI protocol. If you emulate the HBA in the backend, you get a design like this: -- There is usually only one xenbus scsi device attached to any given VM, and that device represents the emulated HBA. -- scsifront creates a struct scsi_host (or equivalent) for each xenbus device, and those provide your interface to the rest of the guest operating system. -- When the guest OS submits a request to the frontend driver, it gets packaged up and shipped over the ring to the backend pretty much completely unchanged. 
-- The backend figures out what the request is doing, and either: a) Routes it to a physical device, or b) Synthesises an answer (for things like REPORT LUNS), or c) Fails the request (for things like WRITE BUFFER), as appropriate. If you emulate the HBA in the frontend, you get a design which looks like this: -- Each logical unit exposed to the guest has its own xenbus scsi device. -- scsifront creates a single struct scsi_host, representing the emulated HBA. -- When the guest OS submits a request to the frontend driver, it either: a) Routes it to a Xen scsifront and passes it off to the backend, or b) Synthesises an answer, or c) Fails the request, as appropriate. -- When a request reaches the backend, it does a basic check to make sure that it''s dealing with one of the whitelisted requests, and then sends it directly to the relevant physical device. The routing problem is trivial here, because there is only ever one physical device (struct scsi_device in Linux-speak) associated with any xenbus device, and the request is just dropped directly into the relevant request queue. The first approach gives you a simple frontend at the expense of a complicated backend, while the second one gives you a simple backend at the expense of a complicated frontend. It seems likely that there will be more frontend implementations than backend, which suggests that putting the HBA emulation in the backend is a better choice. The main difference from a performance point of view is that the second approach will use a ring for each device, whereas the first has a single ring shared across all devices, so you''ll get more requests in flight with the second scheme. I''d expect that just making the rings larger would have more effect, though, and that''s easier when there''s just one of them. Steven. On Wed, Mar 12, 2008 at 03:23:00PM +0900, Jun Kamada wrote:> Date: Wed, 12 Mar 2008 15:23:00 +0900 > From: Jun Kamada <kama@jp.fujitsu.com> > To: Steven Smith <steven.smith@eu.citrix.com> > Subject: Re: [Xen-devel] [Patch 0/7] pvSCSI driver > Cc: kama@jp.fujitsu.com, James Harper <james.harper@bendigoit.com.au>, > xen-devel@lists.xensource.com > > Hi Steven-san and James-san, > > Thank you for your comments. > > We have had a internal discussion based on your comments and reached > following thoughts. I consider that the thoughts can provide both > flexibility and ease of implementation. > > We would like to start modification of the pvSCSI driver according to > the thoughts. How do you think about it? The thoughts is reasonable? > If you have any comments, could you please? > > > ----- > 1.) Allow specifying arbitrary mapping between Dom0''s SCSI tree and > Guest''s SCSI tree. This includes "lun". > ( Dom0''s IDs [host1:channel1:id1:lun1] ---> > Guest''s IDs [host2:channel2:id2:lun2] ) > 2.) Guest has responsibility to have mapping and transform between > Dom0''s IDs and Guest''s IDs. It depends on guest OS''s implementation > which level(e.g. only "host" or all of 4-tuples or no-transform) of > mapping/transformation will be supported. > If guest decides to support lun transformation and in case of > "lun1 != lun2", the guest''s frontend driver should maintain LUN > value in CDB data structure. > 3.) As for REPORT LUNS command, Dom0 performs munging. > 4.) Dom0 accepts only LOGICAL UNIT RESET command. > 5.) Of course, the backend driver performs sanity check of IDs that the > guest already transformed. 
> > > And I would like to implement pvSCSI frontend driver for Linux by > following mapping/transformation policy. (Please note that another guest > OS such as Windows can take another policy, of cource.) > > - The guest looks identical tree as Dom0 looks except for "host". > (This comes by the reason that arbitrary "host" mapping is difficult > for current Linux implementation.) > - Of course, the guest''s tree is sparse if some LUNs were not attached > to the guest. Linux kernel allows the situation that lun=0 does not > exist, therefore sparse tree is not a problem. > > > Best regards, > > > On Mon, 10 Mar 2008 12:00:59 +0000 > Steven Smith <steven.smith@eu.citrix.com> wrote: > > > > Problems discussed in this context, what the portion of whole SCSI > > > tree should be exposed to guest and how the numbering logic of guest''s > > > tree should be, is very fundamental and difficult, I think. > > > > > > In my current thought, following two options are reasonable solutions. > > > How do you think about them? Could you please comment me? > > > > > > Option 1 (LUN assignment) > > > - Specify the assignment like below: > > > "host1:channel1:id1:lun1"(Dom0) -> "host2:channel2:id2:lun2"(guest) > > > The lun1 must be same as the lun2. > > > - Munging :-) REPORT LUNS command on Dom0 according to the number of > > > LUNs actually attached to the guest. > > I think this is the most flexible approach. > > > > One thing to watch out for here is that some old systems get quite > > confused if lun0 is missing but some of the higher luns are present. > > That''s easy to handle if you allow an arbitrary mapping between dom0 > > and guest luns, but is hard if you require them to be identical. This > > might not be an issue in the cases which we care about, though. > > > > > Option 2 (Target Assignment) > > > - Specify the assignment like below: > > > "host1:channel1:id1"(Dom0) -> "host2:channel2:id2"(guest) > > > All LUNs under id1 are assigned to one guest. > > > - Munging for LUN is not needed. > > > > > > For each option, how host/bus/device reset command should be? > > It''s possible that we''ll be able to get away with just supporting > > LOGICAL UNIT RESET commands, and completely ignoring lower granularity > > resets. I''m not sure how widely supported they are on actual > > hardware, but it might be good enough for a first implementation. You > > might even be able to get away with not supporting any kind of reset > > at all, and just accepting that error recovery is going to suck. > > > > Steven. > > > > > On Wed, 5 Mar 2008 13:34:48 +1100 > > > "James Harper" <james.harper@bendigoit.com.au> wrote: > > > > > > > > This kind of suggests that we should be plumbing things through to the > > > > > guest with a granularity of whole targets, rather than individual > > > > > logical units. The alternative is a much more complicated scsi > > > > > emulation which can fix up the LUN-sensitive commands. > > > > > > > > I think we should probably have the option of doing either. > > > > > > > > > Allowing this kind of mapping sounds reasonable to me. It would also > > > > > make it possible (hopefully) to add support for some of the weirder > > > > > SCSI logical unit addressing modes without changing the frontends > > > > > (e.g. hierarchical addressing with 64 bit LUNs). That might involve a > > > > > certain amount of munging of REPORT LUNS commands in the backend, > > > > > though. 
> > > >
> > > > Not sure how much it matters, but any 'munging' of scsi commands would be a real drag for Windows drivers. The Windows SCSI layer is very strict on lots of things, and is a real pain if you are not talking to a physical PCI scsi device.
> > > >
> > > > James
> > > >
> > > > _______________________________________________
> > > > Xen-devel mailing list
> > > > Xen-devel@lists.xensource.com
> > > > http://lists.xensource.com/xen-devel
> > >
> > > Jun Kamada
>
> -----
> Jun Kamada
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
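To make the backend-emulation design above concrete, here is a minimal C sketch of the route/synthesise/fail decision the backend has to make per request. It is illustrative only: all structure and function names are invented for this sketch, not the actual scsiback.c code, and a real backend would of course do this against the ring and grant machinery rather than plain buffers.

#include <stdint.h>

#define OP_REPORT_LUNS  0xA0
#define OP_WRITE_BUFFER 0x3B

struct vscsi_request {
    uint8_t cdb[16];   /* SCSI command descriptor block from the guest */
    /* ... data buffer, grant references, virtual 4-tuple, etc. ... */
};

enum dispatch_result { ROUTED, SYNTHESISED, FAILED };

static enum dispatch_result backend_dispatch(struct vscsi_request *req)
{
    switch (req->cdb[0]) {
    case OP_REPORT_LUNS:
        /* b) Synthesise: answer from the per-guest LUN map, never from
         * the physical device, so the guest only sees its own LUNs. */
        return SYNTHESISED;
    case OP_WRITE_BUFFER:
        /* c) Fail: microcode download from an untrusted guest. */
        return FAILED;
    default:
        /* a) Route: look up the physical device for the virtual
         * (host, channel, id, lun) and queue the request to it,
         * assuming the opcode is on the white-list. */
        return ROUTED;
    }
}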
James Harper wrote:
> This reset stuff seems like a lot of extra work for probably not much benefit though.

This ends up being the crux of it... It all depends on how important the use of scsi is to the consumer.

In other words, for a scsi disk layered under LVM, filesystems, etc., the nuances of resets and the inter-relations between luns and targets aren't that meaningful, and consumers will happily live in a world where these things are emulated.

However, if the scsi disk is talked to directly, via things like sg tools or multipathing software (where failover is disk- and target-specific), it matters more. And if the scsi disk is handed all the way up to a database, it matters *much* *much* more. In fact, this is one reason you find very few enterprise databases supported in virtualized environments.

All this hints at levels of pass-thru. Granted, you can always take one step at a time.

-- james

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jun Kamada wrote:
> -----
> 1.) Allow specifying an arbitrary mapping between Dom0's SCSI tree and the guest's SCSI tree. This includes "lun".
>     ( Dom0's IDs [host1:channel1:id1:lun1] ---> Guest's IDs [host2:channel2:id2:lun2] )

It would really be nice, when considering a model of FC NPIV or IOV-based ports, to allow a model where the mapping can be stronger and done in a single step, e.g. map everything from a particular scsi_host into the DomU.

Note: I'm not lobbying for a change in emulation, but rather trying to automate the arbitrary and individual mappings when there is a higher-level (relative to the scsi tree) association to the DomU.

Note: given that the channel # is specific to the host #, the id # is specific to the channel #, and the lun # is specific to the id #, there's no real reason why they couldn't be the same as, or at least overlap with, the Dom0 values. It's all up to whoever is doing the transformation or emulation.

> 2.) The guest is responsible for holding the mapping and transforming between Dom0's IDs and the guest's IDs. Which level of mapping/transformation is supported (e.g. only "host", all of the 4-tuple, or no transformation) depends on the guest OS's implementation. If the guest supports lun transformation and "lun1 != lun2", the guest's frontend driver must maintain the LUN value in the CDB data structure.

Wow. This seems odd. In my mind, this really comes down to the abstraction you choose between the DomU and Dom0. You're either exporting SCSI disks, SCSI targets, or SCSI hosts. Each of these dictates differences in the way the emulation is done.

I would have thought the translation always occurs on the Dom0 side.

> 3.) As for the REPORT LUNS command, Dom0 performs the munging.

This is in line with my last statement: it's on the Dom0 side.

> 4.) Dom0 accepts only the LOGICAL UNIT RESET command.

Note: at least for Linux stacks, and I know it's true for older Windows releases as well, the scsi stacks don't generate LOGICAL UNIT RESETs. They generate either Target Resets or Bus Resets.

> 5.) Of course, the backend driver performs a sanity check of the IDs that the guest has already transformed.

And if you're checking it, why are you (the Dom0) managing the transformation? Sounds like the work got done twice in your proposal.

-- james

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
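As a sketch of how the wild-card mapping discussed here (e.g. "1:0:*:*" handing a whole scsi_host to a DomU) could be represented and matched, under the assumption that all names and the config syntax are hypothetical and may differ from whatever the tools end up implementing:

#include <stdint.h>
#include <stdbool.h>

#define ANY 0xFFFFFFFFu   /* stands in for '*' in the config syntax */

struct vscsi_map {
    uint32_t d0_host, d0_chan, d0_id, d0_lun; /* Dom0 side; ANY = wildcard */
    uint32_t gu_host;                         /* guest-side host to map onto */
};

static bool map_matches(const struct vscsi_map *m,
                        uint32_t host, uint32_t chan,
                        uint32_t id, uint32_t lun)
{
    return (m->d0_host == ANY || m->d0_host == host) &&
           (m->d0_chan == ANY || m->d0_chan == chan) &&
           (m->d0_id   == ANY || m->d0_id   == id)   &&
           (m->d0_lun  == ANY || m->d0_lun  == lun);
}

/* "1:0:*:*" -> everything on Dom0 host 1, channel 0 appears to the guest
 * under one emulated host; per James Smart's note, channel/id/lun can stay
 * identical to the Dom0 values, since they only need to be unique within
 * that host. */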
Hi Steven-san,

On Thu, 13 Mar 2008 14:30:10 +0000 Steven Smith <steven.smith@eu.citrix.com> wrote:
> Backtracking a little, the fundamental goal here is to make some logical units which are accessible to dom0 appear inside the guest. Guest operating systems are unlikely to be very happy about having logical units floating around not attached to scsi hosts, and so we need (somehow) to come up with a scsi host which has the right set of logical units attached to it. There are lots of valid use cases in which there don't exist physical hosts with the right set of LUs, and so somebody needs to invent one, and then emulate it. That somebody will necessarily be either the frontend or the backend.
>
> Doing the emulation also gives you the option of filtering out things like TCQ support in INQUIRY commands, which might be supported by the physical device but certainly isn't supported by the pvSCSI protocol.
>
> If you emulate the HBA in the backend, you get a design like this:
>
> -- There is usually only one xenbus scsi device attached to any given VM, and that device represents the emulated HBA.
>
> -- scsifront creates a struct scsi_host (or equivalent) for each xenbus device, and those provide your interface to the rest of the guest operating system.
>
> -- When the guest OS submits a request to the frontend driver, it gets packaged up and shipped over the ring to the backend pretty much completely unchanged.
>
> -- The backend figures out what the request is doing, and either:
>
>    a) Routes it to a physical device, or
>    b) Synthesises an answer (for things like REPORT LUNS), or
>    c) Fails the request (for things like WRITE BUFFER),
>
>    as appropriate.
>
> If you emulate the HBA in the frontend, you get a design which looks like this:
>
> -- Each logical unit exposed to the guest has its own xenbus scsi device.
>
> -- scsifront creates a single struct scsi_host, representing the emulated HBA.
>
> -- When the guest OS submits a request to the frontend driver, it either:
>
>    a) Routes it to a Xen scsifront and passes it off to the backend, or
>    b) Synthesises an answer, or
>    c) Fails the request,
>
>    as appropriate.
>
> -- When a request reaches the backend, it does a basic check to make sure that it's dealing with one of the whitelisted requests, and then sends it directly to the relevant physical device. The routing problem is trivial here, because there is only ever one physical device (struct scsi_device in Linux-speak) associated with any xenbus device, and the request is just dropped directly into the relevant request queue.
>
> The first approach gives you a simple frontend at the expense of a complicated backend, while the second gives you a simple backend at the expense of a complicated frontend. It seems likely that there will be more frontend implementations than backends, which suggests that putting the HBA emulation in the backend is the better choice.

I agree with your thoughts. On the other hand, "more frontend implementations" also suggests that each guest OS has its own emulation policy, which would argue for emulating in the frontend. It is very difficult to decide which approach to take; each has both good points and bad points. :-<

However, I would like to take the first approach, emulation in the backend, following your and James Smart-san's advice, and to start implementation. :-)
> The main difference from a performance point of view is that the second approach will use a ring for each device, whereas the first has a single ring shared across all devices, so you'll get more requests in flight with the second scheme. I'd expect that just making the rings larger would have more effect, though, and that's easier when there's just one of them.

I expect Netchannel2 to solve the performance issues.

> Looking through the SCSI spec, I don't think we're going to be able to get away with passing requests through from the frontend all the way to the physical disk without sanity checking the actual CDB in the backend. There are a couple of commands which look scary:
>
> -- CHANGE ALIAS/REPORT ALIAS -- the alias list is shared across everything in the I_T nexus. That will lead to interesting issues if you ever have multiple guests modifying it at the same time.
>
> -- EXTENDED COPY -- allows you to copy arbitrary data between logical units, sometimes even ones not in the same target device. That's obviously going to need to be controlled in a VM setting.
>
> -- Some mode pages, as modified by MODE SELECT, can apply across multiple LUs. Even more exciting, the level of sharing can in principle vary between devices, even for the same page.
>
> -- WRITE BUFFER commands can be used to change the microcode on a device. I've no idea what the implications of letting an untrusted user push microcode into a device would be, but I doubt it's a good idea.
>
> -- I'm not sure whether we want to allow untrusted guests to issue SET PRIORITY commands.
>
> -- We've already been over REPORT LUNS :)
>
> Plus whatever weird things the various device manufacturers decide to introduce.
>
> What this means is that the REPORT LUNS issue fundamentally isn't restricted to just the REPORT LUNS command, but instead affects an unknown and potentially large set of other commands. The only way I can see to deal with this is to white-list commands individually once they've been confirmed to be safe, and have the backend block any commands which haven't been checked yet. That's going to be a fair amount of work, and it'll screw up the whole "transparent pass through" thing, but I can't see any other way of solving this problem safely.

I will start with a white-list of the mandatory SCSI commands and expand it to other commands later.

> (And even that assumes that the hardware people got everything right. Most devices will be designed on the assumption that only trusted system components can submit CDBs, so it wouldn't surprise me if some of them can be made to do bad things if a malicious CDB comes in. There's not really a great deal we can do about this, though.)

Best regards,

-----
Jun Kamada

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
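One way the opcode white-list described above could look, as a minimal sketch: a 256-entry bitmap indexed by the first CDB byte, seeded with a few mandatory commands. The opcode values are standard SPC/SBC assignments, but which opcodes ultimately belong on the list is exactly the open question in this thread, and the helper names are invented.

#include <stdint.h>
#include <stdbool.h>

static uint8_t cmd_whitelist[256 / 8];   /* one bit per SCSI opcode */

static void allow(uint8_t op)   { cmd_whitelist[op / 8] |= 1u << (op % 8); }
static bool allowed(uint8_t op) { return cmd_whitelist[op / 8] & (1u << (op % 8)); }

static void whitelist_init(void)
{
    allow(0x00);  /* TEST UNIT READY     */
    allow(0x03);  /* REQUEST SENSE       */
    allow(0x12);  /* INQUIRY             */
    allow(0x25);  /* READ CAPACITY(10)   */
    allow(0x28);  /* READ(10)            */
    allow(0x2A);  /* WRITE(10)           */
    /* REPORT LUNS (0xA0) is deliberately *not* passed through: the
     * backend synthesises the reply from the guest's LUN map instead. */
}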
Hi James-san,

Thank you for your comments.

On Fri, 14 Mar 2008 15:16:44 -0400 James Smart <James.Smart@Emulex.Com> wrote:
> Jun Kamada wrote:
> > -----
> > 1.) Allow specifying an arbitrary mapping between Dom0's SCSI tree and the guest's SCSI tree. This includes "lun".
> >     ( Dom0's IDs [host1:channel1:id1:lun1] ---> Guest's IDs [host2:channel2:id2:lun2] )
>
> It would really be nice, when considering a model of FC NPIV or IOV-based ports, to allow a model where the mapping can be stronger and done in a single step, e.g. map everything from a particular scsi_host into the DomU.
>
> Note: I'm not lobbying for a change in emulation, but rather trying to automate the arbitrary and individual mappings when there is a higher-level (relative to the scsi tree) association to the DomU.

I share the thought that a wild-card interface (for example 1:0:*:*) is needed.

> Note: given that the channel # is specific to the host #, the id # is specific to the channel #, and the lun # is specific to the id #, there's no real reason why they couldn't be the same as, or at least overlap with, the Dom0 values. It's all up to whoever is doing the transformation or emulation.
>
> > 2.) The guest is responsible for holding the mapping and transforming between Dom0's IDs and the guest's IDs. Which level of mapping/transformation is supported (e.g. only "host", all of the 4-tuple, or no transformation) depends on the guest OS's implementation. If the guest supports lun transformation and "lun1 != lun2", the guest's frontend driver must maintain the LUN value in the CDB data structure.
>
> Wow. This seems odd. In my mind, this really comes down to the abstraction you choose between the DomU and Dom0. You're either exporting SCSI disks, SCSI targets, or SCSI hosts. Each of these dictates differences in the way the emulation is done.
>
> I would have thought the translation always occurs on the Dom0 side.

I would like to take the backend-side emulation approach, as mentioned in another mail. Thank you for your advice.

> > 3.) As for the REPORT LUNS command, Dom0 performs the munging.
>
> This is in line with my last statement: it's on the Dom0 side.
>
> > 4.) Dom0 accepts only the LOGICAL UNIT RESET command.
>
> Note: at least for Linux stacks, and I know it's true for older Windows releases as well, the scsi stacks don't generate LOGICAL UNIT RESETs. They generate either Target Resets or Bus Resets.

This is a very difficult issue, and there is no good solution. :-<

> > 5.) Of course, the backend driver performs a sanity check of the IDs that the guest has already transformed.
>
> And if you're checking it, why are you (the Dom0) managing the transformation? Sounds like the work got done twice in your proposal.
>
> -- james
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

Best regards,

-----
Jun Kamada

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
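For illustration, a sketch of the Dom0-side REPORT LUNS munging agreed above: rather than forwarding the command, the backend builds the response from the set of LUNs actually assigned to the guest. The response layout (4-byte big-endian LUN list length, 4 reserved bytes, then one 8-byte entry per LUN) follows SPC; the function name is hypothetical, and single-level LUNs below 256 are assumed for simplicity.

#include <stdint.h>
#include <string.h>

static size_t synth_report_luns(const uint64_t *luns, size_t nluns,
                                uint8_t *buf, size_t buflen)
{
    size_t need = 8 + 8 * nluns;          /* header + one entry per LUN */
    if (buflen < need)
        return 0;                         /* caller honours ALLOCATION LENGTH */
    memset(buf, 0, need);

    uint32_t list_len = (uint32_t)(8 * nluns);
    buf[0] = list_len >> 24;              /* LUN list length, big-endian */
    buf[1] = list_len >> 16;
    buf[2] = list_len >> 8;
    buf[3] = list_len;
    /* bytes 4-7 are reserved and stay zero */

    for (size_t i = 0; i < nluns; i++) {
        /* single-level peripheral addressing, LUN < 256: byte 0 of the
         * 8-byte entry stays zero, byte 1 carries the LUN number */
        buf[8 + 8 * i + 1] = (uint8_t)luns[i];
    }
    return need;
}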
> > The first approach gives you a simple frontend at the expense of a complicated backend, while the second gives you a simple backend at the expense of a complicated frontend. It seems likely that there will be more frontend implementations than backends, which suggests that putting the HBA emulation in the backend is the better choice.
> I agree with your thoughts. On the other hand, "more frontend implementations" also suggests that each guest OS has its own emulation policy, which would argue for emulating in the frontend. It is very difficult to decide which approach to take; each has both good points and bad points. :-<
>
> However, I would like to take the first approach, emulation in the backend, following your and James Smart-san's advice, and to start implementation. :-)

It's a tricky decision, but I think this is the best path.

> > The main difference from a performance point of view is that the second approach will use a ring for each device, whereas the first has a single ring shared across all devices, so you'll get more requests in flight with the second scheme. I'd expect that just making the rings larger would have more effect, though, and that's easier when there's just one of them.
> I expect Netchannel2 to solve the performance issues.

It'll avoid this particular issue, yes.

> > Looking through the SCSI spec, I don't think we're going to be able to get away with passing requests through from the frontend all the way to the physical disk without sanity checking the actual CDB in the backend. There are a couple of commands which look scary:
...
> > What this means is that the REPORT LUNS issue fundamentally isn't restricted to just the REPORT LUNS command, but instead affects an unknown and potentially large set of other commands. The only way I can see to deal with this is to white-list commands individually once they've been confirmed to be safe, and have the backend block any commands which haven't been checked yet. That's going to be a fair amount of work, and it'll screw up the whole "transparent pass through" thing, but I can't see any other way of solving this problem safely.
> I will start with a white-list of the mandatory SCSI commands and expand it to other commands later.

Thank you.

Steven.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
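As a footnote on the single-ring point above: one shared ring can serve every device because each request only needs to carry the virtual 4-tuple for the backend to demultiplex it onto the right physical device. A hypothetical request layout, not the actual vscsiif wire format:

#include <stdint.h>

struct vscsi_ring_req {
    uint16_t rqid;         /* matches a response back to its request */
    uint8_t  host, chan;   /* virtual 4-tuple the guest addressed ... */
    uint16_t id;
    uint32_t lun;          /* ... which the backend maps to a physical device */
    uint8_t  cdb_len;
    uint8_t  cdb[16];
    /* scatter-gather segments via grant references would follow */
};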