Add vscsiif.h

Signed-off-by: Tomonari Horikoshi <t.horikoshi@jp.fujitsu.com>
Signed-off-by: Jun Kamada <kama@jp.fujitsu.com>
Signed-off-by: Akira Hayakawa <hayakawa.akira@jp.fujitsu.com>

-----
Jun Kamada
Jun,

Assuming an eventual merge with Xen proper, could you put this file in
xen/include/public/io off the root of the hg tree, instead of inside the
Linux specific include files?

Thanks

James
> Jun,
>
> Assuming an eventual merge with Xen proper, could you put this file in
> xen/include/public/io off the root of the hg tree, instead of inside the
> Linux specific include files?

Nevermind.  You have another patch that puts another copy in there too.

James
> diff -r f76e90b4f7ad -r a8a56c423e78 include/xen/interface/io/vscsiif.h
> --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
> +++ b/include/xen/interface/io/vscsiif.h	Thu Feb 14 11:14:46 2008 +0900
...
> +
> +#define VSCSIIF_CMND_SCSI            1    /* scsi */
> +#define VSCSIIF_CMND_SCSI_RESET      2    /* scsi */

Do you ever actually use the reset command?

> +
> +/* ----------------------------------------------------------------------
> +    Definition of Ring Structures
> +   ---------------------------------------------------------------------- */
> +
> +#define VSCSIIF_DEFAULT_CAN_QUEUE    256

What does this mean?

> +#define VSCSIIF_MAX_COMMAND_SIZE     16

Hmm.  SBC-3 includes a couple of 32 byte commands, which won't fit into this request structure.  Admittedly, none of them look terribly useful, but it would be unfortunate to discover that we have to do another PV scsi protocol in a year's time because some critical request doesn't fit.

> +#define VSCSIIF_SG_TABLESIZE         27

Where did 27 come from?

> +struct vscsiif_request {
> +    uint16_t rqid;
> +    uint8_t cmd;
> +    /* SCSI */
> +    uint8_t cmnd[VSCSIIF_MAX_COMMAND_SIZE];
> +    uint8_t cmd_len;

So cmd_len applies to cmnd rather than cmd, yes?

> +    uint16_t id, lun, channel;

Okay, so each of your PV devices maps to the moral equivalent of an HBA, yes?  You can have several targets attached to a single PV device?

> +    uint8_t sc_data_direction;

What does this mean?

> +    uint8_t use_sg;

From looking at your front and back implementations, I think this is actually a count of the number of segments you have in your SG table, yes?  If so, it could probably cope with being renamed to something a bit more appropriate.

Also, I think you have two bytes of implicit padding here, don't you?  We usually try to avoid that in shared structures, because (a) it's confusing, (b) it's wasteful of ring space, and (c) it can cause portability problems if guests are built with different compilers.

> +    uint32_t request_bufflen;

Is this not redundant with the scatter-gather list, below?  Or is there supposed to be some padding on the SG list which isn't included in the buffer length?

> +    /*int32_t timeout_per_command;*/
> +    struct scsiif_request_segment {
> +        grant_ref_t gref;
> +        uint16_t offset;
> +        uint16_t length;
> +    } seg[VSCSIIF_SG_TABLESIZE];
> +};

That gives you a 248 byte request structure, doesn't it?  That means you can only have 16 requests outstanding on a single page ring, which may not be enough to get good performance.

> +#define VSCSIIF_SENSE_BUFFERSIZE     96

Where did 96 come from?  SPC-4 revision 10 section 6.28 seems to imply that sense data can be up to 252 bytes.

> +
> +struct vscsiif_response {
> +    uint16_t rqid;

Another two bytes of implicit padding.

> +    int32_t rslt;
> +    uint8_t sense_len;
> +    uint8_t sense_buffer[VSCSIIF_SENSE_BUFFERSIZE];
> +};

For this sort of ring, responses have to be padded up to match request sizes.  This means that you've effectively got another 143 bytes of padding after the end of this structure, which could be used to increase the sense buffer size to 239 bytes (240 if you rearrange the response fields to avoid the padding).  That's still not really big enough, but it's at least closer.

> +
> +DEFINE_RING_TYPES(vscsiif, struct vscsiif_request, struct vscsiif_response);
> +
> +#endif

It might be worth considering abandoning the use of fixed-sized request and response structures, and instead using a more byte-oriented protocol in which requests can be almost any size.
This avoids the need to size things like the CDB area to match the largest possible request, because each request will be precisely as large as it needs to be.  The main downsides are:

-- It's much harder to access structures directly while they're on the ring.  On the other hand, you almost certainly need to copy the request into backend-local storage before processing it anyway, to avoid TOCTOU races and the associated security problems, and making that copy handle the ring is fairly easy.

-- If you allow requests and responses to be different sizes, you need to put them on separate rings, so that you don't find that e.g. you've had a series of responses which were bigger than their associated requests, and the response area has now overflowed on top of some requests.  Again, this isn't terribly difficult to deal with.

-- The ring.h macros won't work.  You'll need to invent your own scheme.

The main alternative here is request chaining, like the netchannel protocol.  This is generally somewhat tricky to get right, and I'd expect that just going for variable-sized requests would be easier.

If you do decide to go for this then you may want to pad requests to a multiple of 8 bytes so that the memcpy()s are suitably aligned.

I think the current netchannel2 plan also calls for variable-sized messages with split front->back and back->front rings.  It might be possible to share some code there (although at present there doesn't exist any code to share).

I'd also strongly recommend supporting multi-page rings.  That will allow you to have more requests in flight at any one time, which should lead to better performance.

Steven.
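To make the variable-size suggestion concrete, here is a minimal sketch of what such a byte-oriented request header could look like.  Every name, field and layout choice below is an illustrative assumption, not something taken from the posted vscsiif.h or from any existing Xen code.

    /* Illustrative only: a self-sizing request header for a byte-oriented
     * ring.  None of these names exist in the posted patch. */
    #include <stdint.h>

    struct vscsiif_var_request {
        uint8_t  type;      /* message type: SCSI command, reset, ...       */
        uint8_t  cdb_len;   /* number of valid CDB bytes that follow        */
        uint16_t length;    /* total message size in bytes, header included,
                             * padded to a multiple of 8 for aligned copies */
        uint16_t rqid;      /* request id, echoed back in the response      */
        uint16_t nr_segs;   /* scatter-gather segments after the CDB        */
        /* Followed on the ring by:
         *   uint8_t cdb[cdb_len];          exactly as long as needed
         *   struct seg { grant_ref, offset, length } [nr_segs];
         * so a 6-byte CDB with one data segment takes a fraction of the
         * 248 bytes a fixed-size slot would occupy. */
    };

A 32-byte CDB then fits with no protocol change, because nothing in the header caps the command size.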
> I think the current netchannel2 plan also calls for variable-sized
> messages with split front->back and back->front rings.  It might be
> possible to share some code there (although at present there doesn't
> exist any code to share).
>
> I'd also strongly recommend supporting multi-page rings.  That will
> allow you to have more requests in flight at any one time, which should
> lead to better performance.

The PV SCSI stuff is great work and I'm very keen to get it into mainline.  However, I'd very much like to see it use the same flexible ring structure that's being used for netchannel2.  The main features are as follows:

* A pair of rings, one for communication in each direction (responses don't go in the same ring as per the original netchannel).

* The rings are fixed in size at allocation time, but the area of memory they are allocated in may be bigger than a page, i.e. a list of grant refs is communicated over xenbus.

* The data placed on the rings consists of 'self describing' messages containing a type and a length.  Messages simply wrap over the ring boundaries.  The producer simply needs to wait until there is enough free space on the ring before placing a message.

* Both the frontend and the backend remove data from the rings and place it in their own internal data structures eagerly.  This is in contrast to the original netchannel, where free buffers and TX packets were left waiting on the rings until they were required.  Use of the eager approach enables control messages to be muxed over the same ring.  Both ends will advertise the number of outstanding requests they're prepared to queue internally using a message communicated over the ring, and will error attempts to queue more.  Since the backend needs to copy the entries before verification anyhow, this is minimal additional overhead.

Best,
Ian
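A rough producer-side sketch of the scheme described above: a byte-granularity ring (possibly spanning several granted pages) onto which self-describing messages are copied, wrapping at the boundary.  The names, the power-of-two ring size and the exact layout are all assumptions for illustration; this is not netchannel2 code.

    #include <stdint.h>
    #include <string.h>

    struct byte_ring {
        uint8_t *base;       /* start of the (possibly multi-page) ring area  */
        uint32_t size;       /* bytes, fixed at allocation, power of two here */
        uint32_t prod, cons; /* free-running producer/consumer byte counters  */
    };

    /* Copy one self-describing message (type + length already inside 'msg')
     * onto the ring, wrapping over the ring boundary if necessary. */
    static int ring_put(struct byte_ring *r, const void *msg, uint32_t len)
    {
        if (r->size - (r->prod - r->cons) < len)
            return -1;                    /* not enough free space yet; retry */

        uint32_t off   = r->prod & (r->size - 1);
        uint32_t first = r->size - off;   /* bytes available before the wrap  */
        if (first > len)
            first = len;

        memcpy(r->base + off, msg, first);
        memcpy(r->base, (const uint8_t *)msg + first, len - first);

        /* A real implementation would issue a write barrier here before
         * publishing the new producer index to the other end. */
        r->prod += len;
        return 0;
    }

The consumer side does the mirror-image copy out of the ring into its own queue before validating anything, which matches the eager-removal point above.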
Hi Ian-san and Steven-san,

Thank you for the comments.

In fact, a previous version of the pvSCSI driver used 2 rings, for frontend-to-backend and backend-to-frontend communication respectively.  The backend also queued requests from the frontend and released the ring immediately.  This may be a very similar concept to Netchannel2.

However, this version went back to a simple 1-ring architecture, the same as VBD.  We expect the performance will not be degraded because many transactions are distributed across multiple rings.

We would like to enhance it as a second step after this version is merged into the Xen tree, if possible.

Best regards,
Jun Kamada
> In fact, a previous version of the pvSCSI driver used 2 rings, for
> frontend-to-backend and backend-to-frontend communication respectively.
> The backend also queued requests from the frontend and released the
> ring immediately.  This may be a very similar concept to Netchannel2.

Cool, that sounds better.  Did you still have fixed length command structs, or allow variable length messages?  (I'm very keen we use the latter.)  Also, were the rings multi-page?

> We would like to enhance it as a second step after this version is
> merged into the Xen tree, if possible.

The problem with this approach is that it would change the ABI.  The ABI isn't guaranteed in the unstable tree, but it would have to be locked before 3.3 could be released (or the code removed/disabled prior to release).

It's preferable to get stuff like this fixed up before it goes in the tree, as in our experience developers often get retasked by their management to other work items as soon as the code goes in, and don't get around to the fixups.  Against that, getting it in the tree exposes it to more testing earlier, which is helpful.  If you're confident that the former is not going to happen to you, let's talk about which minor cleanups are important.

Thanks very much for your work on this project!

Best,
Ian
On 28/2/08 09:23, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:

>> In fact, a previous version of the pvSCSI driver used 2 rings, for
>> frontend-to-backend and backend-to-frontend communication respectively.
>> The backend also queued requests from the frontend and released the
>> ring immediately.  This may be a very similar concept to Netchannel2.
>
> Cool, that sounds better.  Did you still have fixed length command
> structs, or allow variable length messages?  (I'm very keen we use the
> latter.)

The netchannel2 model is 16-byte granularity 'cells' in the comms rings.  A descriptor can span multiple cells and its length is implicitly known from its type (which resides in the first byte of the first cell).  Individual cells do not wrap in the ring (since the ring is a multiple of 16 bytes in size) but multi-cell descriptors may wrap.

Further, netchannel2 allows descriptors to be chained into longer messages.  I think Jose is going to use the msb of the descriptor-identifier byte for this purpose.

We've toed and froed on details of this protocol a few times, so I've cc'ed Jose in case the current state of the design is different from what I believe.

 -- Keir
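A sketch of how such 16-byte cells might be laid out.  Since the netchannel2 code did not exist at this point, every name and constant below is an assumption made purely for illustration:

    #include <stdint.h>

    #define CELL_SIZE          16    /* ring space handed out in 16-byte cells */
    #define DESC_ID_CHAIN_BIT  0x80  /* msb of the identifier byte: more
                                      * descriptors follow in this message     */

    struct desc_cell {
        uint8_t type;                /* first byte of the first cell; the type
                                      * implies the descriptor's length        */
        uint8_t id;                  /* descriptor identifier, msb may chain   */
        uint8_t payload[CELL_SIZE - 2];
    };

    /* Whole cells occupied by a descriptor of 'len' bytes.  Cells never wrap
     * individually, but a multi-cell descriptor may cross the ring's end,
     * since the ring is a multiple of CELL_SIZE in size. */
    static inline unsigned cells_for(unsigned len)
    {
        return (len + CELL_SIZE - 1) / CELL_SIZE;
    }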
> In fact, a previous version of the pvSCSI driver used 2 rings, for
> frontend-to-backend and backend-to-frontend communication respectively.
> The backend also queued requests from the frontend and released the
> ring immediately.  This may be a very similar concept to Netchannel2.

Clean support for variable-sized requests is also quite important (whether netchannel1-style chaining of fixed size requests or true variable-size requests).  This is even more useful for SCSI, because things like the sense buffer size can vary quite dramatically between requests, and sizing it for the worst case would be a very wasteful use of ring space.

> However, this version went back to a simple 1-ring architecture, the
> same as VBD.  We expect the performance will not be degraded because
> many transactions are distributed across multiple rings.

I'm not quite sure what you mean here.  You currently have a single ring per LUN; are you expecting heavy workloads to always be spread over several LUNs?

> We would like to enhance it as a second step after this version is
> merged into the Xen tree, if possible.

As Ian points out, stuff in the stable trees is supposed to have a stable ABI, which can make some changes difficult.

Of course, we might be able to declare the tree stable except for SCSI support, but that's a bit of a change from our current model.

Steven.
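As an illustration of the ring-space point about sense data, a hypothetical variable-sized response that carries only the sense bytes actually returned; the names are not taken from the posted header:

    #include <stdint.h>

    struct vscsiif_var_response {
        uint16_t rqid;        /* matches the originating request             */
        uint16_t length;      /* total size of this message on the ring      */
        int32_t  rslt;        /* SCSI status / driver result                 */
        uint8_t  sense_len;   /* number of valid bytes in sense[]            */
        uint8_t  pad[3];      /* keep the flexible member 4-byte aligned     */
        uint8_t  sense[];     /* 0..252 bytes: only what the target returned */
    };

A command that completes cleanly costs 12 bytes of response space, while a full 252-byte sense block still fits without any protocol change.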
> Of course, we might be able to declare the tree stable except for SCSI
> support, but that's a bit of a change from our current model.

The pieces of pvSCSI are (I think):
 . xend changes
 . scripts in /etc/xen/scripts
 . linux backend
 . linux frontend

The linux backend and linux frontend could be built as modules outside of the main linux tree (eg just with linux-headers-2.6.18 under Debian - I've done this for the backend at least).

The scripts can be added easily enough.

Would it be possible to add a plugin functionality to xend and the supporting libraries?  If so, then it becomes very easy to build it all without having to worry so much about getting it in to the main tree...

Just a thought.

James
> > Of course, we might be able to declare the tree stable except for SCSI
> > support, but that's a bit of a change from our current model.
>
> The pieces of pvSCSI are (I think):
>  . xend changes
>  . scripts in /etc/xen/scripts
>  . linux backend
>  . linux frontend
>
> The linux backend and linux frontend could be built as modules outside
> of the main linux tree (eg just with linux-headers-2.6.18 under Debian -
> I've done this for the backend at least).
>
> The scripts can be added easily enough.

I was wondering if we could follow the kernel.org Linux policy of marking the drivers EXPERIMENTAL until the interface had stabilised.  The problem there is possibly that the interface headers would also be in the Xen tree, where we don't have a way of marking stuff in that way...  If the only user is Linux, maybe it doesn't matter.  FWIW kernel.org occasionally exposes non-stable interfaces to userspace with a warning not to use them (not ideal practice though).

> Would it be possible to add a plugin functionality to xend and the
> supporting libraries?  If so, then it becomes very easy to build it all
> without having to worry so much about getting it in to the main tree...

You'd need to add some kind of plugin support to both Xend and xm separately.  The Xend one would do the business of accepting configuration details and setting up the pvSCSI channel.  The xm one would add config file parsing for pvSCSI and send the requests to Xend.

It certainly doesn't seem like it would be conceptually hard to fit into the model already used by Xend.  It's harder to say just how difficult it'd be to shoehorn into the code, but I wouldn't have thought it'd actually need to be *that* big a change.  Python is actually rather well suited for this sort of thing.

The original Xend architecture was IIRC designed almost from the start to make plugin-type operation with code reuse fairly easy, which was a good decision and is partly why the code took an initial jump in complexity.  That modularity has been largely retained in the design, I think; it might make sense for somebody to look more closely at plugin-ifying Xend.  It'd make it easier for external developers to add device types, and it'd mean experimental device channels could be easily tested before merge into mainline Xen.

Cheers,
Mark

--
Push Me Pull You - Distributed SCM tool (http://www.cl.cam.ac.uk/~maw48/pmpu/)
> > Of course, we might be able to declare the tree stable except for SCSI
> > support, but that's a bit of a change from our current model.
>
> The pieces of pvSCSI are (I think):
>  . xend changes
>  . scripts in /etc/xen/scripts
>  . linux backend
>  . linux frontend
>
> The linux backend and linux frontend could be built as modules outside
> of the main linux tree (eg just with linux-headers-2.6.18 under Debian -
> I've done this for the backend at least).
>
> The scripts can be added easily enough.
>
> Would it be possible to add a plugin functionality to xend and the
> supporting libraries?  If so, then it becomes very easy to build it all
> without having to worry so much about getting it in to the main tree...

I have no idea; xend scares and confuses me.

Making it easier to add new device classes to xend would be a good thing, but I don't think it's the most direct way of getting pvSCSI merged.  The next stable release of Xen isn't due for another five months or so, and if we can't decide on a sane interface by then then there's something fairly seriously wrong.

Steven.
Hi all,

Sorry for my limited reply.  I could not follow the high-speed discussion due to the barrier of language and timezone. :-)

On Thu, 28 Feb 2008 12:06:37 +0000 Steven Smith <steven.smith@eu.citrix.com> wrote:

> > However, this version went back to a simple 1-ring architecture, the
> > same as VBD.  We expect the performance will not be degraded because
> > many transactions are distributed across multiple rings.
>
> I'm not quite sure what you mean here.  You currently have a single
> ring per LUN; are you expecting heavy workloads to always be spread
> over several LUNs?

Our focus is (was?) such an environment.  However, it is not the only one, as you mentioned.  I understood that we have to improve performance for each LUN access.

> > We would like to enhance it as a second step after this version is
> > merged into the Xen tree, if possible.
>
> As Ian points out, stuff in the stable trees is supposed to have a
> stable ABI, which can make some changes difficult.
>
> Of course, we might be able to declare the tree stable except for SCSI
> support, but that's a bit of a change from our current model.

I would like to merge our pvSCSI code into the Xen tree.  Therefore, I am going to fix the code according to the many comments I received.  Could you tell me the deadline for fixing the code?  (Someone may have mentioned it in a previous mail...)

-----
Jun Kamada
Linux Technology Development Div.
Server Systems Unit
Fujitsu Ltd.
kama@jp.fujitsu.com
> > > However, this version went back to a simple 1-ring architecture, the
> > > same as VBD.  We expect the performance will not be degraded because
> > > many transactions are distributed across multiple rings.
> >
> > I'm not quite sure what you mean here.  You currently have a single
> > ring per LUN; are you expecting heavy workloads to always be spread
> > over several LUNs?
>
> Our focus is (was?) such an environment.  However, it is not the only
> one, as you mentioned.  I understood that we have to improve
> performance for each LUN access.

Thanks.

> > > We would like to enhance it as a second step after this version is
> > > merged into the Xen tree, if possible.
> >
> > As Ian points out, stuff in the stable trees is supposed to have a
> > stable ABI, which can make some changes difficult.
> >
> > Of course, we might be able to declare the tree stable except for SCSI
> > support, but that's a bit of a change from our current model.
>
> I would like to merge our pvSCSI code into the Xen tree.  Therefore, I
> am going to fix the code according to the many comments I received.
> Could you tell me the deadline for fixing the code?  (Someone may have
> mentioned it in a previous mail...)

I think the 3.3 release is planned for late June, so you've got plenty of time.

Steven.
Hi Steven-san,

On Fri, 29 Feb 2008 13:47:02 +0000 Steven Smith <steven.smith@eu.citrix.com> wrote:

> > > Of course, we might be able to declare the tree stable except for SCSI
> > > support, but that's a bit of a change from our current model.
> >
> > I would like to merge our pvSCSI code into the Xen tree.  Therefore, I
> > am going to fix the code according to the many comments I received.
> > Could you tell me the deadline for fixing the code?  (Someone may have
> > mentioned it in a previous mail...)
>
> I think the 3.3 release is planned for late June, so you've got plenty
> of time.

OK, then I have enough time to modify the code.  However, I am worried about the difficulty of modifying it for Netchannel2.  I suppose it will be a big change for our code.  So, I would like to get sample code for it, if possible.  Could you tell me when I can get the sample code for Netchannel2?

-----
Jun Kamada