sven.kretzschmar@gmx.de
2003-Oct-13 22:41 UTC
[Xen-devel] Further problems with HyperSCSI and vbds...
After applying the new patches/changesets from the Xen project team (thanks again :) vbds and vds are working with local hard disks (e.g. /dev/hda) as expected. I am also now able to load and use the HyperSCSI module... ...in a quite restricted way :-(

As long as there is no vbd involved, everything works as expected:

*) In domain0 I can fdisk /dev/sda (which is "emulated" by the HyperSCSI kernel module).
*) I can put a filesystem on /dev/sdaX and mount it in domain0.

But as soon as I use a vbd to access it (via attaching a physical /dev/sdaX partition to the vbd, or via attaching a vd, which uses a /dev/sdaX partition, to the vbd), it no longer works, even when using xen_refresh_dev. Fdisk cannot open /dev/xvda in this case (unable to read). mkfs.ext2 starts, but then complains about a "short read" on block 0. It then continues to write the filesystem, but I found that it does not really access the physical disk via HyperSCSI at all. It also seems that it does not even access the /dev/sda "fake" device on the local server, because there is no network traffic from the client to the server where the physical disk is located. Trying to mount /dev/xvda then again results in "short read on block 0", being unable to read the superblock, etc.

I think the problem here is that HyperSCSI attaches /dev/sda without really knowing anything about Xen ;-) Xen also knows nothing about this "faked" physical SCSI device on /dev/sda; only xenolinux does, because of the loaded HyperSCSI kernel module driver. So perhaps the virtual block driver in xenolinux tries to access the faked physical /dev/sda device via Xen, but as Xen does not know about it, this somehow does not really work. (Btw: shouldn't this result in some printk() error messages in the xenolinux virtual block driver?)

The virtual block driver in xenolinux should instead recognize that this is not a physical device registered with Xen and should forward these disk requests and ioctls directly to the /dev/sda(X) device, instead of sending them to Xen. Of course, this should only be allowed for devices (or device drivers) loaded in domain0?? These are of course only assumptions and thoughts out loud ;-) A rough sketch of what I mean follows further below.

I think one would at least have to change some code in xl_block.c and xl_scsi.c to reach that goal. Perhaps one could try to register the SCSI devices provided by the HyperSCSI module as xenolinux virtual SCSI block devices? (The code in xlscsi_init(xen_disk_info_t *xdi) in xl_scsi.c makes me think this could work.) I know that this might violate Xen's design principle of being the only component with direct access to the hardware. However, the /dev/sd* devices from HyperSCSI are not really local hardware; it's only a "faked" physical disk.

I would be interested in some thoughts about this from the Xen project team and list readers, because I consider HyperSCSI an important feature for xenolinux domains. It would allow you to store the whole filesystems of a lot of domains from several physical machines running xen/xenolinux on one big fileserver. As HyperSCSI is a very quick and efficient protocol/implementation, this would be a lot quicker and remarkably more efficient than using NFS for the same task. HyperSCSI can also use not only SCSI devices (disks, tapes, etc.) but also IDE devices like IDE disks and IDE CD writers as real physical devices to be accessed over the LAN ( http://nst.dsi.a-star.edu.sg/mcsa/hyperscsi ).
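Here is the rough sketch I mentioned: a tiny user-space model of the routing decision I have in mind. All names are hypothetical; this is not the actual xl_block.c / xl_scsi.c code, just the shape of the idea.

    /* Toy model of the routing decision: if Xen never registered the
     * target disc, hand the request to the local (domain0) driver
     * instead of sending it to Xen.  Hypothetical names throughout. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>

    enum target { TARGET_XEN, TARGET_LOCAL_DRIVER };

    /* In the real driver this would consult the xen_disk_info_t list that
     * Xen hands to xlscsi_init(); here it is a stub that only knows about
     * local IDE discs. */
    static bool xen_registered(const char *dev)
    {
        return strncmp(dev, "/dev/hd", 7) == 0;
    }

    static enum target route_request(const char *dev)
    {
        return xen_registered(dev) ? TARGET_XEN : TARGET_LOCAL_DRIVER;
    }

    int main(void)
    {
        const char *devs[] = { "/dev/hda1", "/dev/sda5" };
        for (int i = 0; i < 2; i++)
            printf("%s -> %s\n", devs[i],
                   route_request(devs[i]) == TARGET_XEN
                       ? "Xen (normal vbd path)"
                       : "local driver (e.g. HyperSCSI module)");
        return 0;
    }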
Sorry for the little HyperSCSI hype, I only wanted to explain my interest in HyperSCSI in connection with Xen. I hope there's a not-too-complicated solution for this problem.

Regards,
Sven
Keir Fraser
2003-Oct-14 07:17 UTC
[Xen-devel] Re: Further problems with HyperSCSI and vbds...
> I think the problem here is that HyperSCSI attaches /dev/sda
> without really knowing anything about Xen ;-)
> Xen also knows nothing about this "faked" physical SCSI
> device on /dev/sda; only xenolinux does, because of the loaded
> HyperSCSI kernel module driver.

Yes, you've hit the nail on the head. Although you construct VBDs out of carved-up hd* and sd* partitions, those partitions have to be on devices that Xen knows about. So, when you try to access the VBD, Xen maps the request to a non-existent local SCSI disc :-)

> So perhaps the virtual block driver in xenolinux tries to access the
> faked physical /dev/sda device via Xen, but as Xen does not know about it,
> this somehow does not really work. (Btw: shouldn't this result in some
> printk() error messages in the xenolinux virtual block driver?)

I'll add the debugging back into the xenolinux driver. In any case, a bit more noise from our development tree would be no bad thing!

> The virtual block driver in xenolinux should instead recognize that
> this is not a physical device registered with Xen and should forward
> these disk requests and ioctls directly to the /dev/sda(X) device,
> instead of sending them to Xen.
> Of course, this should only be allowed for devices (or device drivers)
> loaded in domain0??

Why do you want to construct VBDs if only domain 0 is going to access them? However, if that's all you want to do then yes --- modifications to xl_scsi.c will suffice.

> I know that this might violate Xen's design principle of being the
> only component with direct access to the hardware.
> However, the /dev/sd* devices from HyperSCSI are not really local
> hardware; it's only a "faked" physical disk.

DOM0 is allowed unrestricted access to hardware already. Otherwise X wouldn't work :-)

> I would be interested in some thoughts about this from the Xen project
> team and list readers, because I consider HyperSCSI an important
> feature for xenolinux domains.
> It would allow you to store the whole filesystems of a lot of domains
> from several physical machines running xen/xenolinux on one big
> fileserver.
> As HyperSCSI is a very quick and efficient protocol/implementation, this
> would be a lot quicker and remarkably more efficient than using NFS for
> the same task.

There are a few options to allow HyperSCSI access from all domains:

1. NFS-mount HyperSCSI partitions via domain 0 (this will work already).

2. NFS-mount VBDs which map onto chunks of HyperSCSI disk, via domain 0 (this might work if you hack DOM0's xl_scsi.c a bit so that DOM0 VBDs can map onto HyperSCSI block devices).

3. Add proper support for HyperSCSI to Xen. You'd need some scheme for validating transmits which use the HyperSCSI transport, and demuxing received frames to the appropriate domain. I don't know anything about the protocol, so I don't know how easy this would be (e.g. how much state Xen would need to keep lying around).

 -- Keir
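To make the mapping above concrete, a toy sketch with hypothetical structures (not the real Xen code): a VBD is a list of extents on underlying discs, and the lookup fails when the extent's disc was never probed by Xen.

    /* Toy model: a VBD is a list of extents on real discs; mapping a VBD
     * sector fails if the underlying disc is unknown to Xen (as happens
     * with a HyperSCSI-emulated /dev/sda).  Hypothetical structures. */
    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    struct extent {
        const char   *device;    /* underlying disc, e.g. "/dev/sda" */
        unsigned long start;     /* first sector of the extent on that disc */
        unsigned long nsectors;
    };

    /* Discs Xen probed at boot on this example box. */
    static const char *xen_probed[] = { "/dev/hda", "/dev/hdb" };

    static bool xen_knows(const char *dev)
    {
        for (unsigned i = 0; i < sizeof xen_probed / sizeof *xen_probed; i++)
            if (strcmp(dev, xen_probed[i]) == 0)
                return true;
        return false;
    }

    static int map_vbd_sector(const struct extent *ext, int n,
                              unsigned long sec,
                              const char **dev, unsigned long *psec)
    {
        for (int i = 0; i < n; i++) {
            if (sec < ext[i].nsectors) {
                if (!xen_knows(ext[i].device))
                    return -1;        /* the "non-existent local SCSI disc" */
                *dev  = ext[i].device;
                *psec = ext[i].start + sec;
                return 0;
            }
            sec -= ext[i].nsectors;
        }
        return -1;                    /* past the end of the VBD */
    }

    int main(void)
    {
        /* One extent carved out of /dev/sda, as for a HyperSCSI partition. */
        struct extent vbd[] = { { "/dev/sda", 2048, 1UL << 20 } };
        const char *dev; unsigned long psec;
        if (map_vbd_sector(vbd, 1, 0, &dev, &psec) < 0)
            printf("request fails: underlying disc not registered with Xen\n");
        return 0;
    }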
Ian Pratt
2003-Oct-14 08:06 UTC
Re: [Xen-devel] Re: Further problems with HyperSCSI and vbds...
Sven,

Sorry it's taken so long for us to understand what you're trying to do.

> 3. Add proper support for HyperSCSI to Xen. You'd need some scheme
> for validating transmits which use the HyperSCSI transport, and
> demuxing received frames to the appropriate domain. I don't know
> anything about the protocol, so I don't know how easy this would be
> (e.g. how much state Xen would need to keep lying around).

The main thing would be turning the VFR into more of an L2 switch than a router, with each domain having its own MAC [*]. We could then add a rule to grant a domain TX permission for a particular 802 protocol number. HyperSCSI presumably has some high-level server-based authentication and privilege verification? If so, it should be pretty straightforward.

[*] Each domain already has its own MAC for the purposes of DHCP, but for normal TX packets we currently replace it with the Ethernet card's real MAC. This was consistent with the view of the VFR as a router rather than a switch, and also keeps the local sysadmins happy, who would otherwise see potentially thousands of new MAC addresses.

Ian
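A toy sketch of the kind of per-domain TX rule this would amount to, with hypothetical structures and a placeholder EtherType (not the actual VFR code):

    /* Toy model of the rule: a frame may leave a domain only if it carries
     * that domain's own MAC and an EtherType the domain has been granted.
     * ETH_P_HYPERSCSI below is a placeholder value -- an assumption, check
     * the registered number before relying on it. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define ETH_P_IP        0x0800
    #define ETH_P_ARP       0x0806
    #define ETH_P_HYPERSCSI 0x889a   /* assumption */

    struct domain_tx_rule {
        unsigned int domid;
        uint8_t      mac[6];            /* the domain's own MAC */
        uint16_t     extra_ethertype;   /* extra 802 protocol granted, 0 = none */
    };

    static bool tx_permitted(const struct domain_tx_rule *r,
                             const uint8_t *src_mac, uint16_t ethertype)
    {
        if (memcmp(src_mac, r->mac, 6) != 0)
            return false;                       /* must use its own MAC */
        if (ethertype == ETH_P_IP || ethertype == ETH_P_ARP)
            return true;                        /* permitted as today */
        return r->extra_ethertype != 0 && ethertype == r->extra_ethertype;
    }

    int main(void)
    {
        /* Arbitrary example MAC for domain 1. */
        struct domain_tx_rule dom1 =
            { 1, { 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 }, ETH_P_HYPERSCSI };
        printf("HyperSCSI frame:  %s\n",
               tx_permitted(&dom1, dom1.mac, ETH_P_HYPERSCSI) ? "pass" : "drop");
        printf("other 802 frame:  %s\n",
               tx_permitted(&dom1, dom1.mac, 0x88b5) ? "pass" : "drop");
        return 0;
    }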
Sven Kretzschmar
2003-Oct-15 17:13 UTC
[Xen-devel] Solution for problems with HyperSCSI and vbds ?
I have thought a little bit about Ian's and Keir's proposals for how to make HyperSCSI and vbds work (also in domains > 0).

> 1. NFS-mount HyperSCSI partitions via domain 0 (this will work
> already).

Although this is possible as a temporary workaround, it would take away the advantages HyperSCSI offers (speed, very low overhead, especially with this configuration ;-); it would also add another point of failure or instability by using NFS for this. Also, if you want to start more than 15 domains on a server, directly mounting /dev/sd* partitions via NFS (vbds don't work yet with HyperSCSI) would run into the limit on the number of mappable sd*X devices: only sda1 - sda15 work, the next minor device number already belongs to sdb, and my RAID10 array used for testing exports only one 160G /dev/sda device. That means that to get past this limit one would definitely need vbds via vds, or perhaps COW device files at some point in the future (but of course that would also hurt performance). So I don't really favour this possibility.

> 2. NFS-mount VBDs which map onto chunks of HyperSCSI disk, via domain
> 0 (this might work if you hack DOM0's xl_scsi.c a bit so that DOM0
> VBDs can map onto HyperSCSI block devices).

Much better, but there is still NFS involved for no real need, only as a workaround. So I don't favour this either, for the same reasons as in 1.

>> 3. Add proper support for HyperSCSI to Xen. You'd need some scheme
>> for validating transmits which use the HyperSCSI transport, and
>> demuxing received frames to the appropriate domain. I don't know
>> anything about the protocol, so I don't know how easy this would be
>> (e.g. how much state Xen would need to keep lying around).
>
> [Ian:] The main thing would be turning the VFR into more of an L2 switch
> than a router, with each domain having its own MAC[*]. We could then
> add a rule to grant a domain TX permission for a particular 802
> protocol number. HyperSCSI presumably has some high-level
> server-based authentication and privilege verification? If so, it
> should be pretty straightforward.

This is much better, though more complicated too ;-) However, I wouldn't do this based on protocols, or on routing HyperSCSI Ethernet packets, or on the need to use HyperSCSI kernel modules in domains > 0 (perhaps too complicated, and only a special-case solution for this problem).

Here are my first thoughts about a solution. I will try to describe it roughly from the point of view of domain #1 (not DOM0).

Preconditions: /dev/xvda is a vbd which is attached to a vd which is mapped to a HyperSCSI partition (e.g. /dev/sda5). A simpler case would be a direct mapping from a vbd to /dev/sda5, without a vd in between; I will use only this case for the sake of simplicity now.

Some application in domain #1 accesses /dev/xvda. The virtual block device driver maps this to /dev/sda and forwards the request to Xen (perhaps it also tags the request as a request to a "special device" before forwarding it). Xen realizes that there is no physical device connected to /dev/sda (or registered with Xen? maybe it can then also recognize that the request was marked as targeting a "special device"). Because of that condition, it forwards this block device request to DOM0, in which a "request handler" kernel module listens for block device requests that Xen forwards to DOM0 to be handled there (it will need to register a callback function with Xen in order to do so). This callback function is now called by Xen to forward the block device request to the kernel module loaded in DOM0. This "block device request handler" kernel module checks the data of the block device request (e.g. ioctl, read or write), just tries to execute the requested operation on the designated device (/dev/sda in our example) in DOM0, and hands the result and/or data back to Xen. If there is no device driver (such as, in this example, the HyperSCSI kernel module) attached to /dev/sda, the handler module returns the error condition to Xen. Xen in turn hands the result code and data back to the virtual block device driver in domain #1, which forwards it to the application that triggered the block device access request. Done.
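To make the flow above a bit more concrete, here is a toy single-process model of the registration and forwarding I have in mind. All names are made up; none of this is an existing Xen API.

    /* Toy single-process model of the proposal -- hypothetical names,
     * nothing here is an existing Xen interface. */
    #include <stdio.h>

    struct blkdev_request {
        char          device[32];   /* e.g. "/dev/sda5" */
        int           write;        /* 0 = read, 1 = write */
        unsigned long sector;
        unsigned int  nsectors;
    };

    /* What the DOM0 module would register with Xen. */
    typedef int (*blkdev_handler_t)(const struct blkdev_request *req);

    static blkdev_handler_t dom0_handler;      /* stands in for Xen's state */

    static void xen_register_blkdev_handler(blkdev_handler_t h)
    {
        dom0_handler = h;
    }

    /* Called on the path from a domain > 0 when Xen has no driver for the
     * target device: forward to DOM0 if a handler was registered. */
    static int xen_forward_unknown_device(const struct blkdev_request *req)
    {
        if (dom0_handler == NULL)
            return -1;              /* nobody to hand the request to */
        return dom0_handler(req);
    }

    /* The DOM0 kernel module's handler; here it only pretends to do the I/O. */
    static int dom0_blkdev_handler(const struct blkdev_request *req)
    {
        printf("DOM0: %s %u sectors at %lu on %s via the local driver\n",
               req->write ? "write" : "read",
               req->nsectors, req->sector, req->device);
        return 0;                   /* would be the driver's completion status */
    }

    int main(void)
    {
        struct blkdev_request req = { "/dev/sda5", 0, 0, 8 };
        xen_register_blkdev_handler(dom0_blkdev_handler);
        return xen_forward_unknown_device(&req);
    }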
Sorry for the prose above being a little bit unspecific; I currently don't have that much time to make it shine ;-) I have tried to simplify some of the internals of Xen, also because I don't understand them completely yet ;-) If I made errors, please feel free to correct me.

This is somewhat similar to the proposal to load "normal" Linux device drivers exclusively in DOM0 in order to access hardware (also from domains > 0) via the large set of already written Linux device drivers, instead of letting Xen handle all hardware access alone -- but in this case only for block devices.

I would like to do a _little_ restricted case study for that, using the HyperSCSI / vbd problem as an example. However, I would need some "small" help and hints from the Xen team: What would be the cleanest way to do the communication between a kernel module loaded in DOM0 and Xen? Does the Xen API offer functions for registering callbacks (to kernel modules)? If yes, what are their names and how are they used? Where are these functions defined (in which source files)? Can these callbacks somehow be made asynchronous (that is, Xen should be able to call the kernel module at some time to initiate the block device request in DOM0, and then be called back later asynchronously with the results by DOM0, because device accesses can take some time)? Are there any special caveats I have to watch out for?

I think all this might also be interesting for accessing other block devices for which no Xen drivers exist. A very similar thing could perhaps also be done for character devices?

Thanks in advance for any help you can offer to get this working :) I would of course also like to hear some opinions or concerns from other members of this list or the Xen team about my proposed solution above ;-)

Regards,
Sven
Ian Pratt
2003-Oct-15 17:48 UTC
Re: [Xen-devel] Solution for problems with HyperSCSI and vbds ?
> > [Ian:] The main thing would be turning the VFR into more of an L2 switch
> > than a router, with each domain having its own MAC[*]. We could then
> > add a rule to grant a domain TX permission for a particular 802
> > protocol number. HyperSCSI presumably has some high-level
> > server-based authentication and privilege verification? If so, it
> > should be pretty straightforward.
>
> This is much better, though more complicated too ;-)
>
> However, I wouldn't do this based on protocols, or on routing HyperSCSI
> Ethernet packets, or on the need to use HyperSCSI kernel modules in
> domains > 0 (perhaps too complicated, and only a special-case solution
> for this problem).

I still like my proposal ;-) It's pretty straightforward to implement, is relatively clean, and will have good performance. However, if you're exporting a single disk from the HyperSCSI server it's not much help.

> The virtual block device driver maps this to /dev/sda and forwards
> the request to Xen (perhaps it also tags the request as a request
> to a "special device" before forwarding it).
> Xen realizes that there is no physical device connected to /dev/sda
> (or registered with Xen? maybe it can then also recognize that
> the request was marked as targeting a "special device").
> Because of that condition, it forwards this block device request
> to DOM0, in which a "request handler" kernel module listens for
> block device requests that Xen forwards to DOM0 to be handled there
> (it will need to register a callback function with Xen in order to
> do so).

I think your best solution is not to use Xen VBDs at all. If you don't like NFS, how about having domains > 0 use "enhanced network block devices" which talk to a simple server running in domain0? The storage for the nbd server can be files, partitions or logical volumes on /dev/sda.

This should require writing no code, and will give pretty good performance. It gives good control over storage allocations etc.

http://www.it.uc3m.es/~ptb/nbd/

[It appears to work as a rootfs, but I haven't verified.]

Best,
Ian
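For illustration only, a toy sketch of the DOM0-side server role in this setup: a process serving block read/write requests against a backing file, partition or logical volume. This is not the enbd code or protocol, just the shape of the idea.

    /* Toy stand-in for an nbd-style server in DOM0: serve block requests
     * against whatever backing store it was given.  In the real thing the
     * requests arrive from domains > 0 over the (virtual) network. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>

    struct blk_req {
        int    write;          /* 0 = read, 1 = write */
        off_t  offset;         /* byte offset into the backing store */
        size_t len;
    };

    static ssize_t serve(int backing_fd, const struct blk_req *r, char *buf)
    {
        return r->write ? pwrite(backing_fd, buf, r->len, r->offset)
                        : pread(backing_fd, buf, r->len, r->offset);
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file | partition | logical volume>\n",
                    argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        struct blk_req r = { 0, 0, sizeof buf };   /* read the first 4K */
        ssize_t n = serve(fd, &r, buf);
        printf("served %zd bytes from %s\n", n, argv[1]);
        close(fd);
        return 0;
    }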
Sven Kretzschmar
2003-Oct-15 21:19 UTC
Re: [Xen-devel] Solution for problems with HyperSCSI and vbds ?
On 15.10.2003 at 18:48 Ian Pratt wrote:

>> > [Ian:] The main thing would be turning the VFR into more of an L2 switch
>> > than a router, with each domain having its own MAC[*]. We could then
>> > add a rule to grant a domain TX permission for a particular 802
>> > protocol number. HyperSCSI presumably has some high-level
>> > server-based authentication and privilege verification?

Yes, it has (even encryption, if needed).

>> > If so, it should be pretty straightforward.
>>
>> This is much better, though more complicated too ;-)
>>
>> However, I wouldn't do this based on protocols, or on routing HyperSCSI
>> Ethernet packets, or on the need to use HyperSCSI kernel modules in
>> domains > 0 (perhaps too complicated, and only a special-case solution
>> for this problem).
>
> I still like my proposal ;-)

:)) Sorry for being so rude about it ;-) Besides my other points, I just want to avoid the necessity of loading a kernel module in domains > 0 in order to use the /dev/sda device. It should just be usable like a standard hardware device supported by the kernel -- KISS principle, at least from the point of view of domains > 0 or of clients using domains > 0. (Yes, sometimes I am a very restrictive purist ;-)

> It's pretty straightforward to implement, is relatively clean,
> and will have good performance.

I would like to build a "production strength" environment with remote disk access performance as high as reasonably possible. But if I accept the thought of loading a kernel module in domains > 0 in order to get HyperSCSI-attached devices working somehow, then your proposal (VFR switching of Ethernet packets to and from domains > 0) is probably going to perform better than additionally using enbd devices. However, I somehow don't like the thought of 100+ domains from e.g. 3 different physical servers connecting directly to the physical HyperSCSI server. Just 3 DOM0 HyperSCSI clients connecting directly to the HyperSCSI server feels more comfortable (e.g. much easier administration, fewer points of failure). The 3 DOM0s in this example can then export the HyperSCSI device(s) by whatever means to the domains > 0.

> However, if you're exporting a single disk from the HyperSCSI
> server it's not much help.
>
>> [...]
>
> I think your best solution is not to use Xen VBDs at all. If
> you don't like NFS, how about having domains > 0 use "enhanced
> network block devices" which talk to a simple server running in
> domain0? The storage for the nbd server can be files, partitions
> or logical volumes on /dev/sda.
>
> This should require writing no code, and will give pretty good
> performance. It gives good control over storage allocations etc.
>
> http://www.it.uc3m.es/~ptb/nbd/

Thanks a lot for pointing me to this solution! I will look into it during the next days (especially the performance ;-).
Apropos: did you ever benchmark the average or maximum throughput of your VFR implementation in Xen? This would be interesting for routing enbd IP packets from DOM0 to the other domains on the same machine (in terms of the average/maximum performance one could reach). Also, did you ever benchmark how much performance is lost by using vbds/vds for disk access compared with using the block device directly (tested in DOM0)? Could mounting /dev/sda via enbd perform better, or at least nearly as well, as using vds and vbds, given the additional overhead of vd/vbd use?

> [It appears to work as a rootfs, but I haven't verified.]

I'll try it... (an initrd is required, I think :-( ) ;-)

Best regards,
Sven
Ian Pratt
2003-Oct-15 22:07 UTC
Re: [Xen-devel] Solution for problems with HyperSCSI and vbds ?
> Just 3 DOM0 HyperSCSI clients connecting directly to the HyperSCSI
> server feels more comfortable (e.g. much easier administration,
> fewer points of failure). The 3 DOM0s in this example can then
> export the HyperSCSI device(s) by whatever means to the domains > 0.

Of course, the absolutely proper solution is to put HyperSCSI into Xen, so that Xen's block device interface could be used by guest OSes to talk directly to the remote disk.

However, I wouldn't want to contemplate putting a big gob of code like HyperSCSI into Xen until we have implemented the plan for ring-1 loadable module support. This would then give us a shared-memory block device interface between guest OSes and the HyperSCSI driver (also running in ring 1). The HyperSCSI driver would then talk to the network interface, again using shared memory.

> Thanks a lot for pointing me to this solution!
> I will look into it during the next days (especially the performance ;-).

I'm looking forward to hearing how you get on.

> Apropos: did you ever benchmark the average or maximum
> throughput of your VFR implementation in Xen?

The throughput between domains and the real network interface is _very_ good: easily able to saturate a 1Gb/s NIC, and probably good for rather more. However, I'm afraid to say that we recently discovered that our inter-domain performance is pretty abysmal -- worse than our performance over the real network, which is simultaneously amusing and sad.

The problem is that we currently don't get the asynchronous 'pipelining' when doing inter-domain networking that gives good performance when going to an external interface: since the communication is currently synchronous, we don't get back-pressure to allow a queue to build up, as would happen with a real NIC. The net result is that we end up bouncing in and out of Xen several times for each packet. I volunteered to fix this, but I'm afraid I haven't had time as yet. I'm confident we should end up with really good inter-domain networking performance, using pipelining and page flipping.

> Also, did you ever benchmark how much performance is lost by using
> vbds/vds for disk access compared with using the block device
> directly (tested in DOM0)?

Performance of vbds and raw partitions should be identical. Disks are slow -- you have to really work at it to cock the performance up ;-)

> Could mounting /dev/sda via enbd perform better, or at least nearly
> as well, as using vds and vbds, given the additional overhead of
> vd/vbd use?

Performance using enbd should be pretty good once we've sorted out inter-domain networking.

Ian
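For illustration, a toy model of the "let a queue build up" point: if the sender only kicks the receiver when the ring goes from empty to non-empty, a burst of packets costs one notification instead of one or more per packet. Hypothetical structure, not Xen's actual ring code.

    /* Toy producer/consumer ring: the producer only notifies the consumer
     * on the empty -> non-empty transition, so notifications scale with
     * bursts rather than with packets. */
    #include <stdio.h>

    #define RING_SIZE 64

    struct ring {
        int           buf[RING_SIZE];
        unsigned int  prod, cons;
        unsigned long notifications;   /* stand-in for domain<->Xen transitions */
    };

    static void kick(struct ring *r) { r->notifications++; }

    static int produce(struct ring *r, int pkt)
    {
        if (r->prod - r->cons == RING_SIZE)
            return -1;                          /* ring full: back pressure */
        int was_empty = (r->prod == r->cons);
        r->buf[r->prod++ % RING_SIZE] = pkt;
        if (was_empty)
            kick(r);                            /* notify only on empty -> non-empty */
        return 0;
    }

    static void consume_all(struct ring *r)
    {
        while (r->cons != r->prod)
            r->cons++;                          /* pretend to process the packet */
    }

    int main(void)
    {
        struct ring r = { .prod = 0, .cons = 0, .notifications = 0 };
        for (int burst = 0; burst < 1000; burst++) {
            for (int i = 0; i < 32; i++)
                produce(&r, i);                 /* 32 packets queued per burst */
            consume_all(&r);                    /* receiver drains the ring */
        }
        printf("32000 packets sent with %lu notifications\n", r.notifications);
        return 0;
    }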