sven.kretzschmar@gmx.de
2003-Oct-13 22:41 UTC
[Xen-devel] Further problems with HyperSCSI and vbds...
After applying the new patches/changesets from the Xen project team (thanks again :) vbds and vds are working with local hard disks (e.g. /dev/hda) as expected. I am also now able to load and use the HyperSCSI module... ...in a quite restricted way :-(

As long as there is no vbd involved, everything works as expected:

*) In domain0 I can fdisk /dev/sda (which is "emulated" by the HyperSCSI kernel module).
*) I can put a filesystem on /dev/sdaX and mount it in domain0.

But as soon as I use a vbd to access it (via attaching a physical /dev/sdaX partition to the vbd, or via attaching a vd, which uses a /dev/sdaX partition, to the vbd), it no longer works, even when using xen_refresh_dev. Fdisk cannot open /dev/xvda in this case (unable to read). mkfs.ext2 starts, but then complains about a "short read" on block 0. It then continues to write the filesystem, but I found that it does not really access the physical disk via HyperSCSI at all. It also seems that it does not even access the /dev/sda "fake" device on the local server, because there is no network traffic from the client to the server where the physical disk is located. Trying to mount /dev/xvda then again results in "short read on block 0", being unable to read the superblock, etc.

I think the problem here is that HyperSCSI attaches /dev/sda without really knowing anything about Xen ;-) Xen also knows nothing about this "faked" physical SCSI device on /dev/sda; only xenolinux does, because of the loaded HyperSCSI kernel module driver. So perhaps the virtual block driver in xenolinux tries to access the faked physical /dev/sda device via Xen, but as Xen does not know about it, this somehow does not really work. (Btw: shouldn't this result in some printk() error messages in the xenolinux virtual block driver?)

The virtual block driver in xenolinux should instead recognize that this is not a physical device registered with Xen and should forward these disk requests and ioctls directly to the /dev/sda(X) device, instead of sending them to Xen. Of course, this should only be allowed for devices (or device drivers) loaded in domain0?? These are of course only assumptions and thoughts out loud ;-) A rough sketch of what I mean follows further below.

I think one would at least have to change some code in xl_block.c and xl_scsi.c to reach that goal. Perhaps one could try to register the SCSI devices provided by the HyperSCSI module as xenolinux virtual SCSI block devices? (The code in xlscsi_init(xen_disk_info_t *xdi) in xl_scsi.c makes me think this could work.) I know that this might violate Xen's design principle of being the only component with direct access to the hardware. However, the /dev/sd* devices from HyperSCSI are not really local hardware; it's only a "faked" physical disk.

I would be interested in some thoughts about this from the Xen project team and list readers, because I consider HyperSCSI an important feature for xenolinux domains. It would allow you to store the whole filesystems of a lot of domains from several physical machines running xen/xenolinux on one big fileserver. As HyperSCSI is a very quick and efficient protocol/implementation, this would be a lot quicker and remarkably more efficient than using NFS for the same task. HyperSCSI can also use not only SCSI devices (disks, tapes, etc.) but also IDE devices like IDE disks and IDE CD writers as real physical devices to be accessed over the LAN ( http://nst.dsi.a-star.edu.sg/mcsa/hyperscsi ).
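Here is the rough sketch I mentioned: a tiny user-space model of the routing decision I have in mind. All names are hypothetical; this is not the actual xl_block.c / xl_scsi.c code, just the shape of the idea.

    /* Toy model of the routing decision: if Xen never registered the
     * target disc, hand the request to the local (domain0) driver
     * instead of sending it to Xen.  Hypothetical names throughout. */
    #include <stdio.h>
    #include <stdbool.h>
    #include <string.h>

    enum target { TARGET_XEN, TARGET_LOCAL_DRIVER };

    /* In the real driver this would consult the xen_disk_info_t list that
     * Xen hands to xlscsi_init(); here it is a stub that only knows about
     * local IDE discs. */
    static bool xen_registered(const char *dev)
    {
        return strncmp(dev, "/dev/hd", 7) == 0;
    }

    static enum target route_request(const char *dev)
    {
        return xen_registered(dev) ? TARGET_XEN : TARGET_LOCAL_DRIVER;
    }

    int main(void)
    {
        const char *devs[] = { "/dev/hda1", "/dev/sda5" };
        for (int i = 0; i < 2; i++)
            printf("%s -> %s\n", devs[i],
                   route_request(devs[i]) == TARGET_XEN
                       ? "Xen (normal vbd path)"
                       : "local driver (e.g. HyperSCSI module)");
        return 0;
    }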
Sorry for the little HyperSCSI hype, I only wanted to explain my interest in HyperSCSI in connection with Xen. I hope there's a not-too-complicated solution for this problem.

Regards,
Sven
Keir Fraser
2003-Oct-14 07:17 UTC
[Xen-devel] Re: Further problems with HyperSCSI and vbds...
> I think the problem here is that HyperSCSI attaches /dev/sda
> without really knowing anything about Xen ;-)
> Xen also knows nothing about this "faked" physical SCSI
> device on /dev/sda; only xenolinux does, because of the loaded
> HyperSCSI kernel module driver.

Yes, you've hit the nail on the head. Although you construct VBDs out of carved-up hd* and sd* partitions, those partitions have to be on devices that Xen knows about. So, when you try to access the VBD, Xen maps the request to a non-existent local SCSI disc :-)

> So perhaps the virtual block driver in xenolinux tries to access the
> faked physical /dev/sda device via Xen, but as Xen does not know about it,
> this somehow does not really work. (Btw: shouldn't this result in some
> printk() error messages in the xenolinux virtual block driver?)

I'll add the debugging back into the xenolinux driver. In any case, a bit more noise from our development tree would be no bad thing!

> The virtual block driver in xenolinux should instead recognize that
> this is not a physical device registered with Xen and should forward
> these disk requests and ioctls directly to the /dev/sda(X) device,
> instead of sending them to Xen.
> Of course, this should only be allowed for devices (or device drivers)
> loaded in domain0??

Why do you want to construct VBDs if only domain 0 is going to access them? However, if that's all you want to do then yes --- modifications to xl_scsi.c will suffice.

> I know that this might violate Xen's design principle of being the
> only component with direct access to the hardware.
> However, the /dev/sd* devices from HyperSCSI are not really local
> hardware; it's only a "faked" physical disk.

DOM0 is allowed unrestricted access to hardware already. Otherwise X wouldn't work :-)

> I would be interested in some thoughts about this from the Xen project
> team and list readers, because I consider HyperSCSI an important
> feature for xenolinux domains.
> It would allow you to store the whole filesystems of a lot of domains
> from several physical machines running xen/xenolinux on one big
> fileserver.
> As HyperSCSI is a very quick and efficient protocol/implementation, this
> would be a lot quicker and remarkably more efficient than using NFS for
> the same task.

There are a few options to allow HyperSCSI access from all domains:

1. NFS-mount HyperSCSI partitions via domain 0 (this will work already).

2. NFS-mount VBDs which map onto chunks of HyperSCSI disk, via domain 0 (this might work if you hack DOM0's xl_scsi.c a bit so that DOM0 VBDs can map onto HyperSCSI block devices).

3. Add proper support for HyperSCSI to Xen. You'd need some scheme for validating transmits which use the HyperSCSI transport, and demuxing received frames to the appropriate domain. I don't know anything about the protocol, so I don't know how easy this would be (e.g. how much state Xen would need to keep lying around).

 -- Keir
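To make the mapping above concrete, a toy sketch with hypothetical structures (not the real Xen code): a VBD is a list of extents on underlying discs, and the lookup fails when the extent's disc was never probed by Xen.

    /* Toy model: a VBD is a list of extents on real discs; mapping a VBD
     * sector fails if the underlying disc is unknown to Xen (as happens
     * with a HyperSCSI-emulated /dev/sda).  Hypothetical structures. */
    #include <stdio.h>
    #include <string.h>
    #include <stdbool.h>

    struct extent {
        const char   *device;    /* underlying disc, e.g. "/dev/sda" */
        unsigned long start;     /* first sector of the extent on that disc */
        unsigned long nsectors;
    };

    /* Discs Xen probed at boot on this example box. */
    static const char *xen_probed[] = { "/dev/hda", "/dev/hdb" };

    static bool xen_knows(const char *dev)
    {
        for (unsigned i = 0; i < sizeof xen_probed / sizeof *xen_probed; i++)
            if (strcmp(dev, xen_probed[i]) == 0)
                return true;
        return false;
    }

    static int map_vbd_sector(const struct extent *ext, int n,
                              unsigned long sec,
                              const char **dev, unsigned long *psec)
    {
        for (int i = 0; i < n; i++) {
            if (sec < ext[i].nsectors) {
                if (!xen_knows(ext[i].device))
                    return -1;        /* the "non-existent local SCSI disc" */
                *dev  = ext[i].device;
                *psec = ext[i].start + sec;
                return 0;
            }
            sec -= ext[i].nsectors;
        }
        return -1;                    /* past the end of the VBD */
    }

    int main(void)
    {
        /* One extent carved out of /dev/sda, as for a HyperSCSI partition. */
        struct extent vbd[] = { { "/dev/sda", 2048, 1UL << 20 } };
        const char *dev; unsigned long psec;
        if (map_vbd_sector(vbd, 1, 0, &dev, &psec) < 0)
            printf("request fails: underlying disc not registered with Xen\n");
        return 0;
    }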
Ian Pratt
2003-Oct-14 08:06 UTC
Re: [Xen-devel] Re: Further problems with HyperSCSI and vbds...
Sven,

Sorry it's taken so long for us to understand what you're trying to do.

> 3. Add proper support for HyperSCSI to Xen. You'd need some scheme
> for validating transmits which use the HyperSCSI transport, and
> demuxing received frames to the appropriate domain. I don't know
> anything about the protocol, so I don't know how easy this would be
> (e.g. how much state Xen would need to keep lying around).

The main thing would be turning the VFR into more of an L2 switch than a router, with each domain having its own MAC [*]. We could then add a rule to grant a domain TX permission for a particular 802 protocol number. HyperSCSI presumably has some high-level server-based authentication and privilege verification? If so, it should be pretty straightforward.

[*] Each domain already has its own MAC for the purposes of DHCP, but for normal TX packets we currently replace it with the Ethernet card's real MAC. This was consistent with the view of the VFR as a router rather than a switch, and also keeps the local sysadmins happy, who would otherwise see potentially thousands of new MAC addresses.

Ian
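A toy sketch of the kind of per-domain TX rule this would amount to, with hypothetical structures and a placeholder EtherType (not the actual VFR code):

    /* Toy model of the rule: a frame may leave a domain only if it carries
     * that domain's own MAC and an EtherType the domain has been granted.
     * ETH_P_HYPERSCSI below is a placeholder value -- an assumption, check
     * the registered number before relying on it. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define ETH_P_IP        0x0800
    #define ETH_P_ARP       0x0806
    #define ETH_P_HYPERSCSI 0x889a   /* assumption */

    struct domain_tx_rule {
        unsigned int domid;
        uint8_t      mac[6];            /* the domain's own MAC */
        uint16_t     extra_ethertype;   /* extra 802 protocol granted, 0 = none */
    };

    static bool tx_permitted(const struct domain_tx_rule *r,
                             const uint8_t *src_mac, uint16_t ethertype)
    {
        if (memcmp(src_mac, r->mac, 6) != 0)
            return false;                       /* must use its own MAC */
        if (ethertype == ETH_P_IP || ethertype == ETH_P_ARP)
            return true;                        /* permitted as today */
        return r->extra_ethertype != 0 && ethertype == r->extra_ethertype;
    }

    int main(void)
    {
        /* Arbitrary example MAC for domain 1. */
        struct domain_tx_rule dom1 =
            { 1, { 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 }, ETH_P_HYPERSCSI };
        printf("HyperSCSI frame:  %s\n",
               tx_permitted(&dom1, dom1.mac, ETH_P_HYPERSCSI) ? "pass" : "drop");
        printf("other 802 frame:  %s\n",
               tx_permitted(&dom1, dom1.mac, 0x88b5) ? "pass" : "drop");
        return 0;
    }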
Sven Kretzschmar
2003-Oct-15 17:13 UTC
[Xen-devel] Solution for problems with HyperSCSI and vbds ?
I have thought a little bit about Ian's and Keir's proposals for how to make HyperSCSI and vbds work (also in domains > 0).

> 1. NFS-mount HyperSCSI partitions via domain 0 (this will work
> already).

Although this is possible as a temporary workaround, it would take away the advantages HyperSCSI offers (speed, very low overhead, especially with this configuration ;-); it would also add another point of failure or instability by using NFS for this. Also, if you want to start more than 15 domains on a server, directly mounting /dev/sd* partitions via NFS (vbds don't work yet with HyperSCSI) would run into the limit on the number of mappable sd*X devices: only sda1 - sda15 work, the next minor device number already belongs to sdb, and my RAID10 array used for testing exports only one 160G /dev/sda device. That means that to get past this limit one would definitely need vbds via vds, or perhaps COW device files at some point in the future (but of course that would also hurt performance). So I don't really favour this possibility.

> 2. NFS-mount VBDs which map onto chunks of HyperSCSI disk, via domain
> 0 (this might work if you hack DOM0's xl_scsi.c a bit so that DOM0
> VBDs can map onto HyperSCSI block devices).

Much better, but there is still NFS involved for no real need, only as a workaround. So I don't favour this either, for the same reasons as in 1.

>> 3. Add proper support for HyperSCSI to Xen. You'd need some scheme
>> for validating transmits which use the HyperSCSI transport, and
>> demuxing received frames to the appropriate domain. I don't know
>> anything about the protocol, so I don't know how easy this would be
>> (e.g. how much state Xen would need to keep lying around).
>
> [Ian:] The main thing would be turning the VFR into more of an L2 switch
> than a router, with each domain having its own MAC[*]. We could then
> add a rule to grant a domain TX permission for a particular 802
> protocol number. HyperSCSI presumably has some high-level
> server-based authentication and privilege verification? If so, it
> should be pretty straightforward.

This is much better, though more complicated too ;-) However, I wouldn't do this based on protocols, or on routing HyperSCSI Ethernet packets, or on the need to use HyperSCSI kernel modules in domains > 0 (perhaps too complicated, and only a special-case solution for this problem).

Here are my first thoughts about a solution. I will try to describe it roughly from the point of view of domain #1 (not DOM0).

Preconditions: /dev/xvda is a vbd which is attached to a vd which is mapped to a HyperSCSI partition (e.g. /dev/sda5). A simpler case would be a direct mapping from a vbd to /dev/sda5, without a vd in between; I will use only this case for the sake of simplicity now.

Some application in domain #1 accesses /dev/xvda. The virtual block device driver maps this to /dev/sda and forwards the request to Xen (perhaps it also tags the request as a request to a "special device" before forwarding it). Xen realizes that there is no physical device connected to /dev/sda (or registered with Xen? maybe it can then also recognize that the request was marked as targeting a "special device"). Because of that condition, it forwards this block device request to DOM0, in which a "request handler" kernel module listens for block device requests that Xen forwards to DOM0 to be handled there (it will need to register a callback function with Xen in order to do so). This callback function is now called by Xen to forward the block device request to the kernel module loaded in DOM0. This "block device request handler" kernel module checks the data of the block device request (e.g. ioctl, read or write), just tries to execute the requested operation on the designated device (/dev/sda in our example) in DOM0, and hands the result and/or data back to Xen. If there is no device driver (such as, in this example, the HyperSCSI kernel module) attached to /dev/sda, the handler module returns the error condition to Xen. Xen in turn hands the result code and data back to the virtual block device driver in domain #1, which forwards it to the application that triggered the block device access request. Done.
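To make the flow above a bit more concrete, here is a toy single-process model of the registration and forwarding I have in mind. All names are made up; none of this is an existing Xen API.

    /* Toy single-process model of the proposal -- hypothetical names,
     * nothing here is an existing Xen interface. */
    #include <stdio.h>

    struct blkdev_request {
        char          device[32];   /* e.g. "/dev/sda5" */
        int           write;        /* 0 = read, 1 = write */
        unsigned long sector;
        unsigned int  nsectors;
    };

    /* What the DOM0 module would register with Xen. */
    typedef int (*blkdev_handler_t)(const struct blkdev_request *req);

    static blkdev_handler_t dom0_handler;      /* stands in for Xen's state */

    static void xen_register_blkdev_handler(blkdev_handler_t h)
    {
        dom0_handler = h;
    }

    /* Called on the path from a domain > 0 when Xen has no driver for the
     * target device: forward to DOM0 if a handler was registered. */
    static int xen_forward_unknown_device(const struct blkdev_request *req)
    {
        if (dom0_handler == NULL)
            return -1;              /* nobody to hand the request to */
        return dom0_handler(req);
    }

    /* The DOM0 kernel module's handler; here it only pretends to do the I/O. */
    static int dom0_blkdev_handler(const struct blkdev_request *req)
    {
        printf("DOM0: %s %u sectors at %lu on %s via the local driver\n",
               req->write ? "write" : "read",
               req->nsectors, req->sector, req->device);
        return 0;                   /* would be the driver's completion status */
    }

    int main(void)
    {
        struct blkdev_request req = { "/dev/sda5", 0, 0, 8 };
        xen_register_blkdev_handler(dom0_blkdev_handler);
        return xen_forward_unknown_device(&req);
    }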
Sorry for the prose above being a little bit unspecific; I currently don't have that much time to make it shine ;-) I have tried to simplify some of the internals of Xen, also because I don't understand them completely yet ;-) If I made errors, please feel free to correct me.

This is somewhat similar to the proposal to load "normal" Linux device drivers exclusively in DOM0 in order to access hardware (also from domains > 0) via the large set of already written Linux device drivers, instead of letting Xen handle all hardware access alone -- but in this case only for block devices.

I would like to do a _little_ restricted case study for that, using the HyperSCSI / vbd problem as an example. However, I would need some "small" help and hints from the Xen team: What would be the cleanest way to do the communication between a kernel module loaded in DOM0 and Xen? Does the Xen API offer functions for registering callbacks (to kernel modules)? If yes, what are their names and how are they used? Where are these functions defined (in which source files)? Can these callbacks somehow be made asynchronous (that is, Xen should be able to call the kernel module at some time to initiate the block device request in DOM0, and then be called back later asynchronously with the results by DOM0, because device accesses can take some time)? Are there any special caveats I have to watch out for?

I think all this might also be interesting for accessing other block devices for which no Xen drivers exist. A very similar thing could perhaps also be done for character devices?

Thanks in advance for any help you can offer to get this working :) I would of course also like to hear some opinions or concerns from other members of this list or the Xen team about my proposed solution above ;-)

Regards,
Sven
Ian Pratt
2003-Oct-15 17:48 UTC
Re: [Xen-devel] Solution for problems with HyperSCSI and vbds ?
> > [Ian:] The main thing would be turning the VFR into more of an L2 switch
> > than a router, with each domain having its own MAC[*]. We could then
> > add a rule to grant a domain TX permission for a particular 802
> > protocol number. HyperSCSI presumably has some high-level
> > server-based authentication and privilege verification? If so, it
> > should be pretty straightforward.
>
> This is much better, though more complicated too ;-)
>
> However, I wouldn't do this based on protocols, or on routing HyperSCSI
> Ethernet packets, or on the need to use HyperSCSI kernel modules in
> domains > 0 (perhaps too complicated, and only a special-case solution
> for this problem).

I still like my proposal ;-) It's pretty straightforward to implement, is relatively clean, and will have good performance. However, if you're exporting a single disk from the HyperSCSI server it's not much help.

> The virtual block device driver maps this to /dev/sda and forwards
> the request to Xen (perhaps it also tags the request as a request
> to a "special device" before forwarding it).
> Xen realizes that there is no physical device connected to /dev/sda
> (or registered with Xen? maybe it can then also recognize that
> the request was marked as targeting a "special device").
> Because of that condition, it forwards this block device request
> to DOM0, in which a "request handler" kernel module listens for
> block device requests that Xen forwards to DOM0 to be handled there
> (it will need to register a callback function with Xen in order to
> do so).

I think your best solution is not to use Xen VBDs at all. If you don't like NFS, how about having domains > 0 use "enhanced network block devices" which talk to a simple server running in domain0? The storage for the nbd server can be files, partitions or logical volumes on /dev/sda.

This should require writing no code, and will give pretty good performance. It gives good control over storage allocations etc.

http://www.it.uc3m.es/~ptb/nbd/

[It appears to work as a rootfs, but I haven't verified.]

Best,
Ian
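For illustration only, a toy sketch of the DOM0-side server role in this setup: a process serving block read/write requests against a backing file, partition or logical volume. This is not the enbd code or protocol, just the shape of the idea.

    /* Toy stand-in for an nbd-style server in DOM0: serve block requests
     * against whatever backing store it was given.  In the real thing the
     * requests arrive from domains > 0 over the (virtual) network. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>

    struct blk_req {
        int    write;          /* 0 = read, 1 = write */
        off_t  offset;         /* byte offset into the backing store */
        size_t len;
    };

    static ssize_t serve(int backing_fd, const struct blk_req *r, char *buf)
    {
        return r->write ? pwrite(backing_fd, buf, r->len, r->offset)
                        : pread(backing_fd, buf, r->len, r->offset);
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file | partition | logical volume>\n",
                    argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        struct blk_req r = { 0, 0, sizeof buf };   /* read the first 4K */
        ssize_t n = serve(fd, &r, buf);
        printf("served %zd bytes from %s\n", n, argv[1]);
        close(fd);
        return 0;
    }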
Sven Kretzschmar
2003-Oct-15 21:19 UTC
Re: [Xen-devel] Solution for problems with HyperSCSI and vbds ?
On 15.10.2003 at 18:48 Ian Pratt wrote:

>> > [Ian:] The main thing would be turning the VFR into more of an L2 switch
>> > than a router, with each domain having its own MAC[*]. We could then
>> > add a rule to grant a domain TX permission for a particular 802
>> > protocol number. HyperSCSI presumably has some high-level
>> > server-based authentication and privilege verification?

Yes, it has (even encryption, if needed).

>> > If so, it should be pretty straightforward.
>>
>> This is much better, though more complicated too ;-)
>>
>> However, I wouldn't do this based on protocols, or on routing HyperSCSI
>> Ethernet packets, or on the need to use HyperSCSI kernel modules in
>> domains > 0 (perhaps too complicated, and only a special-case solution
>> for this problem).
>
> I still like my proposal ;-)

:)) Sorry for being so rude about it ;-) Besides my other points, I just want to avoid the necessity of loading a kernel module in domains > 0 in order to use the /dev/sda device. It should just be usable like a standard hardware device supported by the kernel -- KISS principle, at least from the point of view of domains > 0 or of clients using domains > 0. (Yes, sometimes I am a very restrictive purist ;-)

> It's pretty straightforward to implement, is relatively clean,
> and will have good performance.

I would like to build a "production strength" environment with remote disk access performance as high as reasonably possible. But if I accept the thought of loading a kernel module in domains > 0 in order to get HyperSCSI-attached devices working somehow, then your proposal (VFR switching of Ethernet packets to and from domains > 0) is probably going to perform better than additionally using enbd devices. However, I somehow don't like the thought of 100+ domains from e.g. 3 different physical servers connecting directly to the physical HyperSCSI server. Just 3 DOM0 HyperSCSI clients connecting directly to the HyperSCSI server feels more comfortable (e.g. much easier administration, fewer points of failure). The 3 DOM0s in this example can then export the HyperSCSI device(s) by whatever means to the domains > 0.

> However, if you're exporting a single disk from the HyperSCSI
> server it's not much help.
>
>> [...]
>
> I think your best solution is not to use Xen VBDs at all. If
> you don't like NFS, how about having domains > 0 use "enhanced
> network block devices" which talk to a simple server running in
> domain0? The storage for the nbd server can be files, partitions
> or logical volumes on /dev/sda.
>
> This should require writing no code, and will give pretty good
> performance. It gives good control over storage allocations etc.
>
> http://www.it.uc3m.es/~ptb/nbd/

Thanks a lot for pointing me to this solution! I will look into it during the next days (especially the performance ;-).
Apropos: did you ever benchmark the average or maximum throughput of your VFR implementation in Xen? This would be interesting for routing enbd IP packets from DOM0 to the other domains on the same machine (in terms of the average/maximum performance one could reach). Also, did you ever benchmark how much performance is lost by using vbds/vds for disk access compared with using the block device directly (tested in DOM0)? Could mounting /dev/sda via enbd perform better, or at least nearly as well, as using vds and vbds, given the additional overhead of vd/vbd use?

> [It appears to work as a rootfs, but I haven't verified.]

I'll try it... (an initrd is required, I think :-( ) ;-)

Best regards,
Sven
Ian Pratt
2003-Oct-15 22:07 UTC
Re: [Xen-devel] Solution for problems with HyperSCSI and vbds ?
> Just 3 DOM0 HyperSCSI clients connecting directly to the HyperSCSI
> server feels more comfortable (e.g. much easier administration,
> fewer points of failure). The 3 DOM0s in this example can then
> export the HyperSCSI device(s) by whatever means to the domains > 0.

Of course, the absolutely proper solution is to put HyperSCSI into Xen, so that Xen's block device interface could be used by guest OSes to talk directly to the remote disk.

However, I wouldn't want to contemplate putting a big gob of code like HyperSCSI into Xen until we have implemented the plan for ring-1 loadable module support. This would then give us a shared-memory block device interface between guest OSes and the HyperSCSI driver (also running in ring 1). The HyperSCSI driver would then talk to the network interface, again using shared memory.

> Thanks a lot for pointing me to this solution!
> I will look into it during the next days (especially the performance ;-).

I'm looking forward to hearing how you get on.

> Apropos: did you ever benchmark the average or maximum
> throughput of your VFR implementation in Xen?

The throughput between domains and the real network interface is _very_ good: easily able to saturate a 1Gb/s NIC, and probably good for rather more. However, I'm afraid to say that we recently discovered that our inter-domain performance is pretty abysmal -- worse than our performance over the real network, which is simultaneously amusing and sad.

The problem is that we currently don't get the asynchronous 'pipelining' when doing inter-domain networking that gives good performance when going to an external interface: since the communication is currently synchronous, we don't get back-pressure to allow a queue to build up, as would happen with a real NIC. The net result is that we end up bouncing in and out of Xen several times for each packet. I volunteered to fix this, but I'm afraid I haven't had time as yet. I'm confident we should end up with really good inter-domain networking performance, using pipelining and page flipping.

> Also, did you ever benchmark how much performance is lost by using
> vbds/vds for disk access compared with using the block device
> directly (tested in DOM0)?

Performance of vbds and raw partitions should be identical. Disks are slow -- you have to really work at it to cock the performance up ;-)

> Could mounting /dev/sda via enbd perform better, or at least nearly
> as well, as using vds and vbds, given the additional overhead of
> vd/vbd use?

Performance using enbd should be pretty good once we've sorted out inter-domain networking.

Ian
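For illustration, a toy model of the "let a queue build up" point: if the sender only kicks the receiver when the ring goes from empty to non-empty, a burst of packets costs one notification instead of one or more per packet. Hypothetical structure, not Xen's actual ring code.

    /* Toy producer/consumer ring: the producer only notifies the consumer
     * on the empty -> non-empty transition, so notifications scale with
     * bursts rather than with packets. */
    #include <stdio.h>

    #define RING_SIZE 64

    struct ring {
        int           buf[RING_SIZE];
        unsigned int  prod, cons;
        unsigned long notifications;   /* stand-in for domain<->Xen transitions */
    };

    static void kick(struct ring *r) { r->notifications++; }

    static int produce(struct ring *r, int pkt)
    {
        if (r->prod - r->cons == RING_SIZE)
            return -1;                          /* ring full: back pressure */
        int was_empty = (r->prod == r->cons);
        r->buf[r->prod++ % RING_SIZE] = pkt;
        if (was_empty)
            kick(r);                            /* notify only on empty -> non-empty */
        return 0;
    }

    static void consume_all(struct ring *r)
    {
        while (r->cons != r->prod)
            r->cons++;                          /* pretend to process the packet */
    }

    int main(void)
    {
        struct ring r = { .prod = 0, .cons = 0, .notifications = 0 };
        for (int burst = 0; burst < 1000; burst++) {
            for (int i = 0; i < 32; i++)
                produce(&r, i);                 /* 32 packets queued per burst */
            consume_all(&r);                    /* receiver drains the ring */
        }
        printf("32000 packets sent with %lu notifications\n", r.notifications);
        return 0;
    }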