Hi zfs gurus,

I am wondering whether the reliability of Solaris/ZFS is still guaranteed if I run ZFS not directly on real hardware, but under Xen virtualization. The plan is to give the Xen guest raw physical access to the disks. I remember ZFS having problems with hardware that lies about disk write ordering; I wonder how that is handled under Xen, or whether that issue has been completely resolved.

Thanks and Best Regards
Is anyone even using ZFS under Xen in production in some form? If so, what's your impression of its reliability?

Regards

On Sun, May 17, 2009 at 2:16 PM, Ahmed Kamal <email.ahmedkamal at googlemail.com> wrote:
> Hi zfs gurus,
>
> I am wondering whether the reliability of solaris/zfs is still guaranteed
> if I will be running zfs not directly over real hardware, but over Xen
> virtualization ? The plan is to assign physical raw access to the disks to
> the xen guest. I remember zfs having problems with hardware that lies about
> disk write ordering, wonder how that is handled over Xen, or if that issue
> has been completely resolved
>
> Thanks and Best Regards
On Wed, May 20, 2009 at 4:06 AM, Ahmed Kamal <email.ahmedkamal at googlemail.com> wrote:
> Is anyone even using ZFS under Xen in production in some form? If so,
> what's your impression of reliability?

I'm using zfs-fuse on a Linux domU on top of LVM on a Linux dom0 with Xen 3.1. Not exactly a recommended configuration, but it works; no problems so far. The server has a battery-backed hardware RAID controller, so that might help on the reliability front.

-- 
Fajar
On Wed, May 20, 2009 at 12:06:49AM +0300, Ahmed Kamal wrote:
> Is anyone even using ZFS under Xen in production in some form? If so,
> what's your impression of reliability?

Hmm, somebody needs to out themselves. Short answer: yes.

Details: In October 2008 I installed an Intel server (2x quad-core E5440, 2.8 GHz) with 4x 300 GB SATA (2x 2-way mirrors) at a small company (one of the eBay-rated top-10 sellers in Germany), running snv_b98 as dom0 ;-)

domU1 is a Win2003 32-bit Small Business Server running an MS SQL server; domU2 is a Win2008 32-bit Standard Server used as a terminal server (<= 10 users), basically to run several Sage[.de] products. Both domUs use the latest PV drivers, which - depending on the record size being written - reach about the same transfer rates as in dom0 (see http://iws.cs.uni-magdeburg.de/~elkner/xVM/ for more info: the "benchmarks" were made on an X4600M, but the Intel server produces about the same numbers).

pool1/win2003sbs.dsk  volsize       48G
pool1/win2003sbs.dsk  volblocksize   8K
pool1/win2008ss.dsk   volsize       24G
pool1/win2008ss.dsk   volblocksize   8K

It has been in production since ~February 2009 and is as stable as expected. Sometimes, when the Win2003 domU is idle for too long, it doesn't wake up anymore. No problem: I wrote a simple CGI script so that the users have a simple UI to check the state of the domUs (basically a ping), to pull the cable (virsh destroy), and to start/suspend/resume them. They usually use this bookmark ~2-3 times per week when they start working. Users are quite happy with it, since clicking a link is not much more work than switching on their own PC - so not annoying/painful at all.

Initially I gave Win2003 2x 16 GB partitions (C:, and D: for SQL data), and Win2008 a single 16 GB partition. When the Sage products were installed, it turned out that the 16 GB was almost filled, so the volsize of pool1/win2008ss.dsk was increased to 24 GB and the C: partition dynamically extended in Win2008 - no problem.
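The CGI wrapper described above might look roughly like the sketch below. This is a hypothetical reconstruction (the original script is not shown); the function names, query-string parsing, and plain-text output are all assumptions. It falls back to reporting "unknown" when virsh is unavailable or the domain does not exist.

```shell
#!/bin/sh
# Hypothetical sketch of the domU status/control CGI (original not shown).
# Reports a domU's state via virsh and lets users start/destroy/suspend/resume it.

domu_state() {
    # virsh prints the state ("running", "shut off", ...); on any
    # failure (no virsh, unknown domain) fall back to "unknown".
    virsh domstate "$1" 2>/dev/null || echo unknown
}

domu_action() {   # usage: domu_action <domU> <start|destroy|suspend|resume>
    virsh "$2" "$1" 2>/dev/null
}

# CGI entry point: QUERY_STRING like "domu=win2003sbs&action=status"
# (naive query-string parsing; fine for a sketch).
printf 'Content-Type: text/plain\r\n\r\n'
qs=${QUERY_STRING:-}
domu=${qs#*domu=};     domu=${domu%%&*}
action=${qs#*action=}; action=${action%%&*}

case "$action" in
    status|'') echo "state of $domu: $(domu_state "$domu")" ;;
    start|destroy|suspend|resume) domu_action "$domu" "$action" ;;
    *) echo "unknown action: $action" ;;
esac
```

A user's bookmark would then just be something like `http://server/cgi-bin/domu.cgi?domu=win2003sbs&action=start`.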
Last month the SQL server filled up its D: partition (16 GB), so it started to 'reboot' several times a day. However, raising the pool1/win2003sbs.dsk volsize to 48 GB (i.e. D: to 32 GB) solved that problem. The only little hurdle here was that one could not extend the partition via the Windows partition manager's context menu; one had to use the command-line tool ...

One slightly annoying thing is the zfs send command: it takes pretty long, but since it usually runs at night, that is not a real problem.

BTW: The server was installed remotely by "misusing" a company-internal Linux server (jumpstart, of course). Since the serial port was not connected, I had to ask an on-site user to initiate the PXE boot, but that was not a problem either.

So the summary: all people (incl. admins) are happy. However, if you need to decide whether to use Xen, test your setup before going into production, and ask your boss whether he can live with innovative ... solutions ;-)

Regards,
jel.
-- 
Otto-von-Guericke University        http://www.cs.uni-magdeburg.de/
Department of Computer Science      Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany            Tel: +49 391 67 12768
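For reference, the resize described above boils down to growing the zvol on the host and then extending the partition inside the guest. This is a sketch: the helper function is hypothetical (it echoes the command as a dry run when the zfs utility is absent), and the Windows-side steps are the generic diskpart workflow, not a transcript from the actual server.

```shell
# Sketch: grow the zvol backing the Win2003 domU's disk
# (dataset name taken from the message above).
grow_zvol() {  # usage: grow_zvol <dataset> <new-size>
    if command -v zfs >/dev/null 2>&1; then
        zfs set volsize="$2" "$1"       # grow the virtual disk
    else
        echo "would run: zfs set volsize=$2 $1"  # dry run without zfs
    fi
}

grow_zvol pool1/win2003sbs.dsk 48G

# Inside the Windows guest the GUI partition manager could not extend
# the data partition; the diskpart command-line tool can:
#
#   diskpart
#   DISKPART> list volume
#   DISKPART> select volume <n>    (pick the D: volume)
#   DISKPART> extend
```

The guest sees the larger virtual disk immediately; only the partition extension requires action inside Windows.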
> However, if you need to decide whether to use Xen, test your setup
> before going into production, and ask your boss whether he can live with
> innovative ... solutions ;-)

Thanks a lot for the informative reply; it has definitely been helpful. I am, however, interested in the reliability of running the ZFS stack as a Xen domU (and not dom0). For instance, I am worried that the emulated disk controller would not obey flushes or write ordering, thus stabbing ZFS in the back.

Regards
On Fri, May 22, 2009 at 2:44 PM, Ahmed Kamal <email.ahmedkamal at googlemail.com> wrote:
>> However, if you need to decide whether to use Xen, test your setup
>> before going into production, and ask your boss whether he can live with
>> innovative ... solutions ;-)
>
> Thanks a lot for the informative reply. It has definitely been helpful.
> I am however interested in the reliability of running the ZFS stack as a Xen
> domU (and not dom0). For instance, I am worried that the emulated disk
> controller would not obey flushes, or write ordering, thus stabbing zfs in
> the back.

I've gotten very good I/O performance numbers out of a 2008.11 PV domU with a ZFS zvol as the storage device/install disk.

Blake
Blake wrote:
> On Fri, May 22, 2009 at 2:44 PM, Ahmed Kamal
> <email.ahmedkamal at googlemail.com> wrote:
>
>     I am however interested in the reliability of running the ZFS
>     stack as a Xen domU (and not dom0). For instance, I am worried that
>     the emulated disk controller would not obey flushes, or write
>     ordering, thus stabbing zfs in the back.
>
> I've gotten very good performance numbers for I/O out of a 2008.11 PV
> domU with a zfs zvol as the storage device/install disk.

We run several 2008.11 PV domUs with ZFS root installed on the virtual disk. The virtual disk, however, is an iSCSI LUN attached to the dom0 from a Unified Storage cluster. We get very good performance and have not seen anything to indicate the controller is misbehaving.

We also have a few domUs where we expose the entire raw disk to the domU and then run ZFS on it. This has also been working well.

--joe
On Sun, May 17, 2009 at 02:16:01PM +0300, Ahmed Kamal wrote:
> I am wondering whether the reliability of solaris/zfs is still guaranteed if
> I will be running zfs not directly over real hardware, but over Xen
> virtualization ? The plan is to assign physical raw access to the disks to
> the xen guest. I remember zfs having problems with hardware that lies about
> disk write ordering, wonder how that is handled over Xen, or if that issue
> has been completely resolved

You can read the frontend sources here:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/xen/io/xdf.c

If both dom0 and domU are Solaris, then any disk cache flushes are passed through via FLUSH_DISKCACHE. If the dom0 is Linux, then we attempt to emulate a flush by using WRITE_BARRIER (annoyingly, this requires us to write a block as well, so in this case we cache one).

In the backend (xdb or xpvtap), we pass along the flush request, either via an in-kernel flush:

683         (void) ldi_ioctl(vdp->xs_ldi_hdl,
684             DKIOCFLUSHWRITECACHE, NULL, FKIOCTL, kcred, NULL);

or via VDFlush() in the VirtualBox code we use (which essentially ends up as an fsync()). Thus, as long as the ioctl and/or fsync are obeyed, things should be good.

Hope that's clearer.

regards
john
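Since the backend flush described above essentially bottoms out in an fsync() on the backing store, a rough userland analogy (a sketch, not the xdb code itself) is GNU dd's conv=fsync, which calls fsync() on the output file after the write completes:

```shell
# Userland analogy for the backend flush path (not the xdb code itself):
# conv=fsync makes GNU dd call fsync() on the output file after writing,
# which is essentially what the backend's flush request reduces to.
tmpfile=$(mktemp)
dd if=/dev/zero of="$tmpfile" bs=8k count=1 conv=fsync 2>/dev/null
wc -c "$tmpfile"   # 8192 bytes, now forced through to stable storage
rm -f "$tmpfile"
```

The point of the thread is exactly this: as long as each layer (domU frontend, dom0 backend, backing store) passes that flush down honestly, ZFS's ordering assumptions hold.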