Eugen Leitl
2011-Aug-06 20:35 UTC
[zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver
----- Forwarded message from "John A. Sullivan III" <jsullivan at opensourcedevel.com> ----- From: "John A. Sullivan III" <jsullivan at opensourcedevel.com> Date: Sat, 06 Aug 2011 16:30:04 -0400 To: vserver at list.linux-vserver.org Subject: Re: [vserver] hybrid zfs pools as iSCSI targets for vserver Reply-To: vserver at list.linux-vserver.org X-Mailer: Evolution 2.30.3 On Sat, 2011-08-06 at 21:40 +0200, Eugen Leitl wrote:> I''ve recently figured out how to make low-end hardware (e.g. HP N36L) > work well as zfs hybrid pools. The system (Nexenta Core + napp-it) > exports the zfs pools as CIFS, NFS or iSCSI (Comstar). > > 1) is this a good idea? > > 2) any of you are running vserver guests on iSCSI targets? Happy with it? >Yes, we have been using iSCSI to hold vserver guests for a couple of years now and are generally unhappy with it. Besides our general distress at Nexenta, there is the constraint of the Linux file system. Someone please correct me if I''m wrong because this is a big problem for us. As far as I know, Linux file system block size cannot exceed the maximum memory page size and is limited to no more than 4KB. iSCSI appears to acknowledge every individual block that is sent. That means the most data one can stream without an ACK is 4KB. That means the throughput is limited by the latency of the network rather than the bandwidth. Nexenta is built on OpenSolaris and has a significantly higher internal network latency than Linux. It is not unusual for us to see round trip times from host to Nexenta well upwards of 100us (micro-seconds). Let''s say it was even as good as 100us. One could send up to 10,000 packets per second * 4KB = 40MBps maximum throughput for any one iSCSI conversation. That''s pretty lousy disk throughput. Other than that, iSCSI is fabulous because it appears as a local block device. We typically mount a large data volume into the VServer host and the mount rbind it into the guest file systems. A magically well working file server without a file server or the hassles of a network file system. Our single complaint other than about Nexenta themselves is the latency constrained throughput. Any one have a way around that? Thanks - John ----- End forwarded message ----- -- Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE
Eugen Leitl
2011-Aug-06 20:44 UTC
[zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver
----- Forwarded message from Gordan Bobic <gordan at bobich.net> -----

From: Gordan Bobic <gordan at bobich.net>
Date: Sat, 06 Aug 2011 21:37:30 +0100
To: vserver at list.linux-vserver.org
Subject: Re: [vserver] hybrid zfs pools as iSCSI targets for vserver

On 08/06/2011 09:30 PM, John A. Sullivan III wrote:
> Someone please correct me if I'm wrong because this is a big problem for
> us. As far as I know, Linux file system block size cannot exceed the
> maximum memory page size and is limited to no more than 4KB.

I'm pretty sure it is _only_ limited by memory page size, since I'm pretty
sure I remember that 8KB blocks were available on SPARC.

> iSCSI appears to acknowledge every individual block that is sent. That
> means the most data one can stream without an ACK is 4KB. That means the
> throughput is limited by the latency of the network rather than the
> bandwidth.

Hmm, buffering in the FS shouldn't be dependent on the block layer
immediately acknowledging unless you are issuing fsync()/barriers. What FS
are you using on top of the iSCSI block device, and is your application
fsync() heavy?

Gordan

----- End forwarded message -----
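Gordan's point is that, absent explicit flushes, writes land in the page
cache and return to the application immediately; only fsync() (or write
barriers) forces a wait on the underlying block device. A rough way to see
the difference on an iSCSI-backed filesystem is sketched below (the path
is a placeholder, and the timings are only meaningful relative to each
other):

    import os, time

    PATH = "/mnt/iscsi-volume/testfile"   # placeholder path on the iSCSI-backed FS
    BLOCK = b"\0" * 4096

    def timed_writes(sync_each: bool, count: int = 1000) -> float:
        """Time `count` 4 KB writes, optionally fsync()ing after each one."""
        fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        start = time.monotonic()
        for _ in range(count):
            os.write(fd, BLOCK)
            if sync_each:
                os.fsync(fd)   # wait for the block layer / iSCSI target to acknowledge
        os.close(fd)
        return time.monotonic() - start

    print("buffered:  ", timed_writes(False))
    print("fsync each:", timed_writes(True))

On a latency-bound iSCSI path the fsync-per-write case will be dramatically
slower, which is why the answer to "is your application fsync() heavy?"
matters so much here.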
Roy Sigurd Karlsbakk
2011-Aug-07 13:36 UTC
[zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver
> > 1) is this a good idea? > > > > 2) any of you are running vserver guests on iSCSI targets? Happy > > with it? > > > Yes, we have been using iSCSI to hold vserver guests for a couple of > years now and are generally unhappy with it. Besides our general > distress at Nexenta, there is the constraint of the Linux file system. > > Someone please correct me if I''m wrong because this is a big problem > for > us. As far as I know, Linux file system block size cannot exceed the > maximum memory page size and is limited to no more than 4KB. iSCSI > appears to acknowledge every individual block that is sent. That means > the most data one can stream without an ACK is 4KB. That means the > throughput is limited by the latency of the network rather than the > bandwidth.Even if Linux filesystems generally stick to a block size of 4kB, that doesn''t mean all transfers are maximum 4kB. If that would have been the case, Linux would be quite useless for a server. I/O operations are queued and if, for instance, a read() call requests 8MB, that''s done in a single operation.> Nexenta is built on OpenSolaris and has a significantly higher > internal > network latency than Linux. It is not unusual for us to see round trip > times from host to Nexenta well upwards of 100us (micro-seconds). > Let''s > say it was even as good as 100us. One could send up to 10,000 packets > per second * 4KB = 40MBps maximum throughput for any one iSCSI > conversation. That''s pretty lousy disk throughput.That''s why, back in 1992, the sliding window protocol was created (http://tools.ietf.org/html/rfc1323), so that a peer won''t wait for a TCP ACK before resuming operation. Vennlige hilsener / Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 roy at karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et element?rt imperativ for alle pedagoger ? unng? eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer p? norsk.
Carson Gaspar
2011-Aug-07 19:28 UTC
[zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver
On 8/7/11 6:36 AM, Roy Sigurd Karlsbakk wrote:
> That's why, back in 1992, the sliding window protocol was created
> (http://tools.ietf.org/html/rfc1323), so that a peer won't wait for a TCP
> ACK before resuming operation.

It was part of TCP _long_ before that (it was never as stupid as XMODEM
;-) ). That RFC specifies window scaling to support window sizes larger
than 2^16 bytes, useful for large bandwidth*delay product networks.

--
Carson
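Carson's distinction matters for the numbers in this thread: the classic
16-bit TCP window already allows up to 64 KB in flight, and RFC 1323
window scaling only extends that for links with a larger bandwidth*delay
product. A minimal sketch of both directions of the calculation, using the
illustrative figures from earlier in the thread:

    def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
        """Bandwidth*delay product: bytes that must be in flight to fill the pipe."""
        return bandwidth_bps / 8 * rtt_seconds

    def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
        """Max bytes/s when at most window_bytes may be unacknowledged."""
        return window_bytes / rtt_seconds

    # Gigabit Ethernet with the 100 us RTT mentioned earlier in the thread.
    print(bdp_bytes(1e9, 100e-6))                                # 12500.0 bytes fills the pipe
    # Even an unscaled 64 KB window is ample at that RTT:
    print(window_limited_throughput(64 * 1024, 100e-6) / 1e6)    # ~655 MB/s

So at 100 us the TCP window is not the bottleneck; the per-command round
trips at the iSCSI/filesystem layer are, which is what the next message
addresses.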
Carson Gaspar
2011-Aug-07 19:33 UTC
[zfs-discuss] [vserver] hybrid zfs pools as iSCSI targets for vserver
>> maximum memory page size and is limited to no more than 4KB. iSCSI >> appears to acknowledge every individual block that is sent. That means >> the most data one can stream without an ACK is 4KB. That means the >> throughput is limited by the latency of the network rather than the >> bandwidth.I am _far_ from an iSCSI expert, but the above should not be true, as it isn''t true for other SCSI flavours. If your initiator supports command queuing, it should happily write multiple blocks before stalling on a response. You can also enable write cache support, but I don''t recall if it''s necessary to do so on the initiator, the target, or both. -- Carson