Gilberto Mautner
2008-Jan-08 03:27 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
Hello list,

I'm thinking about this topology:

NFS Client <----NFS----> ZFS Host <---iSCSI---> ZFS Node 1, 2, 3, etc.

The idea here is to create a scalable NFS server by plugging in more nodes as more space is needed, striping data across them.

A question: we know from the docs that ZFS optimizes random-write speed by consolidating what would be many random writes into a single sequential operation. I imagine that for ZFS to be able to do that, it has to have some knowledge of the hard disk geometry. Now, if that geometry is being abstracted away by iSCSI, is that optimization still valid?

Thanks,
Gilberto
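[For reference, the storage layer of this topology can be sketched with the Solaris commands of the day. This is an illustrative sketch only, not a tested recipe; the pool/volume names, addresses, sizes, and device names are all made up.]

```shell
# On each storage node: carve out a zvol and export it as an iSCSI target.
# ("tank", "vol0", and 100g are hypothetical.)
zfs create -V 100g tank/vol0
zfs set shareiscsi=on tank/vol0      # legacy ZFS iSCSI target support

# On the ZFS host: discover the nodes' targets...
iscsiadm modify discovery --sendtargets enable
iscsiadm add discovery-address 192.0.2.11
iscsiadm add discovery-address 192.0.2.12
iscsiadm add discovery-address 192.0.2.13

# ...then stripe a pool across the resulting LUNs (device names will differ).
zpool create bigtank c2t1d0 c3t1d0 c4t1d0
```

Note that a plain stripe across nodes has no redundancy: losing any one node loses the whole pool.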
Richard Elling
2008-Jan-08 05:47 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
Gilberto Mautner wrote:
> Hello list,
>
> I'm thinking about this topology:
>
> NFS Client <----NFS----> ZFS Host <---iSCSI---> ZFS Node 1, 2, 3, etc.
>
> The idea here is to create a scalable NFS server by plugging in more
> nodes as more space is needed, striping data across them.

I see people doing this, but, IMHO, it seems like a waste of resources and will generally be slower than having the disks on the NFS server.

> A question: we know from the docs that ZFS optimizes random-write
> speed by consolidating what would be many random writes into a single
> sequential operation.
>
> I imagine that for ZFS to be able to do that, it has to have some
> knowledge of the hard disk geometry. Now, if that geometry is being
> abstracted away by iSCSI, is that optimization still valid?

ZFS doesn't do any optimization for hard disk geometry. Allocations are made starting at the beginning and proceeding according to the slab size. For diversity, redundant copies of metadata are spread further away, so there may be some additional "jumps," but these aren't really based on disk geometry. In other words, I believe the optimization is probably still valid.
 -- richard
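[The slab-based allocation described here can be observed directly: `zdb` will dump per-vdev metaslab (slab) usage, and the same layout applies whether the vdev is a local disk or an iSCSI LUN. A quick sketch, assuming a pool named `tank`:]

```shell
# Dump the metaslab (slab) layout for each vdev in the pool;
# allocations fill from the front, independent of device geometry.
zdb -m tank

# Repeat the flag for more detail (also prints the space maps).
zdb -mm tank
```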
Andre Wenas
2008-Jan-08 08:36 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
Although it looks possible, it is a very complex architecture. If you can wait, please explore pNFS: http://opensolaris.org/os/project/nfsv41/

What is pNFS?

* The pNFS protocol allows us to separate an NFS file system's data and metadata paths. With a separate data path, we are free to lay file data out in interesting ways, like striping it across multiple different file servers. For more information, see the NFSv4.1 specification.

Gilberto Mautner wrote:
> Hello list,
>
> I'm thinking about this topology:
>
> NFS Client <----NFS----> ZFS Host <---iSCSI---> ZFS Node 1, 2, 3, etc.
>
> The idea here is to create a scalable NFS server by plugging in more
> nodes as more space is needed, striping data across them.
>
> A question: we know from the docs that ZFS optimizes random-write
> speed by consolidating what would be many random writes into a single
> sequential operation.
>
> I imagine that for ZFS to be able to do that, it has to have some
> knowledge of the hard disk geometry. Now, if that geometry is being
> abstracted away by iSCSI, is that optimization still valid?
>
> Thanks,
> Gilberto
Ross
2008-Jan-09 08:39 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
That's pretty much exactly what I'm looking to do, except I've been calling it Tiered or Clustered ZFS, and I'm looking to serve out CIFS / iSCSI instead of NFS. Sun set up a demo in their labs before Christmas to have a look at this for me and see how it performs. Latency was their biggest concern; other than that, they seemed reasonably confident it would work OK. I'm hoping to hear back from them in the next few weeks and will be happy to pass on any results I get. This message posted from opensolaris.org
Ross
2008-Jan-09 08:50 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
PS. This is how I drew up the concept; I'm hoping we'll be able to cluster the ZFS Hosts by this time next year:
http://www.opensolaris.org/jive/servlet/JiveServlet/download/94-44970-177042-4435/Clustered%20ZFS.pdf
Richard Elling
2008-Jan-09 16:00 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
Ross wrote:
> PS. This is how I drew up the concept; I'm hoping we'll be able to
> cluster the ZFS Hosts by this time next year:
> http://www.opensolaris.org/jive/servlet/JiveServlet/download/94-44970-177042-4435/Clustered%20ZFS.pdf

A few notes:

Slide 1: striping (aka RAID-0) is not reliable.

Slide 3: I'm not sure why you say this would be available in "late 2008/early 2009" since it is in fact available today with Solaris Cluster, in some form.

But ultimately, your architecture could be described as this:

    disk -- server -<iSCSI>- server -<iSCSI>- client
or
    disk -- server -<iSCSI>- server -<NFS>- client
or
    disk -- server -<iSCSI>- server -<CIFS>- client

This really doesn't make sense, as it is far more complicated than a simpler, and more reliable, architecture which is widely adopted:

    disk -- server -<NFS/iSCSI/CIFS>- client

As we've been watching people attempt to implement these sorts of things, they enter into a new set of failure domains and often do not consider the implications. If you *really* want Enterprise levels of availability, you simplify rather than complicate the architecture, and you reduce the number of failure domains whenever you can.
 -- richard
Ross
2008-Jan-10 09:12 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
For slide 3, HA-ZFS is available now with HA-Storage+ if you're happy with Active/Passive. HA-iSCSI code was released just before Christmas, I believe, but is currently untested, and HA-CIFS is just a thought on the roadmap. The reason for the 2008/2009 timeline is that that's when I've been told it's likely we'll see HA-CIFS.

And yes, it's more complicated than Disk -- Server -- Client, but you could use that same argument for VMware. That goes Server -- OS -- OS -- Client instead of the traditional Server -- OS -- Client, but I think everyone would agree that there are significant advantages to that abstraction, and I see the same here.
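[For context, the HA-Storage+ (HAStoragePlus) piece is configured along these lines under Solaris Cluster 3.2. An illustrative sketch only, with made-up resource-group, resource, and pool names; a real failover NFS/CIFS service needs additional resources on top of this:]

```shell
# Register the HAStoragePlus resource type (once per cluster).
clresourcetype register SUNW.HAStoragePlus

# Create a failover resource group and put a zpool under its control;
# the pool is exported/imported as the group moves between nodes.
clresourcegroup create nfs-rg
clresource create -g nfs-rg -t SUNW.HAStoragePlus -p Zpools=tank hasp-rs

# Bring the group (and therefore the pool) online on one node.
clresourcegroup online -M nfs-rg
```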
Richard Elling
2008-Jan-10 16:05 UTC
[zfs-discuss] Does block allocation for small writes work over iSCSI?
Ross wrote:
> For slide 3, HA-ZFS is available now with HA-Storage+ if you're happy
> with Active/Passive. HA-iSCSI code was released just before Christmas,
> I believe, but is currently untested, and HA-CIFS is just a thought on
> the roadmap.
>
> The reason for the 2008/2009 timeline is that that's when I've been
> told it's likely we'll see HA-CIFS.

HA-Samba has been available for 4+ years. Sharing the file systems is the easy part. Reconciling locks is the hard part. iSCSI has very limited locking capabilities (reservations). For CIFS, the existence of HA-Samba agents sets a precedent. The HA-Samba agent is written in ksh, so it shouldn't be too scary :-)
http://opensolaris.org/os/community/ha-clusters/ohac/Documentation/Agents/open-agents/

Or, if you want to roll your own, the agent builder is relatively easy to use...

> And yes, it's more complicated than Disk -- Server -- Client, but you
> could use that same argument for VMware. That goes Server -- OS -- OS
> -- Client instead of the traditional Server -- OS -- Client, but I
> think everyone would agree that there are significant advantages to
> that abstraction, and I see the same here.

Ah, but you said "Enterprise class," and VMWare is not. VM is enterprise class, but not VMWare (but clever naming always helps make positive associations :-)
 -- richard
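[To give a flavour of what "written in ksh" means here: a cluster agent mostly implements start/stop/probe callbacks. Below is a toy probe in that spirit; the process name, config path, and return codes are illustrative, and the real open-agents code additionally handles PMF supervision, timeouts, and retries:]

```shell
#!/usr/bin/ksh
# Toy Samba probe callback: exit 0 if the service looks healthy,
# a non-zero "degree of failure" otherwise. Not the real agent logic.

probe_smbd() {
    # Hypothetical checks: is an smbd process alive, and is the
    # (assumed) config file readable?
    pgrep -x smbd > /dev/null 2>&1 || return 100   # complete failure
    [ -r /etc/sfw/smb.conf ]       || return 50    # partial failure
    return 0
}

probe_smbd
exit $?
```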