Hi all,

Currently I am planning a storage network for making backups of several servers. At the moment there are several dedicated backup servers for this: 4 nodes, each providing 2.5 TB of disk space and exporting it via CIFS over 1 Gbit Ethernet. Unfortunately this is not a very flexible way of providing backup space. The problem: the size of the file servers varies, so the backup space is not used very well, both economically and technically.

I want to redesign the current architecture to make it more flexible. My idea is the following:

1. The 4 nodes become a storage backend; each provides its disk space as an iSCSI device.
2. A new server takes the role of a gateway to the storage network. It aggregates the nodes by importing the iSCSI devices and building a ZFS storage pool over them. This way I get one big pool of storage, whose space can be exported with CIFS to the file servers for backups.
3. For good performance I could set up a dedicated Gbit Ethernet network between the backup nodes and the gateway, and give the gateway an iSCSI HBA. The gateway would then be connected to the local network with several Gbit uplinks.
4. For high availability I could build a fail-over cluster out of the ZFS gateway.

What do you think about this architecture? Could the gateway become a bottleneck? Do you have any other ideas or recommendations?

Regards,
Dak
--
This message posted from opensolaris.org
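For what it's worth, a minimal sketch of what steps 1 and 2 could look like on OpenSolaris of that era; the backend pool name "tank", the volume size and the addresses are only placeholders, not anything from the original setup:

    # On each backend node: carve out a zvol and export it over iSCSI
    # (shareiscsi was the pre-COMSTAR way of doing this)
    zfs create -V 2T tank/backup-lun
    zfs set shareiscsi=on tank/backup-lun

    # On the gateway: discover the backend targets over the dedicated network
    iscsiadm add discovery-address 192.168.100.11:3260
    iscsiadm add discovery-address 192.168.100.12:3260
    iscsiadm add discovery-address 192.168.100.13:3260
    iscsiadm add discovery-address 192.168.100.14:3260
    iscsiadm modify discovery --sendtargets enable
    devfsadm -i iscsi    # create device nodes for the new LUNs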
Dak wrote:
> What do you think about this architecture? Could the gateway be a
> bottleneck? Do you have any other ideas or recommendations?

I have a setup similar to this. The most important thing I can recommend is to create a mirrored zpool from the iSCSI disks.

-Dave
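In case it is useful, a sketch of what such a mirrored pool could look like on the gateway; the cXtYd0 device names are just stand-ins for the iSCSI LUNs as they appear there, with each mirror pair spanning two different backend nodes:

    # Each mirror pair spans two different backend nodes, so losing a
    # whole node only degrades the pool instead of taking it offline
    zpool create backup \
        mirror c2t1d0 c3t1d0 \
        mirror c4t1d0 c5t1d0

    # Export the space to the file servers via the in-kernel CIFS server
    zfs create backup/dumps
    zfs set sharesmb=on backup/dumps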
Bob Friesenhahn
2008-Dec-13 21:40 UTC
[zfs-discuss] ZFS as a Gateway for a storage network
On Sat, 13 Dec 2008, Dak wrote:
> What do you think about this architecture? Could the gateway be a
> bottleneck? Do you have any other ideas or recommendations?

You will need to have redundancy somewhere to avoid possible data loss. If the redundancy is in the backend, then you should be protected from individual disk failure, but it is still possible to lose the entire pool if something goes wrong with the frontend pool.

Unless you export individual backend server disks (or several volumes from a larger pool) using iSCSI, the problem you may face is the resilver time if something goes wrong. If the backend storage volumes are too big, the resilver time will be excessively long. You don't want to have to resilver up to 2.5 TB, since that might take days. The ideal solution figures out how to dice up the storage in order to minimize the amount of resilvering which must take place if something fails.

For performance you want to maximize the number of vdevs. Simple mirroring is likely the safest and most performant choice for your head-end server, with raidz or raidz2 on the backend servers. Unfortunately, simple mirroring wastes half the space. You could use raidz on the head-end server to minimize the loss of storage space, but performance will be considerably reduced, since writes are then ordered and all of the backend servers must accept a write before the next one can proceed. Raidz also reduces resilver performance, since data has to be requested from all of the backend servers (over slow iSCSI) in order to reconstruct the data.

If you can afford it, you could get rid of the servers you were planning to use as backend storage and replace them with cheap JBOD storage arrays managed directly by ZFS. This is really the ideal solution in order to maximize performance, maximize reliability, and minimize resilver time.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
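To make the point about dicing up the storage concrete, one possible layout (sizes and device names purely illustrative) is to export several small zvols per backend node and mirror them pairwise across nodes, so that a failure only ever resilvers one small LUN:

    # On each backend node: five 500 GB LUNs instead of one 2.5 TB LUN
    for i in 1 2 3 4 5; do
        zfs create -V 500G tank/lun$i
        zfs set shareiscsi=on tank/lun$i
    done

    # On the gateway: many small mirrored vdevs; c3t* and c4t* stand in
    # for the LUNs of two different backend nodes
    zpool create backup \
        mirror c3t1d0 c4t1d0 \
        mirror c3t2d0 c4t2d0 \
        mirror c3t3d0 c4t3d0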
This is basically the setup I suggested a year or so ago. While the theory is sound, the major problem with it is that iSCSI and ZFS are not a great combination when a device (in your case a server) goes down. If you create a pool of several iSCSI devices and any one of them fails, the entire pool will lock up for 3 minutes while it waits for iSCSI to time out. Provided you have redundancy, it will work fine after this.

As for building a fail-over cluster: yes, this is also pretty easy to do, and something I tested myself. My notes on this are pretty old now, but drop me an e-mail at googlemail.com if you'd like a copy of them. I got a cluster working and failing over fine for CIFS and NFS clients with next to no prior experience of Solaris.
--
This message posted from opensolaris.org
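For the fail-over part, the manual core of moving such a pool between two gateway heads (which a cluster framework essentially automates) would look roughly like this, assuming both heads can reach the same iSCSI targets; the pool name is the placeholder used above:

    # On the old head, if it is still alive:
    zpool export backup

    # On the standby head:
    zpool import backup    # add -f if the old head died without exporting
    zfs share -a           # re-publish the CIFS/NFS shares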
That is very interesting. What kind of hardware did you use? Do you have any statistics about throughput and I/O behavior? Maybe you could provide the detailed architecture. Unfortunately I did not find your e-mail address in your user profile for direct contact. -- This message posted from opensolaris.org
Bob, thank you very much for your detailed answer. Indeed, resilvering could be a very difficult situation in such a big storage pool. I could solve this issue by building a pool for each backend node, but then I run into the same problems I have at the moment: the disk space of my servers varies heavily. A fixed-size pool may not provide enough space, or it may provide much more space than is needed at a given moment. One big storage pool is very interesting because it is less likely to run out of space than several dedicated storage pools. With a JBOD I will run into the same problems I have at the moment if the number of servers grows, won't I? Is there some other way you would recommend in order to solve this problem (one big storage pool; several backup nodes)?
--
This message posted from opensolaris.org
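One way to keep the flexibility of a single big pool while still steering how much space each file server can take is per-server datasets with quotas and reservations; the dataset names and sizes below are made up:

    # One dataset per file server, all drawing from the same big pool
    zfs create backup/fileserver1
    zfs create backup/fileserver2

    # Cap what a server may use, and optionally guarantee it a minimum
    zfs set quota=800G backup/fileserver1
    zfs set reservation=200G backup/fileserver1
    zfs set quota=1500G backup/fileserver2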
Dave, what kind of hardware did you use? I am worried about the bandwidth and I/O throughput of the ZFS gateway.
--
This message posted from opensolaris.org
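A crude way to see whether the gateway itself is the bottleneck is to push sequential data into the pool and watch how the load spreads over the iSCSI-backed vdevs; the mount point and sizes here are placeholders:

    # Rough sequential write test on the gateway (~8 GB of zeros)
    dd if=/dev/zero of=/backup/dumps/testfile bs=1024k count=8192

    # Watch per-vdev throughput while the test runs
    zpool iostat -v backup 5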
Bob Friesenhahn
2008-Dec-14 16:23 UTC
[zfs-discuss] ZFS as a Gateway for a storage network
On Sun, 14 Dec 2008, Dak wrote:
> dedicated storage pools. With a JBOD I will run into the same
> problems I have at the moment if the number of servers grows,
> won't I? Is there some other way you would recommend in order
> to solve this problem (one big storage pool; several backup
> nodes)?

I don't understand your concern regarding JBOD. JBOD provides a way that your storage pool size can be increased by installing more drives, or by adding another JBOD storage array. ZFS is very good at growing its storage size by adding more disks. With this approach you can maximize usable storage capacity by using a space-efficient strategy like raidz or raidz2.

Using backend servers with a complex disk-based OS (e.g. Solaris) is surely more failure-prone than using devices which require only "simple" firmware to boot and provide service. The iSCSI protocol over Ethernet adds more latency and offers less throughput than the SAS or fibre channel that a JBOD array will use.

If you are truly expecting your backend servers to be "backup nodes", then I think you are stuck with using simple mirroring on the head-end so that all of the data is available on each backup node. As someone pointed out to me, this approach achieves "maximum disk utilization" by consuming twice as much disk space.

Bob
=====================================
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
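As an illustration of the point about growing a JBOD-backed pool, expansion is a single step once the new disks are visible to the system; the pool name and device names are again only placeholders:

    # Add another raidz2 vdev built from the new JBOD disks; the extra
    # capacity becomes available to the pool immediately
    zpool add backup raidz2 c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0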
I don't have any statistics myself, but if you search the forum for iSCSI, there was a long thread a few months back with some performance figures. I didn't really do that much testing myself; once I hit the iSCSI bug there wasn't any point doing much more.

There has been some work on that recently though, and somebody here posted steps on how to compile the iSCSI initiator in order to manually reduce the timeout, which I plan to test as soon as I get enough free time at work.

And no, you won't find my e-mail address in my profile. I try not to publish it to cut down on spam, but if you send a mail to my username at googlemail.com it'll get through.
--
This message posted from opensolaris.org
You might want to look into the products from a company called DataCore Software, http://datacore.com/products/prod_home.asp. I've used them and they are great stuff. They make very high-performing iSCSI and FC storage controllers by leveraging commodity hardware, much like the earlier comment about JBOD arrays in this discussion. If you look at the Storage Performance Council results, http://www.storageperformance.org/results/benchmark_results_spc1, or the VMTN thread, http://communities.vmware.com/thread/73745, you'll see they beat all the popular storage arrays on the market. Since they are block-based storage virtualization devices, they work just fine with ZFS, UFS, or any open-systems filesystem/OS. Their high availability is true H/A with two stacks of disks and automatic failover and failback, very cool stuff.