Stas Oskin
2008-Dec-03 09:30 UTC
[Gluster-users] Recommended underlining disk storage environment
Hi.

What is the recommended underlying disk storage environment for GlusterFS?

* file-system - EXT3, XFS...
* technology - RAID0 (stripe) or LVM 1/2

Regards.
Keith Freedman
2008-Dec-03 10:40 UTC
[Gluster-users] Recommended underlining disk storage environment
I'm not sure there's an official recommendation. I use XFS with much success.

I think the choice of underlying filesystem depends highly on the types of data you'll be storing and how you'll be storing it. If it's primarily read data, then a filesystem with journaling capabilities may not provide much benefit. If you'll have lots of files in few directories, then a filesystem with better large-directory metrics would be ideal, etc. Gluster depends on the underlying filesystem and will work no matter what that filesystem is, provided it supports extended attributes.

I've found XFS works great for most purposes. If you're on Solaris, I'd recommend ZFS. Some people are fond of ReiserFS, but you could certainly use EXT3 with extended attributes enabled and most likely be just fine.

As for LVM: again, this really depends on what you want to do with the data. If you need to use multiple physical devices/partitions to present just one to gluster, you can do that and use LVM to manage the resizing of the single logical volume. Alternatively, you could use gluster's Unify translator to present one effective large/consolidated volume made up of multiple devices/partitions. In this scenario, you could potentially have multiple underlying configurations: you could Unify xfs, reiser, and ext3 filesystems into one gluster filesystem.

As for RAID: again, the faster and more appropriately configured the underlying system is for your data requirements, the better off you will be. If you're going to use gluster's AFR translator, then I'd not bother with hardware RAID/mirroring and would just use RAID0 stripes. However, if you have the money and can afford RAID0+1, that's always a huge benefit for read performance. Of course, if you're in a high-write environment, there's no real added value, so it's not worth doing.

This doesn't really answer your question, but hopefully it helps.

At 01:30 AM 12/3/2008, Stas Oskin wrote:
>Hi.
>
>What is the recommended underlying disk storage environment for GlusterFS?
>
>* file-system - EXT3, XFS...
>* technology - RAID0 (stripe) or LVM 1/2
>
>Regards.
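On the extended-attributes point above: XFS has them available by default, while an EXT3 brick is typically mounted with the user_xattr option turned on. A minimal sketch, using a made-up device and mount point rather than anything from this thread:

    # remount an existing ext3 backend directory with user extended attributes
    mount -o remount,user_xattr /export/brick1

    # or make it permanent in /etc/fstab (device and path are illustrative)
    /dev/sdb1  /export/brick1  ext3  defaults,user_xattr  0  2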
Stas Oskin
2008-Dec-03 12:00 UTC
[Gluster-users] Recommended underlining disk storage environment
Hi.

Thanks for your detailed answers. I'd like to clarify several points:

2008/12/3 Keith Freedman <freedman at freeformit.com>

> I'm not sure there's an official recommendation.
> I use XFS with much success.

Is XFS suitable for massive writing / occasional reading?

> I think the choice of underlying filesystem depends highly on the types of
> data you'll be storing and how you'll be storing it. If it's primarily read
> data, then a filesystem with journaling capabilities may not provide much
> benefit. If you'll have lots of files in few directories, then a filesystem
> with better large-directory metrics would be ideal, etc. Gluster depends on
> the underlying filesystem and will work no matter what that filesystem is,
> provided it supports extended attributes.

I'm going to store mostly large files (100+ MB), with massive writing and only occasional read operations.

> I've found XFS works great for most purposes. If you're on Solaris, I'd
> recommend ZFS. Some people are fond of ReiserFS, but you could certainly
> use EXT3 with extended attributes enabled and most likely be just fine.

I actually prefer to stay on Linux. How does XFS compare to EXT3 in the environment I described?

> As for LVM: again, this really depends on what you want to do with the data.
> If you need to use multiple physical devices/partitions to present just one
> to gluster, you can do that and use LVM to manage the resizing of the single
> logical volume.

This was the first idea I thought about, as I'm going to use 4 disks per server.

> Alternatively, you could use gluster's Unify translator to present one
> effective large/consolidated volume made up of multiple devices/partitions.

I think I read somewhere on this mailing list that there is a migration from Unify to DHT (whatever that means) in the coming GlusterFS 1.4. If Unify is the legacy approach, what is the relevant solution for 1.4 (DHT)?

> In this scenario, you could potentially have multiple underlying
> configurations: you could Unify xfs, reiser, and ext3 filesystems into one
> gluster filesystem.
>
> As for RAID: again, the faster and more appropriately configured the
> underlying system is for your data requirements, the better off you will be.
> If you're going to use gluster's AFR translator, then I'd not bother with
> hardware RAID/mirroring and would just use RAID0 stripes. However, if you
> have the money and can afford RAID0+1, that's always a huge benefit for read
> performance. Of course, if you're in a high-write environment, there's no
> real added value, so it's not worth doing.

A couple of points here:
1) Thanks to AFR, I actually don't need any fault-tolerant RAID (like mirroring), so it's only recommended in high-volume read environments, which is not the case here. Is this correct?
2) Isn't LVM (or GlusterFS's own solution) much better than RAID 0, in the sense that if one of the disks goes, the volume still continues to work? This in contrast to RAID, where the whole volume goes down?
3) Continuing 2, I think I actually meant JBOD - where you just connect all the drives and make them appear as a single device, rather than striping. If you could clarify the recommended approach, it would be great.

> This doesn't really answer your question, but hopefully it helps.

Thanks again for your help.

Regards.
Keith Freedman
2008-Dec-03 18:25 UTC
[Gluster-users] Recommended underlining disk storage environment
At 04:00 AM 12/3/2008, Stas Oskin wrote:
>Hi.
>
>Thanks for your detailed answers. I'd like to clarify several points:
>
>Is XFS suitable for massive writing / occasional reading?

XFS is more optimal than EXT3 or ReiserFS for write environments. Some useful information is here: http://www.ibm.com/developerworks/library/l-fs9.html
I'd pay close attention to the "Delayed allocation" section.

>I'm going to store mostly large files (100+ MB), with massive writing and
>only occasional read operations.
>
>I actually prefer to stay on Linux. How does XFS compare to EXT3 in the
>environment I described?

They're all Linux filesystems, so that's not the issue.

>This was the first idea I thought about, as I'm going to use 4 disks per
>server.
>
>I think I read somewhere on this mailing list that there is a migration from
>Unify to DHT (whatever that means) in the coming GlusterFS 1.4. If Unify is
>the legacy approach, what is the relevant solution for 1.4 (DHT)?

The approach is the same. I believe the concept is that there's a translator that groups multiple smaller filesystem pieces into a single representation. Gluster lets you do this through the filesystem, where LVM lets you do it through the block devices.

Personally, I'd go with LVM, since it's likely easier to manage in the long run and gives you more flexibility. You can grow your LVM volume and, if you go with XFS, dynamically resize your filesystem, and you won't have to make any changes to your gluster config.
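In practice that grow is a two-step operation: extend the logical volume first, then grow the filesystem sitting on it. A rough sketch, with a hypothetical volume group vg0 and mount point /export/bricks (neither name comes from this thread):

    # add a disk to the volume group if it has no free extents left
    pvcreate /dev/sde1
    vgextend vg0 /dev/sde1

    # extend the logical volume, then grow XFS online (no unmount needed)
    lvextend -L +100G /dev/vg0/bricks
    xfs_growfs /export/bricks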
>A couple of points here:
>1) Thanks to AFR, I actually don't need any fault-tolerant RAID (like
>mirroring), so it's only recommended in high-volume read environments, which
>is not the case here. Is this correct?

You can use AFR as your fault tolerance/mirror. However, be aware that this means your "mirroring" will be going at network speed. If you have no need for multiple servers with live replicated data, you'll be much better off, performance-wise, using hardware mirroring. However, if you want/need multiple servers serving identical data, then just use AFR and you can live without hardware mirroring.

I'm not sure how gluster/AFR will perform in a very-large-file, high-write environment. We'll have to see what the gluster devs say about it, but what I can say is this: in the event your AFR servers lose contact and later have to auto-heal, gluster will have to move the entire large file. As far as I know it doesn't have rsync-like capabilities wherein it would only move the modified bits of the file over the network; I believe it just copies over the whole thing, so if this happens a lot, it will bog things down significantly.

>2) Isn't LVM (or GlusterFS's own solution) much better than RAID 0, in the
>sense that if one of the disks goes, the volume still continues to work?
>This in contrast to RAID, where the whole volume goes down?

You're confused about what RAID means. Yes, with RAID0 (striping) there is no redundancy. RAID1 (mirroring) provides redundancy, and if one drive fails the volume still functions; you can do this with hardware or, I believe, with LVM. Then there's RAID0+1 (striping & mirroring), which provides the performance benefit of striping with the high availability of mirroring. So whether you use LVM for your RAID or a hardware RAID controller doesn't change anything: with RAID0 a single failure takes the volume down, with RAID1 you can withstand a drive failure.

>3) Continuing 2, I think I actually meant JBOD - where you just connect all
>the drives and make them appear as a single device, rather than striping.

Right, however this presents the same issues as striping but without the performance benefit of striping. Let's say you have AFR set up and you have a 4-disk striped or concatenated (JBOD) volume on each of 2 servers. If you have a single drive failure on one server, that entire filesystem becomes unavailable. When you repair your drive, you effectively have a blank, empty filesystem; gluster/AFR will notice this and start auto-healing the entire filesystem (as each directory and file is accessed), so in time you'll have copied the entire filesystem over the network. However, if you have a single server and you mirror your devices in a RAID1/0+1 config, then when you lose a drive your filesystem is still running; replace the drive and the RAID software fixes everything.

AFR is much more efficient in high-read environments, since you can distribute the load across multiple servers and specify a local read volume to ensure a particular client always uses the fastest server (which could be its own local brick, or a server on a LAN when you're using AFR across a WAN).

>If you could clarify the recommended approach, it would be great.

So here's a summary: IF you do NOT need more than one server serving the data (i.e., you're not going to replicate the data for DR purposes), I'd recommend you avoid AFR in gluster and instead configure RAID0+1 on your server.
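As a sketch of what that could look like with Linux software RAID: md's RAID10 level combines the striping and mirroring described above. The mdadm invocation and device names below are illustrative assumptions, not something prescribed in the thread:

    # build a 4-disk RAID10 array, then put XFS on it (device names are made up)
    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
    mkfs.xfs /dev/md0
    mount /dev/md0 /export/brick1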
You'd be better off using a hardware RAID controller with a large battery-backed cache, but you could use software RAID (like LVM).

If you had said you had a high-read environment, I'd have suggested 2 servers using AFR over a private high-speed network, since that reduces your points of failure; but given the high-write, large-file environment, AFR may become a bottleneck. Again: if you NEED server redundancy, then AFR is your best option, but if you don't need it, it will just slow things down.

>Thanks again for your help.
>
>Regards.
Keith Freedman
2008-Dec-06 00:36 UTC
[Gluster-users] Recommended underlining disk storage environment
At 04:15 PM 12/5/2008, Stas Oskin wrote:
>Hi.
>
>Thanks so much for your replies, they have given me a good head start.
>
>A few remaining questions:
>
>>you first expand the underlying block device with LVM, then you grow
>>your filesystem. Some filesystems support this, some don't.
>
>Isn't this usually reversed - first you grow the underlying file-system,
>then you increase the LVM size?

The filesystem cannot exceed the size of the device it's sitting on. If the block device or logical volume is 200GB, you can't expand the filesystem. So you first expand the volume/block device to 300GB, then grow the filesystem to 300GB, for example.

>>if you have 3 drives striped together and one filesystem on top of them,
>>then you will have a problem.
>>if you have 3 drives each with their own filesystem on top and you "unify"
>>that with gluster or something, then you can keep running but will lose
>>access to those files.
>
>Actually, this sounds like a good idea! By having all the drives unified via
>GlusterFS, this basically means any of them could be lost, but it won't
>influence the other drives on the same server.
>
>Have you ever tried such a setup?

Not with gluster. And there are performance advantages: with an LVM stripe, your data reads are distributed over multiple physical devices, whereas with Unify you'd be reading any individual file from only one spindle. However, this is the price we pay for availability, so I think it depends on your performance requirements. If you don't need blazing fast reads, then Unify will give you better availability.

>Also, I presume it would still be possible to have one of the disks function
>as the system disk? In the event it's lost, a simple restore of the root,
>boot and swap partitions to a new disk + AFR healing for the data should do
>the job. What do you think?

Any sub-directory can be the root of the gluster filesystem, so you could have this example:

/dev/sda1  /
/dev/sda2  /boot
/dev/sda3  /home
/dev/sdb   /home2
/dev/sdc1  /home3
/dev/sdc2  /junk

and then unify /tree1 with /home, /home2, /home3, /junk/stuff/home4, or something like that.

>>at some point you'll saturate something. You'll either saturate your disk
>>I/O or your network, most likely the network, so try and make sure that the
>>network you use for the AFR connections doesn't have anything else competing
>>for the bandwidth, and I think you'll be fine.
>
>This makes sense indeed.
>
>By the way, how do you manage all the bricks?
>Do you have some centralized way to add new bricks and update the config
>files for clients/servers?

My configuration is pretty simple. I have one brick on each server, using AFR between them. However, I believe they have a few features targeted for 1.5 which will allow dynamic reconfiguration, as well as a configuration editor/manager, which will simplify things. Once you're comfortable with the way the config files are parsed, you'll get the hang of it; but if you're going to re-configure your setup frequently, then it'll get inconvenient pretty quickly.

Keith
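For readers trying to picture "one brick on each server, using AFR between them": a client-side volume spec from that era looked roughly like the sketch below. The host and volume names are invented for illustration, and option spellings varied a bit between GlusterFS releases, so treat this as a sketch rather than a copy of Keith's actual config:

    volume remote1
      type protocol/client
      option transport-type tcp/client
      option remote-host server1          # hypothetical host name
      option remote-subvolume brick
    end-volume

    volume remote2
      type protocol/client
      option transport-type tcp/client
      option remote-host server2          # hypothetical host name
      option remote-subvolume brick
    end-volume

    # AFR keeps the two remote bricks as live replicas of each other
    volume afr0
      type cluster/afr
      subvolumes remote1 remote2
    end-volume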