Hi All,

I have been trying out XFS given it is going to be the file system of choice from upstream in el7. Starting with an Adaptec ASR71605 populated with sixteen 4TB WD enterprise hard drives. The OS is 6.4 x86_64 and the machine has 64G of RAM.

This next part was not well researched, as I had a colleague bothering me late on Xmas Eve because he needed 14 TB immediately to move data to from an HPC cluster. I built an XFS file system straight onto the (raid 6) logical device made up of all sixteen drives with:

> mkfs.xfs -d su=512k,sw=14 /dev/sda

where "512k" is the stripe-unit size of the single logical device built on the raid controller and "14" is the total number of drives minus two (raid 6 redundancy).

Any comments on the above from XFS users would be helpful!

I mounted the filesystem with the default options assuming they would be sensible, but I now believe I should have specified the "inode64" mount option to avoid all the inodes being stuck in the first TB.

The filesystem however is at 87% and does not seem to have had any issues/problems.

> df -h | grep raid
/dev/sda    51T   45T   6.7T   87%   /raidstor

Another question is could I now safely remount with the "inode64" option, or will this cause problems in the future? I read this below in the XFS FAQ but wondered if it has been fixed (backported?) into el6.4?

"Starting from kernel 2.6.35, you can try and then switch back. Older kernels have a bug leading to strange problems if you mount without inode64 again. For example, you can't access files & dirs that have been created with an inode >32bit anymore."

I also noted that "xfs_check" ran out of memory, and so after some reading noted that it is recommended to use "xfs_repair -n -vv" instead as it uses far less memory. So why is "xfs_check" there at all?

I do have the option of moving the data elsewhere and rebuilding, but this would cause some problems. Any advice much appreciated.

Steve
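P.S. For the record, xfs_info should confirm what mkfs.xfs recorded for the geometry. A sketch of what I would expect to see (I have not pasted my real output here, and the block counts assume the default 4 KiB block size):

    xfs_info /raidstor
    # with su=512k and sw=14 on a 4 KiB block size this should report
    # sunit=128 blks and swidth=1792 blks, i.e. 512 KiB per data disk
    # and a 14 x 512 KiB = 7 MiB full stripe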
On 2014-01-21, Steve Brooks <steveb at mcs.st-and.ac.uk> wrote:
>
>> mkfs.xfs -d su=512k,sw=14 /dev/sda
>
> where "512k" is the stripe-unit size of the single logical device built on
> the raid controller and "14" is the total number of drives minus two
> (raid 6 redundancy).

The usual advice on the XFS list is to use the defaults where possible. But you might want to ask there to see if they have any specific advice.

> I mounted the filesystem with the default options assuming they would be
> sensible, but I now believe I should have specified the "inode64" mount
> option to avoid all the inodes being stuck in the first TB.
>
> The filesystem however is at 87% and does not seem to have had any
> issues/problems.
>
>> df -h | grep raid
> /dev/sda    51T   45T   6.7T   87%   /raidstor

Wow, impressive! I know of a much smaller fs which got bit by this issue. What probably happened is that, as a new fs, the entire first 1TB was able to be reserved for inodes.

> Another question is could I now safely remount with the "inode64" option,
> or will this cause problems in the future? I read this below in the XFS
> FAQ but wondered if it has been fixed (backported?) into el6.4?

I have mounted a large XFS fs that previously didn't use inode64 with that option, and it went fine. (I did not attempt to roll back.) You *must* umount and remount for the option to take effect. I do not know when the inode64 option made it to CentOS, but it is there now.

> I also noted that "xfs_check" ran out of memory, and so after some reading
> noted that it is recommended to use "xfs_repair -n -vv" instead as it
> uses far less memory. So why is "xfs_check" there at all?

The XFS team is working on deprecating it. But on a 51TB filesystem xfs_repair will still use a lot of memory. Using -P can help, but it'll still use quite a bit (depending on the extent of any damage and how many inodes there are, and probably a bunch of other factors I don't know about).

--keith

-- 
kkeller at wombat.san-francisco.ca.us
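P.S. For what it's worth, the switch on the fs I mentioned was just a plain umount/mount cycle; roughly the following, with the device and mount point swapped to match yours, so treat it as a sketch rather than a transcript:

    umount /raidstor
    mount -o inode64 /dev/sda /raidstor
    # to keep it across reboots, add inode64 to the fstab entry, e.g.:
    # /dev/sda   /raidstor   xfs   defaults,inode64   0 0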
Hi,

----- Original Message -----
| Hi All,
|
| I have been trying out XFS given it is going to be the file system of
| choice from upstream in el7. Starting with an Adaptec ASR71605 populated
| with sixteen 4TB WD enterprise hard drives. The OS is 6.4 x86_64 and the
| machine has 64G of RAM.

Good! You're going to need it with a volume that large!

| This next part was not well researched, as I had a colleague bothering me
| late on Xmas Eve because he needed 14 TB immediately to move data to from
| an HPC cluster. I built an XFS file system straight onto the (raid 6)
| logical device made up of all sixteen drives with:
|
| > mkfs.xfs -d su=512k,sw=14 /dev/sda
|
| where "512k" is the stripe-unit size of the single logical device built on
| the raid controller and "14" is the total number of drives minus two
| (raid 6 redundancy).

Whoa! What kind of data are you writing to disk? I hope the files are typically large enough to account for such a large stripe unit, or you're going to lose a lot of the performance benefit. It will write quite a bit of data to an individual drive in the RAID this way.

| Any comments on the above from XFS users would be helpful!
|
| I mounted the filesystem with the default options assuming they would be
| sensible, but I now believe I should have specified the "inode64" mount
| option to avoid all the inodes being stuck in the first TB.
|
| The filesystem however is at 87% and does not seem to have had any
| issues/problems.
|
| > df -h | grep raid
| /dev/sda    51T   45T   6.7T   87%   /raidstor
|
| Another question is could I now safely remount with the "inode64" option,
| or will this cause problems in the future? I read this below in the XFS
| FAQ but wondered if it has been fixed (backported?) into el6.4?
|
| "Starting from kernel 2.6.35, you can try and then switch back. Older
| kernels have a bug leading to strange problems if you mount without
| inode64 again. For example, you can't access files & dirs that have been
| created with an inode >32bit anymore."

Changing to inode64 and back is no problem. Keep in mind that inode64 may not work with clients running older operating systems. This bit us when we had a mixture of Solaris 8/9 clients.

| I also noted that "xfs_check" ran out of memory, and so after some reading
| noted that it is recommended to use "xfs_repair -n -vv" instead as it
| uses far less memory. So why is "xfs_check" there at all?

That's because with -n it doesn't actually change anything. Trust me, when you actually go and run xfs_{check,repair} without the -n flag, you're going to need A LOT of memory (see the P.S. at the end for a gentler dry run). For example, an 11TB file system holding medical imaging data used 24GB of memory for an xfs_repair. Good luck!

As for why xfs_check is there at all, there are various reasons. For example, it's your go-to program for catching quota issues; we've had a couple of quota problems that xfs_check pointed out so that we could then run xfs_repair. Keep in mind that xfs_checks are not run automatically after unclean shutdowns. The XFS log is merely replayed, and you're advised to run xfs_check yourself to validate the file system's consistency.

| I do have the option of moving the data elsewhere and rebuilding, but this
| would cause some problems. Any advice much appreciated.

Do you REALLY need it to be a single volume that is so large?

-- 
James A. Peltier
Manager, IT Services - Research Computing Group
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.sfu.ca/itservices
"A successful person is one who can lay a solid foundation from the bricks others have thrown at them." -David Brinkley via Luke Shaw
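P.S. On the memory point: if you want to see where you stand, a no-modify pass with prefetching disabled is the gentler way to start. Roughly the following; the flags are from xfs_repair(8), but the -m memory cap may not be in every xfsprogs build, so treat this as a sketch:

    # unmount first, then a read-only pass with prefetch disabled to cut memory use
    umount /raidstor
    xfs_repair -n -P -vv /dev/sda
    # some xfsprogs versions also accept -m <MB> to cap memory, e.g.
    # xfs_repair -n -P -m 32768 /dev/sda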