Joan J. Piles
2011-Mar-15 15:02 UTC
[Lustre-discuss] Problem with lustre 2.0.0.1, ext3/4 and big OSTs (>8Tb)
Hi,

We are trying to set up a Lustre 2.0.0.1 installation (the most recent one downloadable from the official site). We plan to have some big OSTs (~12TB), using Scientific Linux 5.5 (which should be a RHEL clone for all purposes).

However, when we try to format the OSTs, we get the following error:

> [root@oss01 ~]# mkfs.lustre --ost --fsname=extra \
>     --mgsnode=172.16.4.4@tcp0 \
>     --mkfsoptions '-i 262144 -E stride=32,stripe_width=192' /dev/sde
>
>    Permanent disk data:
> Target:     extra-OSTffff
> Index:      unassigned
> Lustre FS:  extra
> Mount type: ldiskfs
> Flags:      0x72
>             (OST needs_index first_time update )
> Persistent mount opts: errors=remount-ro,extents,mballoc
> Parameters: mgsnode=172.16.4.4@tcp
>
> checking for existing Lustre data: not found
> device size = 11427830MB
> formatting backing filesystem ldiskfs on /dev/sde
>         target name  extra-OSTffff
>         4k blocks    2925524480
>         options      -i 262144 -E stride=32,stripe_width=192 -J size=400
>                      -I 256 -q -O dir_index,extents,uninit_bg -F
> mkfs_cmd = mke2fs -j -b 4096 -L extra-OSTffff -i 262144 -E
> stride=32,stripe_width=192 -J size=400 -I 256 -q -O
> dir_index,extents,uninit_bg -F /dev/sde 2925524480
> mkfs.lustre: Unable to mount /dev/sde: Invalid argument
>
> mkfs.lustre FATAL: failed to write local files
> mkfs.lustre: exiting with 22 (Invalid argument)

In the dmesg log, we find the following line:

> LDISKFS-fs does not support filesystems greater than 8TB and can cause
> data corruption. Use "force_over_8tb" mount option to override.

After some investigation, we found it is related to the use of ext3 instead of ext4, even though we should be using ext4, as proven by the fact that the file systems created are actually ext4:

> [root@oss01 ~]# file -s /dev/sde
> /dev/sde: Linux rev 1.0 ext4 filesystem data (extents) (large files)

Further, we made a test with an ext3 filesystem on the same machine, and there the difference shows:

> [root@oss01 ~]# file -s /dev/sda1
> /dev/sda1: Linux rev 1.0 ext3 filesystem data (large files)

Everything we found on the net about this problem seems to refer to Lustre 1.8.5. However, we would not expect such a regression in Lustre 2. Is this actually a problem with Lustre 2? Does ext4 have to be enabled either at compile time or with a parameter somewhere (we found no documentation about it)?

Greetings and thanks,

--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Systems Analyst
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 976 76 10 00 (ext. 5454)
http://i3a.unizar.es -- jpiles@unizar.es
--------------------------------------------------------------------------
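For readers hitting the same wall: the dmesg message names a "force_over_8tb" mount option. Below is a minimal sketch of how that override could be passed at format time, assuming mkfs.lustre's --mountfsoptions flag behaves here as documented (it replaces the default persistent mount options, so those are repeated). The option only silences the size check; it does not add ext4 support, so the corruption warning still applies to an ext3-based ldiskfs.

    # Sketch only: bypasses the >8TB check, does NOT make an ext3-based
    # ldiskfs safe above 8TB. The default mount options are repeated
    # because --mountfsoptions overrides them rather than appending.
    mkfs.lustre --ost --fsname=extra --mgsnode=172.16.4.4@tcp0 \
        --mountfsoptions='errors=remount-ro,extents,mballoc,force_over_8tb' \
        --mkfsoptions '-i 262144 -E stride=32,stripe_width=192' /dev/sde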
Kevin Van Maren
2011-Mar-15 15:22 UTC
[Lustre-discuss] Problem with lustre 2.0.0.1, ext3/4 and big OSTs (>8Tb)
Joan J. Piles wrote:

> Hi,
>
> We are trying to set up a Lustre 2.0.0.1 installation (the most recent
> one downloadable from the official site). We plan to have some big
> OSTs (~12TB), using Scientific Linux 5.5 (which should be a RHEL clone
> for all purposes).
>
> However, when we try to format the OSTs, we get the following error:
>
>> [root@oss01 ~]# mkfs.lustre --ost --fsname=extra \
>>     --mgsnode=172.16.4.4@tcp0 \
>>     --mkfsoptions '-i 262144 -E stride=32,stripe_width=192' /dev/sde
>> [...]
>> mkfs.lustre: Unable to mount /dev/sde: Invalid argument
>>
>> mkfs.lustre FATAL: failed to write local files
>> mkfs.lustre: exiting with 22 (Invalid argument)
>
> In the dmesg log, we find the following line:
>
>> LDISKFS-fs does not support filesystems greater than 8TB and can cause
>> data corruption. Use "force_over_8tb" mount option to override.
>
> After some investigation, we found it is related to the use of ext3
> instead of ext4,

Correct.

> even though we should be using ext4, as proven by the fact that the
> file systems created are actually ext4:
>
>> [root@oss01 ~]# file -s /dev/sde
>> /dev/sde: Linux rev 1.0 ext4 filesystem data (extents) (large files)

No, these are "ldiskfs" filesystems. ext3+ldiskfs looks a bit like ext4 (ext4 is largely based on the enhancements done for Lustre's ldiskfs), but it is not the same as ext4+ldiskfs. In particular, the file system size is limited to 8TB, not 16TB.

> Further, we made a test with an ext3 filesystem on the same machine,
> and there the difference shows:
>
>> [root@oss01 ~]# file -s /dev/sda1
>> /dev/sda1: Linux rev 1.0 ext3 filesystem data (large files)
>
> Everything we found on the net about this problem seems to refer to
> Lustre 1.8.5. However, we would not expect such a regression in Lustre
> 2. Is this actually a problem with Lustre 2? Does ext4 have to be
> enabled either at compile time or with a parameter somewhere (we found
> no documentation about it)?

Lustre 2.0 did not enable ext4 by default, due to known issues. You can rebuild the Lustre server with "--enable-ext4" on the configure line to enable it. But if you are going to use 12TB LUNs, you should either stick with v1.8.5 (stable), or pull a newer version from git (experimental).

Kevin
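A minimal sketch of the rebuild Kevin describes, assuming an unpacked Lustre 2.0.0.1 source tree and patched server kernel sources already in place (the paths and the packaging step are illustrative, not prescriptive):

    # Rebuild the server with the ext4-based ldiskfs enabled.
    cd lustre-2.0.0.1                                # source tree (path assumed)
    ./configure --enable-ext4 \
        --with-linux=/usr/src/kernels/$(uname -r)    # patched kernel sources
    make rpms                                        # or `make && make install`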
Joan J. Piles
2011-Mar-15 16:36 UTC
[Lustre-discuss] Problem with lustre 2.0.0.1, ext3/4 and big OSTs (>8Tb)
We have tried recompiling ldiskfs with ext4 enabled, and so far it seems to create the file systems without any further problem. The only known issue we found is in the Release Notes:

> Enabling ext4 allows LUNs larger than 8 TB to be used in the Lustre
> file system. When ext4 is enabled, by default, in a system at scale,
> servers become overloaded (cause unknown). This results in clients
> timing out and attempting to reconnect, an action which the server
> does not accept. Eventually, the server evicts the client due to a
> lock timeout.
> Workaround: Do not enable ext4 in Lustre 2.0.0.

What number of clients does "a system at scale" mean? We are expecting to have at most 1500 processes on 150 nodes accessing the filesystem. Is this big enough to trigger the issue?

Since this is going to be a production system, using an experimental version is out of the question. Should we stick with 1.8 and forget about 2.0? Will there soon be a 2.0.0.x release addressing these issues?

Thanks,

On 15/03/11 16:22, Kevin Van Maren wrote:

> Lustre 2.0 did not enable ext4 by default, due to known issues. You
> can rebuild the Lustre server with "--enable-ext4" on the configure
> line to enable it. But if you are going to use 12TB LUNs, you should
> either stick with v1.8.5 (stable), or pull a newer version from git
> (experimental).
>
> Kevin

--
--------------------------------------------------------------------------
Joan Josep Piles Contreras - Systems Analyst
I3A - Instituto de Investigación en Ingeniería de Aragón
Tel: 976 76 10 00 (ext. 5454)
http://i3a.unizar.es -- jpiles@unizar.es
--------------------------------------------------------------------------
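One way to confirm the rebuilt ldiskfs actually took effect is to repeat the original format and watch the kernel log; a hedged sketch, reusing the command from earlier in the thread (note from Kevin's reply that "file -s" reporting "ext4" proves nothing here, so the absence of the 8TB warning is the better tell):

    # Re-run the previously failing format against the rebuilt modules.
    mkfs.lustre --ost --fsname=extra --mgsnode=172.16.4.4@tcp0 \
        --mkfsoptions '-i 262144 -E stride=32,stripe_width=192' /dev/sde
    dmesg | grep -i '8TB'    # should no longer show the "does not support" line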