Kevin L. Buterbaugh
2007-Feb-06 10:42 UTC
[Lustre-discuss] My 1st Lustre filesystem / Number of inodes...
All,

OK, I was able to get my first "real" (on a storage array, not a device in /tmp) Lustre filesystem created over the weekend. My mistake was that I was not letting the reformat of the filesystem complete. To the Lustre developers on this list: I have 4 1.25 TB RAID 5 LUNs, and it took anywhere from 3 to 6 hours for the format. I can create an ext3 filesystem on one of those LUNs in about an hour. Why is Lustre so slow? May I also suggest that Lustre give some sort of progress update as it works (like every other UNIX / Linux mkfs command I've ever used)?

Also, with my initial filesystem I only formatted 2 LUNs. I was able to mount my filesystem successfully, but when I ran a 2 client bonnie++ test against it, they both bombed out with a "couldn't create file" type error message. According to "df -i" the filesystem had less than 10,000 inodes for a ~2.3 TB filesystem! That seems like an incredibly low number of inodes for that size filesystem!

So now I'm trying to recreate the filesystems with "--inode_size 4096". After waiting 6 hours, it failed to create the filesystem (even though it appeared to finish normally):

[root@lustre3 lustre]# lconf --reformat --node lustre3 config.xml; date
loading module: libcfs srcdir None devdir libcfs
loading module: lnet srcdir None devdir lnet
loading module: ksocklnd srcdir None devdir klnds/socklnd
loading module: lvfs srcdir None devdir lvfs
loading module: obdclass srcdir None devdir obdclass
loading module: ptlrpc srcdir None devdir ptlrpc
loading module: ost srcdir None devdir ost
loading module: ldiskfs srcdir None devdir ldiskfs
loading module: fsfilt_ldiskfs srcdir None devdir lvfs
loading module: obdfilter srcdir None devdir obdfilter
NETWORK: NET_lustre3_lnet NET_lustre3_lnet_UUID lnet 129.59.197.133@tcp
OSD: ost3-test ost3-test_UUID obdfilter /dev/sda 0 ldiskfs no 0 4096
sh: line 1: 30721 Segmentation fault      mkfs.ext2 -j -b 4096 -F -J size=400 -I 4096 /dev/sda 2>&1
Unable to build fs: /dev/sda mke2fs 1.35 (28-Feb-2004)
Warning: 4096-byte inodes not usable on most systems
warning: 10240 blocks unused.

Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
153026560 inodes, 306053120 blocks
15303168 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
9340 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Mon Feb 5 17:47:39 CST 2007
[root@lustre3 lustre]#

First off, the "-I" option is incorrect (on CentOS 4.4 for x86_64 at least); it should be "-i". So I fixed that and, since I can't wait 6 hours each time, I tried again using "/tmp/sda" as my device. After running the lconf on the 2 OSTs, I ran it on the MDS. Here's what I get:

[root@lustrem lustre]# lconf --reformat --node lustrem config.xml
NETWORK: NET_lustrem_lnet NET_lustrem_lnet_UUID lnet 129.59.197.130@tcp
MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs no
Unable to build fs: /tmp/mds-test mkfs.ext2: bad inode ratio 512 (min 1024/max 8192)

[root@lustrem lustre]#

I have searched the Lustre manual for that error and googled it - zero hits. Help, please...

Kevin

--
Kevin L. Buterbaugh
Advanced Computing Center for Research & Education - Vanderbilt University
www.accre.vanderbilt.edu - (615)343-0288 - klb@accre.vanderbilt.edu
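A quick aside on the arithmetic behind the inode counts discussed above: with mke2fs, the number of inodes created is governed by the bytes-per-inode ratio (the -i option), not by the inode size (-I). The sketch below is illustrative only; the ~1.25 TB figure comes from the post above, the 4096 ratio is an assumed example, and the mount point is hypothetical:

# inodes created ~= filesystem bytes / bytes-per-inode ratio
echo $(( 1250 * 1000 * 1000 * 1000 / 4096 ))   # -i 4096 on a ~1.25 TB LUN -> roughly 305 million inodes
# after the filesystem is mounted, the actual inode budget can be checked with:
df -i /mnt/lustre                              # /mnt/lustre is a hypothetical mount point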
Nathaniel Rutman
2007-Feb-06 12:52 UTC
[Lustre-discuss] My 1st Lustre filesystem / Number of inodes...
Kevin L. Buterbaugh wrote:
> All,
>
> OK, I was able to get my first "real" (on a storage array, not a
> device in /tmp) Lustre filesystem created over the weekend. My
> mistake was that I was not letting the reformat of the filesystem
> complete. To the Lustre developers on this list: I have 4 1.25 TB
> RAID 5 LUNs, and it took anywhere from 3 to 6 hours for the format. I
> can create an ext3 filesystem on one of those LUNs in about an hour.
> Why is Lustre so slow? May I also suggest that Lustre give some sort
> of progress update as it works (like every other UNIX / Linux mkfs
> command I've ever used)?
>
lconf just uses mkfs.ext2, which doesn't show any progress. You can use
"lconf -v" to see the actual commands.

> Also, with my initial filesystem I only formatted 2 LUNs. I was able
> to mount my filesystem successfully, but when I ran a 2 client
> bonnie++ test against it, they both bombed out with a "couldn't create
> file" type error message. According to "df -i" the filesystem had
> less than 10,000 inodes for a ~2.3 TB filesystem! That seems like an
> incredibly low number of inodes for that size filesystem!
>
> So now I'm trying to recreate the filesystems with "--inode_size
> 4096". After waiting 6 hours, it failed to create the filesystem
> (even though it appeared to finish normally):
>
> [root@lustre3 lustre]# lconf --reformat --node lustre3 config.xml; date
> loading module: libcfs srcdir None devdir libcfs
> loading module: lnet srcdir None devdir lnet
> loading module: ksocklnd srcdir None devdir klnds/socklnd
> loading module: lvfs srcdir None devdir lvfs
> loading module: obdclass srcdir None devdir obdclass
> loading module: ptlrpc srcdir None devdir ptlrpc
> loading module: ost srcdir None devdir ost
> loading module: ldiskfs srcdir None devdir ldiskfs
> loading module: fsfilt_ldiskfs srcdir None devdir lvfs
> loading module: obdfilter srcdir None devdir obdfilter
> NETWORK: NET_lustre3_lnet NET_lustre3_lnet_UUID lnet 129.59.197.133@tcp
> OSD: ost3-test ost3-test_UUID obdfilter /dev/sda 0 ldiskfs no 0 4096
> sh: line 1: 30721 Segmentation fault      mkfs.ext2 -j -b 4096 -F -J
> size=400 -I 4096 /dev/sda 2>&1
Failed here.

> Unable to build fs: /dev/sda mke2fs 1.35 (28-Feb-2004)
> Warning: 4096-byte inodes not usable on most systems
> warning: 10240 blocks unused.
>
> Filesystem label=
> OS type: Linux
> Block size=4096 (log=2)
> Fragment size=4096 (log=2)
> 153026560 inodes, 306053120 blocks
> 15303168 blocks (5.00%) reserved for the super user
> First data block=0
> Maximum filesystem blocks=4294967296
> 9340 block groups
> 32768 blocks per group, 32768 fragments per group
> 16384 inodes per group
> Superblock backups stored on blocks:
>         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
>         2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
>         78675968, 102400000, 214990848
>
> Writing inode tables: done
> Mon Feb 5 17:47:39 CST 2007
> [root@lustre3 lustre]#
>
> First off, the "-I" option is incorrect (on CentOS 4.4 for x86_64 at
> least); it should be "-i". So I fixed that and, since I can't wait 6
> hours each time, I tried again using "/tmp/sda" as my device. After
> running the lconf on the 2 OSTs, I ran it on the MDS. Here's what I
> get:
>
It's not incorrect. There are two different options,
[ -i bytes-per-inode ] [ -I inode-size ].
The --inode_size lconf option affects the latter; use --mkfsoptions "-i XXXX"
to change the former.

> [root@lustrem lustre]# lconf --reformat --node lustrem config.xml
> NETWORK: NET_lustrem_lnet NET_lustrem_lnet_UUID lnet 129.59.197.130@tcp
> MDSDEV: mds-test mds-test_UUID /tmp/mds-test ldiskfs no
> Unable to build fs: /tmp/mds-test mkfs.ext2: bad inode ratio 512 (min
> 1024/max 8192)
>
> [root@lustrem lustre]#
>
> I have searched the Lustre manual for that error and googled it - zero
> hits. Help, please...
>
> Kevin
>
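To make the distinction concrete, here is a minimal sketch of the two mke2fs options Nathaniel is referring to; the device path and the example values are assumptions, not taken from the actual configuration in this thread:

# -i <bytes-per-inode>: create one inode per this many bytes of filesystem
#   space; this determines the total inode count, and is what
#   --mkfsoptions "-i ..." passes through to mkfs.ext2.
mkfs.ext2 -j -b 4096 -i 4096 /dev/sda
# -I <inode-size>: the size in bytes of each on-disk inode structure; this
#   is what lconf's --inode_size option maps to.
mkfs.ext2 -j -b 4096 -I 512 /dev/sda

As an inference only (not confirmed in the thread): the MDS failure "bad inode ratio 512" reads as though an inode-size value of 512 ended up being passed as the -i bytes-per-inode ratio, which mke2fs rejects because the ratio must be at least the block size.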