Niklas Edmundsson
2007-Sep-27 08:53 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
While doing some more file creation tests we hit another interesting bug.

We have 13701222 files in one directory, and creating more files there fails even though we have lots of free inodes in the filesystem.

On the MDS we get this in the kernel log:

[692361.061558] LDISKFS-fs warning (device sdb2): ldiskfs_dx_add_entry: Directory index full!

On the client:

# lfs df -i
UUID                  Inodes     IUsed     IFree IUse% Mounted on
hpfs-MDT0000_UUID   35651584  13967459  21684125   39% /hpfs[MDT:0]
hpfs-OST0000_UUID   76292096   3486442  72805654    4% /hpfs[OST:0]
hpfs-OST0001_UUID   76292096   3484250  72807846    4% /hpfs[OST:1]
hpfs-OST0002_UUID   76292096   3486534  72805562    4% /hpfs[OST:2]
hpfs-OST0003_UUID   76292096   3486889  72805207    4% /hpfs[OST:3]

filesystem summary: 35651584  13967459  21684125   39% /hpfs

/Nikke
--
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | nikke at hpc2n.umu.se
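The inode figures in this report can be checked mechanically. The following is a minimal sketch, assuming the column layout shown in the message (UUID, Inodes, IUsed, IFree, IUse%, mount point). The function name `parse_lfs_df_inodes` is made up for illustration and is not part of any Lustre tool; the sample text is copied from the report.

```python
# Sample rows copied from the "lfs df -i" output in the report.
SAMPLE = """\
UUID Inodes IUsed IFree IUse% Mounted on
hpfs-MDT0000_UUID 35651584 13967459 21684125 39% /hpfs[MDT:0]
hpfs-OST0000_UUID 76292096 3486442 72805654 4% /hpfs[OST:0]
hpfs-OST0001_UUID 76292096 3484250 72807846 4% /hpfs[OST:1]
hpfs-OST0002_UUID 76292096 3486534 72805562 4% /hpfs[OST:2]
hpfs-OST0003_UUID 76292096 3486889 72805207 4% /hpfs[OST:3]
"""


def parse_lfs_df_inodes(text):
    """Return {target_uuid: (inodes, iused, ifree)} for the MDT/OST rows."""
    targets = {}
    for line in text.splitlines():
        fields = line.split()
        # Target rows look like: <uuid> <inodes> <iused> <ifree> <use%> <mount>
        if len(fields) == 6 and fields[0].endswith("_UUID"):
            inodes, iused, ifree = (int(f) for f in fields[1:4])
            targets[fields[0]] = (inodes, iused, ifree)
    return targets


targets = parse_lfs_df_inodes(SAMPLE)
inodes, iused, ifree = targets["hpfs-MDT0000_UUID"]
print(f"MDT inodes free: {ifree} of {inodes}")
```

The MDT still has more than 21 million free inodes, which supports the point of the report: the create failure is a per-directory limit, not inode exhaustion.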
Andreas Dilger
2007-Sep-27 09:05 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Sep 27, 2007 10:53 +0200, Niklas Edmundsson wrote:
> We have 13701222 files in one directory, and creating more files there
> fails even though we have lots of free inodes in the filesystem.

We generally only test up to 10M files in a single directory. If you had a perfect hash distribution you might be able to get to 25M files in the directory. Running "e2fsck -fD" might help (its job is to pack the directories), but I suspect it has never been run on such a large directory and might conceivably use too much RAM to run on the system.

> On MDS we get this in the kernel log:
> [692361.061558] LDISKFS-fs warning (device sdb2): ldiskfs_dx_add_entry: Directory index full!

This is one of the reasons we are moving to ZFS for the back-end storage.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
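To see why a ceiling in the tens of millions is plausible, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration rather than something stated in this thread: a 4 KiB block size, an index limited to two levels, 8-byte index slots, 32 bytes of overhead in the root index block and 8 bytes in each second-level block, and directory entries of 8 bytes plus the name rounded up to a multiple of 4. The `fill` parameter models how full the leaf blocks are; an uneven hash distribution leaves many of them partly empty.

```python
def dir_entry_size(name_len):
    """Assumed on-disk size of one entry: 8-byte header + name, rounded up to 4."""
    return (8 + name_len + 3) & ~3


def htree_capacity(block_size=4096, name_len=16, fill=1.0):
    """Estimate how many names fit in a two-level hashed directory index.

    All layout constants here are assumptions for illustration.
    """
    root_slots = (block_size - 32) // 8   # index slots in the root block
    node_slots = (block_size - 8) // 8    # index slots per second-level block
    leaf_blocks = root_slots * node_slots
    entries_per_leaf = int((block_size // dir_entry_size(name_len)) * fill)
    return leaf_blocks * entries_per_leaf


for fill in (1.0, 0.75, 0.5):
    print(f"fill={fill:.2f}: about {htree_capacity(fill=fill):,} entries")
```

Under these assumptions the limit depends strongly on file name length and on how evenly the hash spreads names over leaf blocks, which is consistent with hitting the limit well below the perfect-distribution figure.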
Robin Humble
2007-Sep-27 12:48 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Thu, Sep 27, 2007 at 10:53:26AM +0200, Niklas Edmundsson wrote:
> We have 13701222 files in one directory, and creating more files there
> fails even though we have lots of free inodes in the filesystem.

performance on ext3 is dismal with >1M files in a dir:
http://oss.oracle.com/projects/btrfs/dist/documentation/benchmark.html
so I suspect Lustre's ext3+/ext4-based ldiskfs is similarly not suited to this workload.

so err, just don't do it? :-)

cheers, robin
Niklas Edmundsson
2007-Sep-27 13:49 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Thu, 27 Sep 2007, Robin Humble wrote:
> On Thu, Sep 27, 2007 at 10:53:26AM +0200, Niklas Edmundsson wrote:
>> We have 13701222 files in one directory, and creating more files there
>> fails even though we have lots of free inodes in the filesystem.
>
> performance on ext3 is dismal with >1m files in a dir
> http://oss.oracle.com/projects/btrfs/dist/documentation/benchmark.html
> so I suspect lustre's ext3+/ext4 ldiskfs's are similarly not suited to
> this workload either.

Is this with dir_index enabled? Without dir_index you'll certainly lose ;)

> so err, just don't do it? :-)

Don't worry, I won't. However, some bright user might, and thus we need to know whether Lustre will behave nicely or emit some wonderfully cryptic error message and go belly up.

With 13M files in one directory it's surprisingly snappy though: 1m17s to stat all those files isn't too bad considering we're only using regular GigE.

/Nikke
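For scale, the timing quoted in this message works out to roughly 178,000 stat operations per second. A small sketch of the arithmetic, using only the two figures from the thread (the variable names are illustrative):

```python
files = 13_701_222          # files in the directory, from the original report
elapsed_s = 1 * 60 + 17     # "1m17s" expressed in seconds

rate = files / elapsed_s
print(f"about {rate:,.0f} stat operations per second")
```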
Niklas Edmundsson
2007-Sep-27 14:10 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Thu, 27 Sep 2007, Andreas Dilger wrote:
> On Sep 27, 2007 10:53 +0200, Niklas Edmundsson wrote:
>> We have 13701222 files in one directory, and creating more files there
>> fails even though we have lots of free inodes in the filesystem.
>
> We generally only test up to 10M files in a single directory. If you had
> a perfect hash distribution you might be able to get to 25M files in the
> directory.

I suspected something like that. The error message needs some help though. I'd prefer having the directory in question in there if possible, or at least some hint on where to look.

>> On MDS we get this in the kernel log:
>> [692361.061558] LDISKFS-fs warning (device sdb2): ldiskfs_dx_add_entry: Directory index full!
>
> This is one of the reasons we are moving to ZFS for the back-end storage.

Any idea on the timeframe?

/Nikke
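Since the kernel message does not name the directory, one workaround is to look for oversized directories from a client. The following is a minimal sketch using only the Python standard library; the function name `find_large_dirs` and the threshold are illustrative assumptions, and nothing here is Lustre-specific.

```python
import os


def find_large_dirs(root, threshold):
    """Yield (directory, entry_count) for directories with >= threshold entries."""
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count >= threshold:
            yield dirpath, count


# Example use (path and threshold are placeholders):
#   for path, count in find_large_dirs("/hpfs", 1_000_000):
#       print(f"{count:>12} {path}")
```

Note that `os.walk` holds the names of one directory in memory at a time, so scanning a directory with millions of entries needs a correspondingly large amount of RAM and generates a full readdir load on the MDS.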
Alastair McKinstry
2007-Sep-27 14:15 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On 27 Sep 2007, at 15:10, Niklas Edmundsson wrote:
> On Thu, 27 Sep 2007, Andreas Dilger wrote:
>
>>> On MDS we get this in the kernel log:
>>> [692361.061558] LDISKFS-fs warning (device sdb2):
>>> ldiskfs_dx_add_entry: Directory index full!
>>
>> This is one of the reasons we are moving to ZFS for the back-end
>> storage.

Is ZFS open source? What implications does this have for the open-source nature of Lustre?

Regards
Alastair McKinstry
--
Alastair McKinstry, <alastair.mckinstry at ichec.ie>
Andreas Dilger
2007-Sep-27 20:09 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Sep 27, 2007 16:10 +0200, Niklas Edmundsson wrote:
> On Thu, 27 Sep 2007, Andreas Dilger wrote:
> > On Sep 27, 2007 10:53 +0200, Niklas Edmundsson wrote:
> > > We have 13701222 files in one directory, and creating more files there
> > > fails even though we have lots of free inodes in the filesystem.
> >
> > We generally only test up to 10M files in a single directory. If you had
> > a perfect hash distribution you might be able to get to 25M files in the
> > directory.
>
> I suspected something like that. The error message needs some help
> though, I'd prefer having the directory in question in there if
> possible, or at least some hint on where to look.
>
> > > On MDS we get this in the kernel log:
> > > [692361.061558] LDISKFS-fs warning (device sdb2): ldiskfs_dx_add_entry:
> > > Directory index full!
> >
> > This is one of the reasons we are moving to ZFS for the back-end storage.
>
> Any idea on the timeframe?

The 1.8 release (planned Q1 2008) will allow new filesystems to be created with ZFS OSTs and MDTs. We will require ZFS for clustered MDTs in 2.0 (and possibly all backing stores, though it should of course be possible to still use 1.8 OSTs if required).

Cheers, Andreas
Andreas Dilger
2007-Sep-27 20:10 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Sep 27, 2007 15:15 +0100, Alastair McKinstry wrote:
> On 27 Sep 2007, at 15:10, Niklas Edmundsson wrote:
> > On Thu, 27 Sep 2007, Andreas Dilger wrote:
> >
> > > > On MDS we get this in the kernel log:
> > > > [692361.061558] LDISKFS-fs warning (device sdb2):
> > > > ldiskfs_dx_add_entry: Directory index full!
> > >
> > > This is one of the reasons we are moving to ZFS for the back-end
> > > storage.
>
> Is ZFS open-source? What implications does this have for the
> open-source nature of Lustre?

Yes, ZFS is open source (CDDL is an OSS-approved license), just not GPL. There are no implications; Lustre will still be open source.

Cheers, Andreas
Kevin Fox
2007-Sep-27 20:31 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
How do you plan on mixing ZFS and Lustre code though? I thought CDDL code and GPL code were incompatible (that's why it's not crammed into the kernel proper). Are you looking at having separate processes between the ZFS code and Lustre, or are you going to re-license Lustre? Either one seems bad...

Kevin

On Thu, 2007-09-27 at 14:10 -0600, Andreas Dilger wrote:
> On Sep 27, 2007 15:15 +0100, Alastair McKinstry wrote:
> > Is ZFS open-source? What implications does this have for the
> > open-source nature of Lustre?
>
> Yes, ZFS is open source (CDDL is an OSS-approved license), just not GPL.
> There are no implications, Lustre will still be open source.
Kevin Fox
2007-Sep-28 16:47 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
Is there a reason clustered metadata must be ZFS only?

Kevin

On Thu, 2007-09-27 at 14:09 -0600, Andreas Dilger wrote:
> The 1.8 release (planned Q1 2008) will allow new filesystems to be created
> with ZFS OSTs and MDTs. We will require ZFS for clustered MDTs in 2.0
> (and possibly all backing stores, though it should of course be possible
> to still use 1.8 OSTs if required).
Andreas Dilger
2007-Sep-28 17:29 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Sep 28, 2007 09:47 -0700, Kevin Fox wrote:
> Is there a reason clustered metadata must be ZFS only?

Because it requires changes to the on-disk directory format of ext3 in a way that isn't likely to ever make it into upstream ext4, and we don't want to support that indefinitely (it adds a pointer to a remote MDS for an inode and stores 128-bit file identifiers). Also, we need a mechanism to do name->value lookups in the filesystem, which also doesn't exist in ext3. Both of these are possible with ZFS.

Cheers, Andreas
Erich Focht
2007-Oct-01 12:33 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Friday 28 September 2007 19:29, Andreas Dilger wrote:
> On Sep 28, 2007 09:47 -0700, Kevin Fox wrote:
> > Is there a reason clustered metadata must be ZFS only?
>
> Because it requires changes to the on-disk directory format of ext3 in
> a way that isn't likely to ever make it into upstream ext4 and we don't
> want to support that indefinitely (adds a pointer to a remote MDS for an
> inode and stores 128-bit file identifiers). Also, we need mechanism to
> do name->value lookups in the filesystem (also doesn't exist in ext3).
> Both of these are possible with ZFS.

Does this mean Lustre servers will some day run only on Solaris? I can't imagine you are aiming at ZFS-FUSE on Linux...

Regards,
Erich
Andreas Dilger
2007-Oct-02 05:14 UTC
[Lustre-discuss] Lustre 1.6.2 + 2.6.18 - Directory index full!
On Oct 01, 2007 14:33 +0200, Erich Focht wrote:
> On Friday 28 September 2007 19:29, Andreas Dilger wrote:
> > Because it requires changes to the on-disk directory format of ext3 in
> > a way that isn't likely to ever make it into upstream ext4 and we don't
> > want to support that indefinitely (adds a pointer to a remote MDS for an
> > inode and stores 128-bit file identifiers). Also, we need mechanism to
> > do name->value lookups in the filesystem (also doesn't exist in ext3).
> > Both of these are possible with ZFS.
>
> Does this mean Lustre servers will some day run on Solaris, only? Can't
> imagine you aim at ZFS-FUSE on Linux...

No, not at all. The Lustre servers (both OST and MDT) are moving to userspace and will interact with the DMU component of ZFS directly. The DMU, in turn, will do O_DIRECT (zero copy) IO to the underlying disk devices.

Since the DMU implements 95% of the functionality of ZFS and has been designed to run in userspace as well as in the Solaris kernel, we get the benefits of ZFS without any kind of kernel module (license) headaches.

Cheers, Andreas