Mark Hampton
2012-Feb-01 13:25 UTC
[Ocfs2-users] A Billion Files on OCFS2 -- Best Practices?
We have an application with many processing threads writing more than a billion files ranging from 2KB to 50KB, with 50% under 8KB (currently there are 700 million files). The files are never deleted or modified: they are written once, and read infrequently. The files are hashed so that they are evenly distributed across ~1,000,000 subdirectories up to 3 levels deep, with up to 1000 files per directory. The directories are structured like this:

0/00/00
0/00/01
...
F/FF/FE
F/FF/FF

The files need to be readable and writable across a number of servers. The NetApp filer we purchased for this project has both NFS and iSCSI capabilities.

We first tried doing this via NFS. After writing 700 million files (12 TB) into a single NetApp volume, file-write performance became abysmally slow. We can't create more than 200 files per second on the NetApp volume, which is about 20% of our required performance target of 1000 files per second. It appears that most of the file-write time is going towards stat and inode-create operations.

So now I'm trying the same thing with OCFS2 over iSCSI. I created 16 LUNs on the NetApp, which became 16 OCFS2 filesystems with 16 different mount points on our servers.

With this configuration I was initially able to write ~1800 files per second. Now that I have completed 100 million files, performance has dropped to ~1500 files per second.

I'm using OEL 6.1 (2.6.32-100 kernel) with OCFS2 version 1.6. The application servers have 128GB of memory. I created my OCFS2 filesystems as follows:

mkfs.ocfs2 -T mail -b 4k -C 4k -L <my label> --fs-features=indexed-dirs --fs-feature-level=max-features /dev/mapper/<my device>

And I mount them with these options:

_netdev,commit=30,noatime,localflocks,localalloc=32

So my questions are these:

1) Given a billion files sized 2KB to 50KB, with 50% under 8KB, do I have the optimal OCFS2 filesystem and mount-point configurations?

2) Should I split the files across even more filesystems? Currently I have them split across 16 OCFS2 filesystems.

Thanks a billion!
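For reference, one way such a hashed layout can be computed (a sketch only: the post does not say which hash the application uses, so the MD5, the mount point, and the file name below are assumptions for illustration):

    # Sketch: map a file name to a 3-level path like F/FF/FE.
    # MD5 and the /mnt/ocfs2-00 mount point are assumed, not from the post.
    name="document-000123.bin"
    h=$(printf '%s' "$name" | md5sum | cut -c1-5 | tr 'a-f' 'A-F')
    # 1 hex digit / 2 digits / 2 digits: 16 * 256 * 256 = 1,048,576 leaf dirs
    dir="/mnt/ocfs2-00/${h:0:1}/${h:1:2}/${h:3:2}"
    mkdir -p "$dir" && cp "$name" "$dir/"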
One more thing. When I straced one of the application processes (these are the processes that create the files) I saw this:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 68.94    3.002017         111     27154           open
 18.93    0.929679           2    418108           read
 12.40    0.543714           2    257548           write

So it seems that inode creation is the biggest time consumer by far.
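A summary table like that comes from strace -c (add -f to follow the writer threads). For a crude create-rate check outside the application, something like the following works; the bench directory is hypothetical:

    # Rough files-per-second check; the mount point is an assumption.
    mkdir -p /mnt/ocfs2-00/bench
    time for i in $(seq -w 0 9999); do : > /mnt/ocfs2-00/bench/f$i; done
    # 10,000 creates; divide by the "real" time to get files per second.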
Sunil Mushran
2012-Feb-01 17:53 UTC
[Ocfs2-users] A Billion Files on OCFS2 -- Best Practices?
On 02/01/2012 07:02 AM, Mark wrote:
> One more thing. When I straced one of the application processes (these are
> the processes that create the files) I saw this:
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  68.94    3.002017         111     27154           open
>  18.93    0.929679           2    418108           read
>  12.40    0.543714           2    257548           write
>
> So it seems that inode creation is the biggest time consumer by far.

Yes. open() triggers cluster lock creation, which cannot be skipped. Reads and writes could skip cluster activity if the node already has the appropriate lock level.
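A rough way to see that caching on a single node (the path below is hypothetical, and the second timing also benefits from the page cache, so this is only an illustration):

    f=/mnt/ocfs2-00/0/00/00/somefile   # hypothetical path
    time cat "$f" > /dev/null   # cold: cluster read lock must be acquired
    time cat "$f" > /dev/null   # warm: lock level (and pages) already held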
Sunil Mushran
2012-Feb-01 18:04 UTC
[Ocfs2-users] A Billion Files on OCFS2 -- Best Practices?
debugfs.ocfs2 -R "stats" /dev/mapper/...

I want to see the features enabled. The main issue with large metadata is the fsck timing. The recently tagged 1.8 release of the tools has much better fsck performance.

On 02/01/2012 05:25 AM, Mark Hampton wrote:
> [original message quoted in full above; snipped]
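To put the fsck point in concrete terms, a full forced check of one of these volumes would look roughly like this (run it only with the filesystem unmounted on every node; the device placeholder follows the mkfs line above):

    # At a billion files, each of the 16 filesystems holds roughly
    # 60 million inodes; a forced full check has to walk all of them.
    fsck.ocfs2 -f /dev/mapper/<my device>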