scjody@clusterfs.com
2007-Apr-17 10:45 UTC
[Lustre-devel] [Bug 12181] dirty pages not being flushed in a timely manner
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=12181
This issue is a duplicate of bug 12203, which is a private bug because it
contains a great deal of information sensitive to CFS's customers. Here is
a summary of the issue:
Description: data loss for recently-modified files
In some cases, recently written or created files may not be written to
disk in a timely manner (this should normally happen within 30s unless
client I/O load is very high). After a client crash or client eviction,
the problem appears as zero-length files, or as files whose size is a
multiple of 1MB because data is missing at the end of the file.
This problem is more likely to be hit when files are repeatedly created
and unlinked in the same directory, clients have a large amount of RAM
and many CPUs, the filesystem has many OSTs, the clients are rebooted
frequently, and the files are not accessed by other nodes after being
written.
It is normal that files written just before a client crash (less than
30s earlier) may not yet have been flushed to disk; this is true even
for local filesystems.
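Applications that cannot tolerate losing that last 30s of data can force
recently written files to stable storage themselves rather than waiting
for writeback. A minimal sketch (the file paths are illustrative, not
from the bug report; `conv=fsync` requires GNU dd):

```shell
#!/bin/sh
# Force freshly written data to disk rather than waiting for writeback.
# File paths below are illustrative only.
printf 'important data\n' > /tmp/recent.dat
sync                               # flush all dirty pages on this node
# Or flush just one file as it is written, via GNU dd's conv=fsync:
printf 'important data\n' | dd of=/tmp/recent2.dat conv=fsync status=none
```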
pbojanic@clusterfs.com
2007-Apr-23 13:20 UTC
This workaround for the defect was submitted by Nick Cardo of NERSC. In
summary, before shutdown, you do a sync if Lustre reports dirty pages in
the cache:
#!/bin/ksh
# Before shutdown: sync if any OSC reports dirty pages in its cache.
cd /proc/fs/lustre/osc
for i in `find . -name dirty_cache_pages -print`
do
    cnt=`cat $i`
    if [ $cnt -ne 0 ]
    then
        echo "node has $cnt dirty_cache_pages, sync them"
        sync
    fi
done
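After running a workaround like the one above, the counters can be
re-checked to confirm they have drained. A minimal sketch, assuming the
same /proc/fs/lustre/osc/*/dirty_cache_pages layout used in the script
(the exact file name may differ between Lustre versions):

```shell
#!/bin/sh
# Report any OSC that still shows dirty pages after a sync.
# Assumes the /proc/fs/lustre/osc layout from the workaround above.
sync
remaining=0
for f in /proc/fs/lustre/osc/*/dirty_cache_pages; do
    [ -r "$f" ] || continue           # skip if the file is absent
    cnt=$(cat "$f")
    if [ "$cnt" -ne 0 ]; then
        echo "$f still has $cnt dirty pages"
        remaining=1
    fi
done
exit $remaining
```

The nonzero exit status lets a shutdown script retry the sync or delay
the reboot until the counters reach zero.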