scjody@clusterfs.com
2007-Apr-17 10:45 UTC
[Lustre-devel] [Bug 12181] dirty pages not being flushed in a timely manner
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=12181

This issue is a duplicate of bug 12203, which is a private bug since it contains a lot of information sensitive to CFS's customers. Here is a summary of the issue:

Description: data loss for recently-modified files

In some cases, recently written or created files may not be written to disk in a timely manner (this should normally happen within 30s unless client IO load is very high). The problem appears, after a client crash or client eviction, as zero-length files or as files whose size is a multiple of 1MB and which are missing data at the end. This problem is more likely to be hit when files are repeatedly created and unlinked in the same directory, clients have a large amount of RAM and many CPUs, the filesystem has many OSTs, the clients are rebooted frequently, and the files are not accessed by other nodes after being written. Note that it is normal, even for local filesystems, that files written just before a client crash (less than 30s) may not yet have been flushed to disk.
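As a rough way to look for files matching the symptom described above, a sketch like the following could scan a directory for zero-length files or files whose size is an exact multiple of 1MB (the function name and the directory argument are illustrative, not from the bug report; matches are only candidates, since many valid files are naturally empty or exactly N MB):

```shell
#!/bin/sh
# Hedged sketch: list files whose size is 0 or an exact multiple of
# 1MB (1048576 bytes) -- candidate victims of the truncation symptom.
# check_truncated is a hypothetical helper, not part of Lustre.
check_truncated() {
    dir=$1
    find "$dir" -type f | while read -r f; do
        sz=$(wc -c < "$f")
        # 0 % 1048576 == 0, so zero-length files match this test too.
        if [ $((sz % 1048576)) -eq 0 ]; then
            echo "$f"
        fi
    done
}
```

Anything it prints would still need to be checked by hand (or against a backup), since the size test alone cannot prove data was lost.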
pbojanic@clusterfs.com
2007-Apr-23 13:20 UTC
[Lustre-devel] [Bug 12181] dirty pages not being flushed in a timely manner
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=12181

This work-around for the defect was submitted by Nick Cardo of NERSC. In summary, before shutdown, do a sync if Lustre reports dirty pages in the cache:

#!/bin/ksh
cd /proc/fs/lustre/osc
for i in `find . -name dirty_cache_pages -print`
do
    cnt=`cat $i`
    if [ $cnt -ne 0 ]
    then
        echo "node has $cnt dirty_cache_pages, sync them"
        sync
    fi
done