Aleksandar Milivojevic
2005-Mar-18 17:05 UTC
[Centos] possible data corruption when NFS is used
A couple of days ago, I ran into a nasty problem when mmap()ing a file located on an NFS-mounted partition (exported from a Solaris 9 server). The problem manifests itself as data corruption. I've notified the folks at Red Hat through their Bugzilla (after all, the kernel is built from their source), and I thought I'd share it with folks here.

The problem can be reproduced like this:

1. Create an empty file using the open64() system call.
2. Write a single byte at some position in the file using the pwrite64() system call. This allocates a single block at the end of the file and leaves the rest of the file empty. In my test I wrote a single byte at a 100KB offset.
3. Use mmap() to map the entire file into memory.
4. Use the memset() library function to fill the entire mmap()ed region with some pattern.
5. Call munmap() on the mapping, and close() the file.

On the Linux NFS client, "less filename" shows the file correctly filled with the pattern. On the Solaris 9 NFS server, "less filename" shows that the file is empty.

NFS allows a gap of up to 30 seconds before changes are flushed from the client to the server (in reality, most NFS clients don't wait and attempt to flush changes to the server shortly after they are made). In this case, however, the flush never happens: the changes are never sent to the NFS server and stay cached on the client side indefinitely. When the client is rebooted, the changes are lost.

Running "du -sk filename" on both the client and the server produces the same result: the output indicates that the file is sparse. This shows an inconsistency on the client. less shows that the file is filled with the pattern, so it can't be sparse, yet du -sk reports a size indicating that it is.

The longest I waited for the client to send the updated file blocks to the NFS server was about half an hour. So it is possible that the changes would eventually get flushed after several hours (or days), when the kernel tries to free the pages holding the cached copy (I haven't tested that scenario).
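For anyone who wants to try this, here is a rough C sketch of the sequence described above. It is a hypothetical reconstruction, not the demo app mentioned in this post (that app isn't included here). The function names mmap_fill(), pwrite_fill(), verify_pattern() and allocated_bytes() are made up for illustration. It assumes glibc and uses plain open()/pwrite() with _FILE_OFFSET_BITS set to 64, which glibc maps onto the 64-bit variants (open64()/pwrite64()). pwrite_fill() is the write()-style path that, as reported below, reaches the server almost instantly. allocated_bytes() reads st_blocks (512-byte units on Linux), which is roughly what du -sk reports.

```c
/* Hypothetical reproduction sketch; all function names are illustrative. */
#define _FILE_OFFSET_BITS 64   /* glibc: open()/pwrite() use 64-bit offsets */

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Reported sequence: create the file, write one byte at `offset`, mmap the
 * whole file, memset it with `pattern`, munmap, close. Returns 0 on success. */
static int mmap_fill(const char *path, off_t offset, int pattern)
{
    assert(offset >= 0);
    size_t len = (size_t)offset + 1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return -1; }

    /* One byte at the end: file length becomes offset+1, earlier blocks are holes. */
    char last = (char)pattern;
    if (pwrite(fd, &last, 1, offset) != 1) { perror("pwrite"); close(fd); return -1; }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); close(fd); return -1; }

    memset(map, pattern, len);   /* dirty every page through the mapping */

    /* No explicit flush call: the reported sequence is just munmap() + close(). */
    if (munmap(map, len) != 0) { perror("munmap"); close(fd); return -1; }
    return close(fd);
}

/* Same end result using pwrite() only (the path reported to work). */
static int pwrite_fill(const char *path, off_t offset, int pattern)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return -1; }
    char buf[8192];
    memset(buf, pattern, sizeof buf);
    off_t total = offset + 1;
    for (off_t pos = 0; pos < total; ) {
        size_t chunk = (total - pos) < (off_t)sizeof buf ? (size_t)(total - pos) : sizeof buf;
        ssize_t n = pwrite(fd, buf, chunk, pos);
        if (n <= 0) { perror("pwrite"); close(fd); return -1; }
        pos += n;
    }
    return close(fd);
}

/* Returns 1 if every byte equals `pattern`, 0 if not, -1 on error. */
static int verify_pattern(const char *path, int pattern)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    unsigned char buf[8192];
    ssize_t n;
    int ok = 1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != (unsigned char)pattern)
                ok = 0;
    if (n < 0) perror("read");
    close(fd);
    return n < 0 ? -1 : ok;
}

/* Bytes actually allocated (st_blocks is in 512-byte units on Linux). */
static long long allocated_bytes(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) { perror("stat"); return -1; }
    return (long long)st.st_blocks * 512;
}
```

On the NFS client one might call mmap_fill() on a file under the mount point with offset 100 * 1024, then run verify_pattern() and allocated_bytes() against that file on both client and server. Per the report, the client would see the pattern while the server would still see a sparse, empty file. Running pwrite_fill() on a second file gives the write()-path comparison. On a local filesystem both fill functions should leave a file that verify_pattern() accepts.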
If the file is updated using the write() or pwrite64() system calls (instead of the mmap()/memset()/munmap() combination), the file is updated on the NFS server almost instantly.

I am able to reproduce this "every time" with CentOS 4 as the NFS client and Solaris 9 as the NFS server. I haven't tried other combinations. RHEL4 as an NFS client should have the same problem, and possibly other Linux distributions as well (Fedora comes to mind as the most likely candidate, because of its close connection to RHEL4).

I also have a small app that demonstrates the problem (it should be labeled as "one of the most stupid uses of mmap"; it is basically an implementation of the Solaris mkfile command).

If anybody has experienced hard-to-explain data corruption on NFS-mounted file systems, this might be the reason behind it.

--
Aleksandar Milivojevic <amilivojevic@pbl.ca>
Pollard Banknote Limited
Systems Administrator
1499 Buffalo Place
Tel: (204) 474-2323 ext 276
Winnipeg, MB R3T 1L7