Ruyue Ma
2007-Sep-06 13:15 UTC
[Lustre-discuss] Failures While Running a Client and an OST on the Same Node
There is a paragraph on page 194 of the operations manual 1.6 v17:

While running a client and an OST on the same machine, the following failures can occur:

--- If the client contains a dirty file system in memory and memory pressure, a kernel thread flushes dirty pages to the file system, and it writes to a local OST. To complete the write, the OST needs to do an allocation. Then the blocking of allocation occurs while waiting for the above kernel thread to complete the write process and free up some memory. This is a deadlock condition.

---- If the node with both a client and OST crashes, then the OST waits for the mounted client on that node to recover. However, since the client is now in crashed state, the OST considers it to be a new client and blocks it from mounting until the recovery completes.

As a result, running OST and client on same machine can cause a double failure and prevent a complete recovery.

I want to know how frequently this double failure occurs. This is very important for me.

--
Best Regards,
Ruyue Ma
Andreas Dilger
2007-Sep-07 21:44 UTC
[Lustre-discuss] Failures While Running a Client and an OST on the Same Node
On Sep 06, 2007 21:15 +0800, Ruyue Ma wrote:
> While running a client and an OST on the same machine, the following
> failures can occur:
> --- If the client contains a dirty file system in memory and memory
> pressure, a kernel thread flushes dirty pages to the file system, and it
> writes to a local OST. To complete the write, the OST needs to do an
> allocation. Then the blocking of allocation occurs while waiting for the
> above kernel thread to complete the write process and free up some memory.
> This is a deadlock condition.

This depends on load. We do a lot of simple testing with client-on-OST, but it does very occasionally hang if the application is dirtying a lot of data.

> ---- If the node with both a client and OST crashes, then the OST waits
> for the mounted client on that node to recover. However, since the client
> is now in crashed state, the OST considers it to be a new client and blocks
> it from mounting until the recovery completes.
>
> As a result, running OST and client on same machine can cause a double
> failure and prevent a complete recovery.

This will prevent recovery every time a client/OST node crashes. If you don't care about recovery (e.g. the app was running on the client also, so no recovery is possible in any case) then you can also live with this.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
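
To make the second failure mode concrete, here is a minimal toy model in Python. It is only a sketch of the behaviour described in this thread, not Lustre code: the class `ToyOST`, the helper `ticks_until_new_client_mounts`, the client names, and the fixed `recovery_timeout` are all hypothetical. It assumes the OST stays in recovery until every previously connected client has reconnected or a timeout expires, and refuses new clients in the meantime.

```python
from dataclasses import dataclass, field


@dataclass
class ToyOST:
    """Toy model of an OST's post-crash recovery window (not Lustre code)."""
    known_clients: set            # clients that were connected before the crash
    recovery_timeout: int = 5     # hypothetical window length, in abstract ticks
    recovered: set = field(default_factory=set)
    clock: int = 0

    def in_recovery(self) -> bool:
        # Recovery ends when all old clients are back or the window expires.
        all_back = self.recovered == self.known_clients
        timed_out = self.clock >= self.recovery_timeout
        return not (all_back or timed_out)

    def connect(self, client_id: str) -> bool:
        # A previously known client is allowed to reconnect and replay.
        if client_id in self.known_clients:
            self.recovered.add(client_id)
            return True
        # A new client is refused for as long as recovery is in progress.
        return not self.in_recovery()

    def tick(self) -> None:
        self.clock += 1


def ticks_until_new_client_mounts(ost: ToyOST, new_client: str) -> int:
    """Retry connecting once per tick; return how many ticks were spent waiting."""
    waited = 0
    while not ost.connect(new_client):
        ost.tick()
        waited += 1
    return waited


# Client and OST shared a node: the old local client died with the node,
# so it never reconnects and the remounted client waits out the whole window.
shared = ToyOST(known_clients={"remote", "local-old"})
shared.connect("remote")
print(ticks_until_new_client_mounts(shared, "local-new"))

# OST-only node: every old client reconnects, so recovery finishes at once.
dedicated = ToyOST(known_clients={"remote"})
dedicated.connect("remote")
print(ticks_until_new_client_mounts(dedicated, "another-client"))
```

In this model the shared-node case always runs into the full recovery window, which matches the observation above that recovery is prevented every time such a node crashes, while the dedicated-OST case admits the new client immediately.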