Ruyue Ma
2007-Sep-06 13:15 UTC
[Lustre-discuss] Failures While Running a Client and an OST on the Same Node
There is a paragraph on page 194 of the operations manual 1.6 v17:
While running a client and an OST on the same machine, the following
failures can occur:
--- If the client contains a dirty file system in memory and memory
pressure, a kernel thread flushes dirty pages to the file system, and it
writes to a local OST. To complete the write, the OST needs to do an
allocation. Then the blocking of allocation occurs while waiting for the
above kernel thread to complete the write process and free up some memory.
This is a deadlock condition.
---- If the node with both a client and OST crashes, then the OST waits
for the mounted client on that node to recover. However, since the client
is now in crashed state, the OST considers it to be a new client and blocks
it from mounting until the recovery completes.
As a result, running OST and client on same machine can cause a double
failure and prevent a complete recovery.
How frequently does this double failure occur? This is very important
to me.
--
Best Regards,
Ruyue Ma
Andreas Dilger
2007-Sep-07 21:44 UTC
[Lustre-discuss] Failures While Running a Client and an OST on the Same Node
On Sep 06, 2007 21:15 +0800, Ruyue Ma wrote:
> While running a client and an OST on the same machine, the following
> failures can occur:
> --- If the client contains a dirty file system in memory and memory
> pressure, a kernel thread flushes dirty pages to the file system, and it
> writes to a local OST. To complete the write, the OST needs to do an
> allocation. Then the blocking of allocation occurs while waiting for the
> above kernel thread to complete the write process and free up some memory.
> This is a deadlock condition.

This depends on load. We do a lot of simple testing with client-on-OST,
but it does very occasionally hang if the application is dirtying a lot
of data.

> ---- If the node with both a client and OST crashes, then the OST waits
> for the mounted client on that node to recover. However, since the client
> is now in crashed state, the OST considers it to be a new client and blocks
> it from mounting until the recovery completes.
>
> As a result, running OST and client on same machine can cause a double
> failure and prevent a complete recovery.

This will prevent recovery every time a client/OST node crashes. If you
don't care about recovery (e.g. the app was also running on the client, so
no recovery is possible in any case), then you can live with this.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.