I've just finished recovering from a weird problem on our test system, and wondered if anybody else has seen this kind of thing.

Lustre 1.6b5
Kernel: vanilla (more or less) 2.6.15
Opteron SMP clients and servers

The clients boot diskless, using Lustre as their rootfs. Over the last few weeks the servers have been bounced numerous times while I played around with hardware, evaluating interface cards and the like. Some of the time I didn't restart the clients when rebooting the servers, either because I forgot or because something crashed unexpectedly. Up until today, everything seemed to be behaving itself pretty well. The clients didn't work, of course, while their rootfs was down, but when the servers came back up the clients seemed to reconnect and carry on, so I didn't worry overly much about it.

Today I tripped over evidence that the rootfs was corrupted, and that the corruption wasn't limited to in-memory structures; it was on the disks. I rebuilt the fs (i.e. reformatted all the OSTs and so on; rough commands at the bottom of this mail), installed a new rootfs, and then discovered that it was *still* corrupted. Looking at the server logs, there were a number of errors from one OSS where one of the old clients had been trying to reconnect to it and replay transactions.

So: is it an incredibly bad idea to allow an old, stale client to try to reconnect to a freshly-reconstituted server? I had the impression that Lustre had sufficient protocol in place to keep that kind of skewage from causing problems, but if that's not the case, it would certainly account for the lossage. If it is supposed to be safe, I guess this means I've probably found a bug, and should try to characterize it further.
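(For reference, the rebuild amounted to roughly the following on the servers; fsname, NID, devices and mount points below are placeholders rather than the exact ones on our system.)

    # on the MGS/MDS node: wipe and recreate the MDT (with co-located MGS)
    mkfs.lustre --reformat --fsname=testfs --mgs --mdt /dev/sda
    mount -t lustre /dev/sda /mnt/mdt

    # on each OSS, for each OST device: reformat and remount
    mkfs.lustre --reformat --fsname=testfs --ost --mgsnode=10.0.0.1@tcp0 /dev/sdb
    mount -t lustre /dev/sdb /mnt/ost0

The stale clients were never rebooted or remounted between the reformat and the corruption showing up again.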