Hi,

I am doing research in parallel computing and our cluster is using Lustre.
I have found that if one OST is down, I cannot read any part of a file,
regardless of whether that part is located on the failed OST.

One extreme example: I have 3 OSTs (ost1, ost2 and ost3), and one file f
is striped across all 3 of them, beginning with ost1 (I used
# lfs setstripe /mnt/lustre 65536 0 3).  When I tried to read the very
first character of f with fgetc(f), I found that my Lustre client still
tries to read from all 3 OSTs.  And when I shut down ost3, the fgetc(f)
call never finishes (the program hangs).

So is this normal behavior, or have I misconfigured Lustre?  Thanks!
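P.S. For reference, my read test is essentially the small program below
(a minimal sketch only; the file name /mnt/lustre/f and the printf are
just placeholders for my actual test code):

    /* Open a file striped across ost1-ost3 and read its first byte. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/mnt/lustre/f", "r");  /* example path to the striped file */
        if (f == NULL) {
            perror("fopen");
            return 1;
        }
        int c = fgetc(f);   /* this call blocks once ost3 is shut down */
        printf("first byte: %d\n", c);
        fclose(f);
        return 0;
    }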
On Nov 07, 2006 07:49 -0800, Zhe Zhang wrote:
> I am doing research in parallel computing and our cluster is using
> Lustre.  I have found that if one OST is down, I cannot read any part
> of a file, regardless of whether that part is located on the failed
> OST.
>
> One extreme example: I have 3 OSTs (ost1, ost2 and ost3), and one file
> f is striped across all 3 of them, beginning with ost1 (I used
> # lfs setstripe /mnt/lustre 65536 0 3).  When I tried to read the very
> first character of f with fgetc(f), I found that my Lustre client
> still tries to read from all 3 OSTs.  And when I shut down ost3, the
> fgetc(f) call never finishes (the program hangs).
>
> So is this normal behavior, or have I misconfigured Lustre?  Thanks!

If you want to allow partial file access then you need to enable "failout"
mode for the OSTs.  Otherwise Lustre will block access to the file until
the OST is restored, to avoid application errors during failover.  This
can be done permanently by adding "--failout" to the OST definition, or at
runtime, if you know an OST will be down for a long time and will not be
recovered, by running:

    lctl --device {OST device number} deactivate

on the MDS and all of the clients (note that the specific OST device
number will be different between the MDS and the clients).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
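P.S. Purely as an illustration (the device number below is made up; check
the actual device list on each node, e.g. with "lctl dl", since the number
differs between the MDS and the clients):

    # list the configured devices on this node and find the one for ost3
    lctl dl
    # if, say, the failed OST shows up as device 11 on this node:
    lctl --device 11 deactivate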
> If you want to allow partial file access then you need to enable "failout"
> mode for the OSTs.  Otherwise Lustre will block access to the file until
> the OST is restored, to avoid application errors during failover.  This
> can be done permanently by adding "--failout" to the OST definition, or at
> runtime, if you know an OST will be down for a long time and will not be
> recovered, by running:
>
>     lctl --device {OST device number} deactivate
>
> on the MDS and all of the clients (note that the specific OST device
> number will be different between the MDS and the clients).

Hi Andreas,

Thanks very much for your reply.  In the case of an OST failure, if we use
the method you indicated (deactivating the failed OST), is there a way to
recover the data on another node?  I once learned that there is a way to
set up mirrored OSTs, where each file is stored twice, and when one OST
fails the client goes to the other copy.  I am quite interested in this
functionality.  Could you tell me how to adjust the configuration to
achieve this?  Thanks!