Thomas Roth
2008-Mar-03 17:12 UTC
[Lustre-discuss] Quota setup fails because of OST ordering
Hi all,

after installing a Lustre test file system consisting of 34 OSTs, I encountered a strange error when trying to set up quotas: lfs quotacheck gave me an "Input/Output error", while in /var/log/kern.log I found a Lustre error:

LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov idx 32 inactive

Indeed, in /proc/fs/lustre/lov/.../target_obd all 34 OSTs were listed and numbered, but number 32 was missing; instead I had a number 34.

Now, in this test cluster each OSS serves two OSTs. I remember I had made a mistake with the RAID configuration of about half of the OSTs. Therefore I had mounted one OST from each OSS while the others were still formatting. Consequently, after mounting all OSTs, their OST names or OST indices were not in OSS machine order, e.g. machine 5 now provides OST0008 and OST001a.
I'd like to hear from the experts that this cannot possibly have any influence on the functionality of a Lustre cluster.

That still leaves the question of how I managed to produce this gap in the OST numbering and why the quota setup should stumble over it.

In the meantime I reformatted all servers, being very careful with the OSTs and giving the MDT some time to stomach each new OST I mounted. Now I have the OSTs in "correct" order, no gap in the indices, and lfs quotacheck doesn't make trouble ;-|

Regards,
Thomas
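A quick way to spot the kind of index gap described above is to compare the indices listed in target_obd against the contiguous range from 0 to the highest index seen. The sketch below assumes each line looks like `0: testfs-OST0000_UUID ACTIVE` (index, OST UUID, status); that layout, the `testfs` file system name, and the helper names are illustrative assumptions, so check the real file on your Lustre version before relying on it. The sample data reproduces the situation from the post: indices 0 to 31, then 33 and 34, with 32 missing.

```python
import re

# ASSUMED line format: "<index>: <OST UUID> <STATUS>", e.g.
# "0: testfs-OST0000_UUID ACTIVE". The real layout may differ by version.
LINE_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(\S+)")


def parse_target_obd(text):
    """Return {index: (uuid, status)} for every line that matches."""
    targets = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            targets[int(m.group(1))] = (m.group(2), m.group(3))
    return targets


def missing_indices(targets):
    """Indices between 0 and the highest listed index that have no entry."""
    if not targets:
        return []
    return [i for i in range(max(targets) + 1) if i not in targets]


# Hypothetical sample: 34 OSTs, index 32 absent and index 34 present.
# OST names carry the index in lowercase hex (26 -> OST001a).
sample = "\n".join(
    f"{i}: testfs-OST{i:04x}_UUID ACTIVE" for i in list(range(32)) + [33, 34]
)
print(missing_indices(parse_target_obd(sample)))  # [32]
```

Reporting gaps rather than just counting entries matters here: the count was the expected 34 in both the broken and the repaired setup, so only the indices themselves reveal the problem.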
Andreas Dilger
2008-Mar-03 18:56 UTC
[Lustre-discuss] Quota setup fails because of OST ordering
On Mar 03, 2008 18:12 +0100, Thomas Roth wrote:
> after installing a Lustre test file system consisting of 34 OSTs, I
> encountered a strange error when trying to set up quotas:
> lfs quotacheck gave me an "Input/Output error", while in
> /var/log/kern.log I found a Lustre error
>
> LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov idx 32 inactive
>
> Indeed, in /proc/fs/lustre/lov/.../target_obd all 34 OSTs were listed
> and numbered, but number 32 was missing, instead I had a number 34.
> Now, in this test cluster each OSS serves two OSTs. I remember I had
> made a mistake with the RAID configuration of about half of the OSTs.
> Therefore I had mounted one OST from each OSS, while the others were
> still formatting. Consequently, after mounting all OSTs, their OST-Names
> or OST-indices were not in OSS machine order, like machine 5 now
> providing OST0008 and OST001a.
> I'd like to hear from the experts that this cannot possibly have any
> influence on the functionality of a Lustre cluster.
>
> That still leaves the question how I managed to produce this gap in the
> OST numbering and why the quota setup should stumble over this.
>
> In the meantime I reformatted all servers, being very careful with the
> OSTs, giving the MDT some time to stomach any new OST I mounted. Now I
> have the OSTs in "correct" order, no gap in the indices, and lfs
> quotacheck doesn't make trouble ;-|

Can you please file a bug on this? I'm not sure there is an easy solution, however. If the OST is inactive because it is offline, then we don't want to update the quota summary and miss user space usage, but in your case it is the right thing to do.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
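The trade-off in the reply can be made concrete: a quota check cannot blindly skip an inactive index, because if a real OST behind it is merely offline, skipping it would undercount users' space usage, whereas an index that was never assigned to any OST holds no data and is safe to skip. The following is a hypothetical Python model of that distinction, not Lustre's `lov_quota_check()`; the names `SlotState`, `QuotaCheckError` and `quota_check` are invented for illustration, and the explicit state labels assume away the hard part, namely telling the two cases apart on a real system.

```python
from enum import Enum


class SlotState(Enum):
    ACTIVE = "active"    # OST configured and reachable
    OFFLINE = "offline"  # OST configured but currently unreachable
    UNUSED = "unused"    # index never assigned to any OST (a gap)


class QuotaCheckError(Exception):
    pass


def quota_check(slots):
    """Hypothetical policy over {index: (SlotState, usage)}.

    Sums usage from active OSTs, silently skips never-used index gaps,
    and refuses to produce a total if a configured OST is offline,
    since that total would undercount usage.
    """
    total = 0
    for idx in sorted(slots):
        state, usage = slots[idx]
        if state is SlotState.UNUSED:
            continue  # nothing was ever stored here, so nothing is lost
        if state is SlotState.OFFLINE:
            raise QuotaCheckError(
                f"OST index {idx} is offline; usage would be undercounted"
            )
        total += usage
    return total


# A gap at index 1 is tolerated; the two active OSTs are summed.
print(quota_check({0: (SlotState.ACTIVE, 100),
                   1: (SlotState.UNUSED, 0),
                   2: (SlotState.ACTIVE, 50)}))  # 150
```

In the situation from the thread, index 32 would be UNUSED and the check could proceed; the open question is how the real code would know that, given that it only sees the index as inactive.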