aurelien.degremont@cea.fr
2007-Mar-19 09:33 UTC
[Lustre-devel] [Bug 11973] Corruption in configuration llogs in Lustre 1.5.97
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11973

What        |Removed   |Added
----------------------------------------------------------------------------
Status      |RESOLVED  |REOPENED
Resolution  |FIXED     |

In fact, those patches did not solve our problem. Our problem is hit only in big configurations. The problem is:

When the MDS reads the MGS configuration log, it does this test in mgc_copy_handler():

> /* append new records */
> if (rec->lrh_index >= llog_get_size(local_llh)) {

rec->lrh_index refers to the MGS record index. Let's say you have a big configuration with 400 useful records (not counting padding), and your average record size is 130 bytes. In one chunk of CHUNK_SIZE (8192) you can then fit 8192 / 130 = 63 records plus 1 padding record. To store 400 useful records you need 7 chunks, so 7 padding records in total. The total number of records is therefore 400 + 7 = 407, and the last record in the file will be #407.

But in the llog header (llog_log_hdr.llh_count), Lustre only counts the useful records (not the padding ones), so 400. (llog_log_hdr.llh_count is incremented in llog_lvfs_write_rec() only when a blob is written, not when a padding record is added.)

llog_get_size(local_llh) returns llog_log_hdr.llh_count, i.e. the number of useful records WITHOUT the padding records, whereas rec->lrh_index is incremented for every record, standard and padding alike. So those two numbers do not count the same total!

As a consequence, the MDS thinks it only has 400 records, and when it reads record #401 it treats it as a new one and appends it to its log, and so on... On a medium configuration this only leads to warnings; on a huge configuration it leads to MDS mount errors. If you create a Lustre configuration with 100 OSTs, with failover defined for each, you will reproduce the issue.
aurelien.degremont@cea.fr
2007-Mar-20 01:39 UTC
[Lustre-devel] [Bug 11973] Corruption in configuration llogs in Lustre 1.5.97
Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11973

> Is this now the only problem that you see? This is the only place where
> llog_get_size is used (other than as a non-0 check), and the log produced by the
> copy handler is only used when the MGS itself isn't available. Can you verify
> that if you start the MDT with the MGS running, you don't see the problem?

In fact we have always hit this bug on a standard system, i.e. with the MGS running. I think the MDS always uses its local copy when it starts, as stated in mgc_process_log():

> /* Now, whether we copied or not, start using the local llog.
>    If we failed to copy, we'll start using whatever the old
>    log has. */
> ctxt = lctxt;

Our configuration uses distinct MDS and MGS devices.

> And that the CONFIGS/<fsname>-MDT0000 log on the MGS does _not_ have this
> corruption?

Yes, the MGS logs are totally correct.