aurelien.degremont@cea.fr
2007-Mar-19 09:33 UTC
[Lustre-devel] [Bug 11973] Corruption in configuration llogs in Lustre 1.5.97
Please don't reply to lustre-devel. Instead, comment in Bugzilla by
using the following link:
https://bugzilla.lustre.org/show_bug.cgi?id=11973
What |Removed |Added
----------------------------------------------------------------------------
Status|RESOLVED |REOPENED
Resolution|FIXED |
In fact, those patches did not solve our problem.
Our problem is hit only in big configurations.
The problem is:
- When the MDS reads the MGS configuration log it does this test in
mgc_copy_handler():
> /* append new records */
> if (rec->lrh_index >= llog_get_size(local_llh)) {
rec->lrh_index refers to the MGS record index.
Let's say you have a big configuration with 400 useful records (not counting
paddings), and your average record size is 130 bytes. So in one CHUNK_SIZE
(8192) you can put 8192 / 130 = 63 records + 1 padding record. To save 400
useful records you will need 7 chunks, hence 7 padding records in total. So
the total number of records will be 400+7=407, and the last record in the
file will be #407. But in the llog header (llog_log_hdr.llh_count) Lustre
only counts the useful records (not the padding ones), so 400.
(llog_log_hdr.llh_count is incremented in llog_lvfs_write_rec() only when a blob
is written, not when a padding is added.)
llog_get_size(local_llh) returns llog_log_hdr.llh_count, so the number of
useful records, WITHOUT the padding records, whereas rec->lrh_index is
incremented for every record, standard and padding alike. So those
two numbers do not count the same total!
In consequence, the MDS thinks it only has 400 records, and when it reads
record #401 it thinks it is a new one and appends it to its log, and so on...
On a medium configuration this only leads to warnings; on a huge one, it leads
to MDS mount errors.
If you create a Lustre configuration with 100 OSTs, each with failover
defined, you will reproduce the issue.
aurelien.degremont@cea.fr
2007-Mar-20 01:39 UTC
[Lustre-devel] [Bug 11973] Corruption in configuration llogs in Lustre 1.5.97
> Is this now the only problem that you see? This is the only place where
> llog_get_size is used (other than as a non-0 check), and the log produced by the
> copy handler is only used when the MGS itself isn't available. Can you verify
> that if you start the MDT with the MGS running, you don't see the problem?
In fact we always faced this bug on a standard system, so with the MGS running.
I think the MDS always uses its local copy when it starts, as said in
mgc_process_log():
> /* Now, whether we copied or not, start using the local llog.
>    If we failed to copy, we'll start using whatever the old
>    log has. */
> ctxt = lctxt;
Our configuration uses distinct MDS and MGS devices.
> And that the CONFIGS/<fsname>-MDT0000 log on the MGS does _not_ have this
> corruption?
Yes, the MGS logs are totally correct.