Dam Thanh Tung
2009-Nov-21 02:36 UTC
[Lustre-discuss] Using drbd: reformat disk or only sync ?
Hi list,

We had a problem with one of our OSTs a few days ago (I also posted about it here). After rebuilding its RAID partition, we used drbd to re-synchronize the data from an active OST to this backup one. We started drbd on the OST whose RAID partition had been rebuilt and connected it to drbd on a working OST. Everything was fine and the synchronization completed without any error report. But when we mounted this backup OST into our system, some clients could not connect to it (the MDS and some others could), and after a short time the OST message log filled with many errors like these:

Nov 19 19:59:36 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 159588368 in dir #261333022
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init()) lustre-OST0006: bad object 996598/0: rc -2
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init()) Skipped 7 previous similar messages
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(ldlm_resource.c:858:ldlm_resource_add()) lvbo_init failed for resource 996598: rc -2
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(ldlm_resource.c:858:ldlm_resource_add()) Skipped 7 previous similar messages
Nov 19 19:59:40 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 261038084 in dir #261333008
Nov 19 19:59:45 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 229924884 in dir #261333024
Nov 19 19:59:47 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 228163899 in dir #261333024
Nov 19 19:59:54 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 165830658 in dir #261333015

I tried unmounting this drbd disk and checking it with e2fsck, which reported the disk as clean, but after re-mounting it went wrong once again. From then on I unmounted it and reformatted the disk like this:

mkfs.lustre --reformat --verbose --writeconf --ost --mgsnode=192.168.1.78@tcp:192.168.1.80@tcp --failover=192.168.1.83@tcp --index=6 /dev/sdd

It completed without any error, and I am now re-synchronizing this drbd disk from a working OST node, which will take a pretty long time.

The question I want to raise here is: in order to use drbd as a backup solution as described above, do we need to reformat the disk before synchronizing the data, or can we just sync it directly? Could you please give me advice or a suggestion for my situation?

Thanks in advance,
Best regards
Andreas Dilger
2009-Nov-21 08:25 UTC
[Lustre-discuss] Using drbd: reformat disk or only sync ?
On 2009-11-20, at 19:36, Dam Thanh Tung wrote:
> We just started drbd from OST (which has been rebuild RAID partition) and connect with drbd on an working OST. Everything was fine and the synchronization completed without any error report. But, when we mount this backup OST in to our system, some of web client can't connect to it (MDS and some others can) and after a short time, in that OST message log, we see many error report like this:
>
> Nov 19 19:59:36 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 159588368 in dir #261333022
>
> Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init()) lustre-OST0006: bad object 996598/0: rc -2

It sounds to me like you are trying to mount the "backup OST" at the same time as the "primary OST"? That is definitely NOT how Lustre works. You should stop that, as it will cause serious filesystem corruption. The backup OST should only be mounted when the primary has failed (preferably when the primary is powered down via STONITH so that there is no chance it will still modify the filesystem). This is normally controlled by HA software like Heartbeat or similar.

> In order to using drbd as back up solution as i described above, do we need to reformat disk before synchronize data or just sync it directly?

I haven't used DRBD myself, but I believe that it should NOT require formatting a device before using DRBD on it. However, there would need to be an initial synchronization to copy all of the data from the primary copy to the backup. DRBD is just doing a block-level copy of one device to another; it doesn't know anything about the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
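PS: to make the failover layout concrete, a minimal DRBD 8.x resource for one OST might look like the sketch below. This is only an illustration, not a tested configuration: the hostnames, port, and disk paths are hypothetical, and the two addresses are merely borrowed from the earlier post.

    resource ost6 {
      protocol C;                    # synchronous: a write completes on both nodes
      on oss-primary {               # node that normally serves the OST
        device    /dev/drbd6;
        disk      /dev/sdd;          # backing RAID partition
        address   192.168.1.80:7789;
        meta-disk internal;
      }
      on oss-backup {                # standby: mounts /dev/drbd6 only after failover
        device    /dev/drbd6;
        disk      /dev/sdd;
        address   192.168.1.83:7789;
        meta-disk internal;
      }
    }

The HA software would promote the standby with "drbdadm primary ost6" and mount the OST there only once the failed node has been fenced.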
Dam Thanh Tung
2009-Nov-21 15:34 UTC
[Lustre-discuss] Using drbd: reformat disk or only sync ?
On Sat, Nov 21, 2009 at 3:25 PM, Andreas Dilger <adilger at sun.com> wrote:
> It sounds to me like you are trying to mount the "backup OST" at the same time as the "primary OST"? That is definitely NOT how Lustre works. You should stop that, as it will cause serious filesystem corruption if you are doing that. The backup OST should only be mounted when the primary has failed (preferably when the primary is powered down via STONITH so that there is no chance it will still modify the filesystem). This is normally controlled by HA software like Heartbeat or similar.

Thank you for your fast reply, Andreas.

Maybe I did not explain it clearly enough, so there is a misunderstanding: I only mounted our backup OST after the primary OST went down, and that is when it showed those errors.

> I haven't used DRBD myself, but I believe that it should NOT require formatting a device before using DRBD on it. However, there would need to be an initial synchronization to copy all of the data from the primary copy to the backup. DRBD is just doing a block-level copy of one device to another; it doesn't know anything about the filesystem.
If I don't need to re-format before synchronizing the data, could you please tell me why we got those errors? The synchronization completed successfully! (We synchronized data from the primary OST to the backup OST, and when I mounted the backup OST I got those errors. Our primary OST still contains the data, but it can't connect to our MDS, which is why we had to use the backup OST as I described earlier on this list.)

Everything is going worse and worse. I hope you can help me bring the data back. If you need more information, I'll send it in detail.

Many thanks
Jeffrey Bennett
2009-Dec-19 01:13 UTC
[Lustre-discuss] MGT of 128 MB - already out of space
Hi,

Scenario is the following:

- Lustre 1.8.1.1
- 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, MDSs on active/passive)
- 1 MGS, 1 MDT, 2 OSTs
- For the MGT, 128MB were allocated, following the Lustre manual's recommendations
- The MGT is already out of space, and an "ls" of the MGT is showing the files are 8MB, like:

-rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-client
-rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-MDT0000
-rw-r--r-- 1 root root 8.0M Dec  2 16:42 devfs-OST0000

Other Lustre filesystems I have worked on show much smaller files. A "dumpe2fs" on this MGT does not show anything strange like huge block sizes, etc.

Question is, why are these files so big and how can we "shrink" them? Is it possible to run --writeconf to fix this?

Thanks,

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab
Andreas Dilger
2009-Dec-19 06:25 UTC
[Lustre-discuss] MGT of 128 MB - already out of space
On 2009-12-18, at 18:13, Jeffrey Bennett wrote:
> Scenario is the following:
>
> - Lustre 1.8.1.1
> - 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, MDSs on active/passive)
> - 1 MGS, 1 MDT, 2 OSTs
> - For the MGT, 128MB were allocated, following Lustre's manual recommendations
> - The MGT is already out of space, and a "ls" of the MGT is showing files are 8MB, like:
>
> -rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-client
> -rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-MDT0000
> -rw-r--r-- 1 root root 8.0M Dec 2 16:42 devfs-OST0000

How many OSTs do you have? Is this consuming all of the space?

> Other lustre filesystems I have worked on show much smaller files. A "dumpe2fs" on this MGT does not show anything strange like huge block sizes, etc.

Are these files sparse by some chance? What does "ls -ls" show?

It may be that your journal is consuming a lot of space? Try running:

    debugfs -c -R "stat <8>" /dev/{MGTdev}

You really don't need more than the absolute minimum of space for the MGT, which is 4MB. You can remove the journal via "tune2fs -O ^has_journal" on an unmounted filesystem, then "tune2fs -j -J size=4" to recreate it at the minimum size (maybe "-J size=5" if it complains).

> Question is, why are these files so big and how can we "shrink" them?
> Is it possible to run --writeconf to fix this?

If all of the space is really consumed by the config files, are you using a lot of "lctl conf_param" commands, ost pools, or something else that would put a lot of records into the config logs?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
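PS: if you want to rehearse the journal shrink before touching the real device, the same tune2fs sequence can be run against a scratch file image. A sketch, assuming e2fsprogs is installed; /tmp/mgt.img here is just a stand-in for the unmounted MGT device:

```shell
# Create a small ext3 image standing in for the (unmounted) MGT device.
dd if=/dev/zero of=/tmp/mgt.img bs=1M count=32 2>/dev/null
mke2fs -F -q -j /tmp/mgt.img

# Drop the existing journal, then recreate it at the 4MB minimum.
tune2fs -O '^has_journal' /tmp/mgt.img
tune2fs -j -J size=4 /tmp/mgt.img

# Confirm the journal came back.
dumpe2fs -h /tmp/mgt.img 2>/dev/null | grep -i journal
```

Once that behaves as expected, the identical tune2fs invocations can be pointed at the real MGT device while it is unmounted.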
Jeffrey Bennett
2009-Dec-23 04:47 UTC
[Lustre-discuss] MGT of 128 MB - already out of space
Hi Andreas,

This turned out to be a bug in a script that was setting the timeout value with lctl every minute or so, thus filling the logs. Hopefully a tunefs.lustre --writeconf on the MGT will remove the logs, am I correct?

jab

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: Friday, December 18, 2009 10:26 PM
To: Jeffrey Bennett
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] MGT of 128 MB - already out of space

> If all of the space is really consumed by the config files, are you using a lot of "lctl conf_param" commands, ost pools, or something else that would put a lot of records into the config logs?
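PS: for the archives, the writeconf in question is the tunefs.lustre --writeconf option, and as I recall the 1.8 manual's procedure it regenerates the configuration logs for the whole filesystem rather than trimming them in place. An untested outline, with example device paths:

    1. Shut the filesystem down completely: unmount the clients, then the MDT, then the OSTs.
    2. Run tunefs.lustre --writeconf against each target on its server, e.g.:
           tunefs.lustre --writeconf /dev/mdtdev
           tunefs.lustre --writeconf /dev/ostdev
    3. Remount the MGS/MDT first, then the OSTs, then the clients; the targets re-register with the MGS and fresh config logs are written.

One caveat worth checking in the manual: parameters previously set with "lctl conf_param" are erased by a writeconf and have to be applied again afterwards.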