Dam Thanh Tung
2009-Nov-21 02:36 UTC
[Lustre-discuss] Using drbd: reformat disk or only sync ?
Hi list,

We had a problem with one of our OSTs a few days ago (I also posted about it here). After rebuilding its RAID partition, we used drbd to re-synchronize the data from an active OST to this backup one. We started drbd on the OST whose RAID partition had been rebuilt and connected it to drbd on a working OST. Everything was fine and the synchronization completed without any error report. But when we mounted this backup OST into our system, some clients could not connect to it (the MDS and some others could), and after a short time the OST message log filled with many errors like these:

Nov 19 19:59:36 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 159588368 in dir #261333022
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init()) lustre-OST0006: bad object 996598/0: rc -2
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init()) Skipped 7 previous similar messages
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(ldlm_resource.c:858:ldlm_resource_add()) lvbo_init failed for resource 996598: rc -2
Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(ldlm_resource.c:858:ldlm_resource_add()) Skipped 7 previous similar messages
Nov 19 19:59:40 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 261038084 in dir #261333008
Nov 19 19:59:45 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 229924884 in dir #261333024
Nov 19 19:59:47 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 228163899 in dir #261333024
Nov 19 19:59:54 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 165830658 in dir #261333015

I tried unmounting this drbd disk and checking it with e2fsck, which reported the disk as clean, but after re-mounting it went wrong once again. From then on I unmounted it and reformatted the disk like this:

mkfs.lustre --reformat --verbose --writeconf --ost --mgsnode=192.168.1.78@tcp:192.168.1.80@tcp --failover=192.168.1.83@tcp --index=6 /dev/sdd

It completed without any error, and I am now re-synchronizing this drbd disk from a working OST node, which will take a pretty long time.

The question I want to raise here is: in order to use drbd as a backup solution as described above, do we need to reformat the disk before synchronizing the data, or can we just sync it directly? Could you please give me advice or a suggestion for my situation?

Thanks in advance,
Best regards
Andreas Dilger
2009-Nov-21 08:25 UTC
[Lustre-discuss] Using drbd: reformat disk or only sync ?
On 2009-11-20, at 19:36, Dam Thanh Tung wrote:
> We just started drbd from OST (which has been rebuild RAID partition) and connect with drbd on an working OST. Everything was fine and the synchronization completed without any error report. But, when we mount this backup OST in to our system, some of web client can't connect to it (MDS and some others can) and after a short time, in that OST message log, we see many error report like this:
>
> Nov 19 19:59:36 OST6 kernel: LDISKFS-fs error (device drbd6): ldiskfs_lookup: unlinked inode 159588368 in dir #261333022
>
> Nov 19 19:59:36 OST6 kernel: LustreError: 3893:0:(filter_lvb.c:90:filter_lvbo_init()) lustre-OST0006: bad object 996598/0: rc -2

It sounds to me like you are trying to mount the "backup OST" at the same time as the "primary OST"? That is definitely NOT how Lustre works. You should stop that, as it will cause serious filesystem corruption. The backup OST should only be mounted when the primary has failed (preferably when the primary is powered down via STONITH so that there is no chance it will still modify the filesystem). This is normally controlled by HA software like Heartbeat or similar.

> In order to using drbd as back up solution as i described above, do we need to reformat disk before synchronize data or just sync it directly?

I haven't used DRBD myself, but I believe that it should NOT require formatting a device before using DRBD on it. However, there would need to be an initial synchronization to copy all of the data from the primary copy to the backup. DRBD is just doing a block-level copy of one device to another; it doesn't know anything about the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
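PS: to make the failover layout concrete, a minimal DRBD 8.x resource for one OST might look like the sketch below. This is only an illustration, not a tested configuration: the hostnames, port, and disk paths are hypothetical, and the two addresses are merely borrowed from the earlier post.

    resource ost6 {
      protocol C;                    # synchronous: a write completes on both nodes
      on oss-primary {               # node that normally serves the OST
        device    /dev/drbd6;
        disk      /dev/sdd;          # backing RAID partition
        address   192.168.1.80:7789;
        meta-disk internal;
      }
      on oss-backup {                # standby: mounts /dev/drbd6 only after failover
        device    /dev/drbd6;
        disk      /dev/sdd;
        address   192.168.1.83:7789;
        meta-disk internal;
      }
    }

The HA software would promote the standby with "drbdadm primary ost6" and mount the OST there only once the failed node has been fenced.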
Dam Thanh Tung
2009-Nov-21 15:34 UTC
[Lustre-discuss] Using drbd: reformat disk or only sync ?
On Sat, Nov 21, 2009 at 3:25 PM, Andreas Dilger <adilger at sun.com> wrote:
> It sounds to me like you are trying to mount the "backup OST" at the same time as the "primary OST"? That is definitely NOT how Lustre works. You should stop that, as it will cause serious filesystem corruption if you are doing that. The backup OST should only be mounted when the primary has failed (preferably when the primary is powered down via STONITH so that there is no chance it will still modify the filesystem). This is normally controlled by HA software like Heartbeat or similar.

Thank you for your fast reply, Andreas.

Maybe I did not explain it clearly enough, so there is a misunderstanding: I only mounted our backup OST after the primary OST went down, and that is when it showed those errors.

> I haven't used DRBD myself, but I believe that it should NOT require formatting a device before using DRBD on it. However, there would need to be an initial synchronization to copy all of the data from the primary copy to the backup. DRBD is just doing a block-level copy of one device to another; it doesn't know anything about the filesystem.
If I don't need to re-format before synchronizing the data, could you please tell me why we got those errors? The synchronization completed successfully! (We synchronized data from the primary OST to the backup OST, and when I mounted the backup OST I got those errors. Our primary OST still contains the data, but it can't connect to our MDS, which is why we had to use the backup OST as I described earlier on this list.)

Everything is going worse and worse. I hope you can help me bring the data back. If you need more information, I'll send it in detail.

Many thanks
Jeffrey Bennett
2009-Dec-19 01:13 UTC
[Lustre-discuss] MGT of 128 MB - already out of space
Hi,

Scenario is the following:

- Lustre 1.8.1.1
- 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, MDSs on active/passive)
- 1 MGS, 1 MDT, 2 OSTs
- For the MGT, 128MB were allocated, following the Lustre manual's recommendations
- The MGT is already out of space, and an "ls" of the MGT is showing the files are 8MB, like:

-rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-client
-rw-r--r-- 1 root root 8.0M Dec  2 15:11 devfs-MDT0000
-rw-r--r-- 1 root root 8.0M Dec  2 16:42 devfs-OST0000

Other Lustre filesystems I have worked on show much smaller files. A "dumpe2fs" on this MGT does not show anything strange like huge block sizes, etc.

Question is, why are these files so big and how can we "shrink" them? Is it possible to run --writeconf to fix this?

Thanks,

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab
Andreas Dilger
2009-Dec-19 06:25 UTC
[Lustre-discuss] MGT of 128 MB - already out of space
On 2009-12-18, at 18:13, Jeffrey Bennett wrote:
> Scenario is the following:
>
> - Lustre 1.8.1.1
> - 3 Lustre filesystems, fully redundant (two networks, OSSs on active/active, MDSs on active/passive)
> - 1 MGS, 1 MDT, 2 OSTs
> - For the MGT, 128MB were allocated, following Lustre's manual recommendations
> - The MGT is already out of space, and a "ls" of the MGT is showing files are 8MB, like:
>
> -rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-client
> -rw-r--r-- 1 root root 8.0M Dec 2 15:11 devfs-MDT0000
> -rw-r--r-- 1 root root 8.0M Dec 2 16:42 devfs-OST0000

How many OSTs do you have? Is this consuming all of the space?

> Other lustre filesystems I have worked on show much smaller files. A "dumpe2fs" on this MGT does not show anything strange like huge block sizes, etc.

Are these files sparse by some chance? What does "ls -ls" show?

It may be that your journal is consuming a lot of space? Try running:

    debugfs -c -R "stat <8>" /dev/{MGTdev}

You really don't need more than the absolute minimum of space for the MGT, which is 4MB. You can remove the journal via "tune2fs -O ^has_journal" on an unmounted filesystem, then "tune2fs -j -J size=4" to recreate it at the minimum size (maybe "-J size=5" if it complains).

> Question is, why are these files so big and how can we "shrink" them?
> Is it possible to run --writeconf to fix this?

If all of the space is really consumed by the config files, are you using a lot of "lctl conf_param" commands, ost pools, or something else that would put a lot of records into the config logs?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
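PS: if you want to rehearse the journal shrink before touching the real device, the same tune2fs sequence can be run against a scratch file image. A sketch, assuming e2fsprogs is installed; /tmp/mgt.img here is just a stand-in for the unmounted MGT device:

```shell
# Create a small ext3 image standing in for the (unmounted) MGT device.
dd if=/dev/zero of=/tmp/mgt.img bs=1M count=32 2>/dev/null
mke2fs -F -q -j /tmp/mgt.img

# Drop the existing journal, then recreate it at the 4MB minimum.
tune2fs -O '^has_journal' /tmp/mgt.img
tune2fs -j -J size=4 /tmp/mgt.img

# Confirm the journal came back.
dumpe2fs -h /tmp/mgt.img 2>/dev/null | grep -i journal
```

Once that behaves as expected, the identical tune2fs invocations can be pointed at the real MGT device while it is unmounted.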
Jeffrey Bennett
2009-Dec-23 04:47 UTC
[Lustre-discuss] MGT of 128 MB - already out of space
Hi Andreas,

This turned out to be a bug in a script that was setting the timeout value with lctl every minute or so, thus filling the logs. Hopefully a tunefs.lustre --writeconf on the MGT will remove the logs, am I correct?

jab

-----Original Message-----
From: Andreas.Dilger at sun.com [mailto:Andreas.Dilger at sun.com] On Behalf Of Andreas Dilger
Sent: Friday, December 18, 2009 10:26 PM
To: Jeffrey Bennett
Cc: lustre-discuss at lists.lustre.org
Subject: Re: [Lustre-discuss] MGT of 128 MB - already out of space

> If all of the space is really consumed by the config files, are you using a lot of "lctl conf_param" commands, ost pools, or something else that would put a lot of records into the config logs?
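PS: for the archives, the writeconf in question is the tunefs.lustre --writeconf option, and as I recall the 1.8 manual's procedure it regenerates the configuration logs for the whole filesystem rather than trimming them in place. An untested outline, with example device paths:

    1. Shut the filesystem down completely: unmount the clients, then the MDT, then the OSTs.
    2. Run tunefs.lustre --writeconf against each target on its server, e.g.:
           tunefs.lustre --writeconf /dev/mdtdev
           tunefs.lustre --writeconf /dev/ostdev
    3. Remount the MGS/MDT first, then the OSTs, then the clients; the targets re-register with the MGS and fresh config logs are written.

One caveat worth checking in the manual: parameters previously set with "lctl conf_param" are erased by a writeconf and have to be applied again afterwards.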