I have a cluster with 8 nodes, all of them running Debian Lenny (plus some
additions so multipath and Infiniband work), which share an array of 48 1TB
disks. Those disks form 22 pairs of hardware RAID1, plus 4 spares. The first
21 pairs are organized into two striped LVM logical volumes, of 16 and 3 TB,
both formatted with ocfs2. The kernel is the version supplied with the
distribution (2.6.26-2-amd64).

I wanted to run an fsck on both volumes because of some errors I was getting
(probably unrelated to the filesystems, but I wanted to check). On the 3TB
volume (around 10% full) the check worked perfectly and finished in less than
an hour (this was run with the fsck.ocfs2 provided by Lenny's ocfs2-tools,
version 1.4.1):

=============
root@hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol1
Checking OCFS2 filesystem in /dev/hidrahome/lvol1:
label: <NONE>
uuid: ab 76 a9 41 fa df 4c ac a3 9f 26 c5 ae 34 1a 3f
number of blocks: 959809536
bytes per block: 4096
number of clusters: 959809536
bytes per cluster: 4096
max slots: 8

/dev/hidrahome/lvol1 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
Pass 2: Checking directory entries.
Pass 3: Checking directory connectivity.
Pass 4a: checking for orphaned inodes
Pass 4b: Checking inodes link counts.
All passes succeeded.
===========

but the check for the second filesystem (around 40% full) did this:

===========
hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol0
Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
label: <NONE>
uuid: 6a a9 0e aa cf 33 45 4c b4 72 3a b6 7c 3b 8d 57
number of blocks: 4168098816
bytes per block: 4096
number of clusters: 4168098816
bytes per cluster: 4096
max slots: 8

/dev/hidrahome/lvol0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
============

and stayed there for 8 hours (all the time keeping one core around 100% CPU
usage and with a light load on the disks; this was consistent with the same
step in the previous run, though of course that one didn't take nearly as
long). I thought that maybe I had run into some bug, so I interrupted the
process, downloaded the ocfs2-tools 1.4.4 sources, compiled them, and tried
with that fsck, obtaining similar results: it's been running for almost 7
hours like this:

============
hidra0:/usr/local/src/ocfs2-tools-1.4.4/fsck.ocfs2# ./fsck.ocfs2 -f /dev/hidrahome/lvol0
fsck.ocfs2 1.4.4
Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
Label: <NONE>
UUID: 6AA90EAACF33454CB4723AB67C3B8D57
Number of blocks: 4168098816
Block size: 4096
Number of clusters: 4168098816
Cluster size: 4096
Number of slots: 8

/dev/hidrahome/lvol0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
============

again with one core at 100% CPU.

Could someone tell me if this is normal? I've been searching the web and
checking manuals for information on how long these checks should take, and
apart from one message on this list mentioning that 3 days for an 8 TB
filesystem with 300 GB used was too long, I haven't been able to find
anything.

If this is normal, is there any way to estimate, taking into account that the
first filesystem uses exactly the same disks and took less than an hour to
check, how long it should take for this other filesystem?

Thanks!

Josep Guerrero
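A quick back-of-the-envelope check on the two fsck headers above (this is just
shell arithmetic on the reported figures, assuming the 4096-byte cluster size
they show): the cluster counts work out to roughly 15 TiB for lvol0 and 3 TiB
for lvol1, consistent with the stated 16 and 3 TB, which means pass 0a on the
larger volume has over four billion clusters' worth of allocator chains to
walk.

echo "$(( 4168098816 * 4096 / 1024 ** 4 )) TiB"   # lvol0: prints 15
echo "$((  959809536 * 4096 / 1024 ** 4 )) TiB"   # lvol1: prints 3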
What is the block size?

-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Josep Guerrero
Sent: Thursday, April 21, 2011 4:43 PM
To: ocfs2-users at oss.oracle.com
Subject: [Ocfs2-users] How long for an fsck?
On 04/21/2011 06:43 AM, Josep Guerrero wrote:
> I have a cluster with 8 nodes, all of them running Debian Lenny (plus some
> additions so multipath and Infiniband work), which share an array of 48 1TB
> disks. Those disks form 22 pairs of hardware RAID1, plus 4 spares. The first
> 21 pairs are organized into two striped LVM logical volumes, of 16 and 3 TB,
> both formatted with ocfs2. The kernel is the version supplied with the
> distribution (2.6.26-2-amd64).
<snip>
> but the check for the second filesystem (around 40% full) did this:
>
> ===========
> hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol0
> Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
> label: <NONE>
> uuid: 6a a9 0e aa cf 33 45 4c b4 72 3a b6 7c 3b 8d 57
> number of blocks: 4168098816
> bytes per block: 4096
> number of clusters: 4168098816
> bytes per cluster: 4096
> max slots: 8
>
> /dev/hidrahome/lvol0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
> ============
>
> and stayed there for 8 hours (all the time keeping one core around 100% CPU
> usage and with a light load on the disks; this was consistent with the same
> step in the previous run, though of course that one didn't take nearly as
> long).
<snip>
> If this is normal, is there any way to estimate, taking into account that the
> first filesystem uses exactly the same disks and took less than an hour to
> check, how long it should take for this other filesystem?

Do:

# debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0

Does this hang too? Redirect the output to a file. That will give us some
clues.
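Concretely, capturing the suggested command's output to a file might look like
this (the file name is just an example, not from the thread):

# debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0 > /tmp/global_bitmap.txt 2>&1
# bzip2 /tmp/global_bitmap.txt

On a volume with this many clusters the stat dump of //global_bitmap can run
to several megabytes, so compressing it before attaching it to a mail is
worthwhile.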
Hello again,

It just finished. The output file is almost 9 MB long, but compressed it is
less than 1 MB. I attach it to this message.

> Do:
> # debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0
>
> Does this hang too? Redirect the output to a file. That will give us some
> clues.

Josep Guerrero

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fit.bz2
Type: application/x-bzip
Size: 903876 bytes
Desc: not available
Url : http://oss.oracle.com/pipermail/ocfs2-users/attachments/20110421/cc82c559/attachment-0001.bin
On 04/21/2011 10:46 AM, Josep Guerrero wrote:
> Hello again,
>
> It just finished. The output file is almost 9 MB long, but compressed it is
> less than 1 MB. I attach it to this message.
>
>> Do:
>> # debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0
>>
>> Does this hang too? Redirect the output to a file. That will give us some
>> clues.

How long did the debugfs output take?
Did fsck eventually finish?
If so, how long did that take? Approximately.

I have a theory as to why it is slow. But I would like some confirmation.

Thanks
Sunil
On 04/22/2011 02:33 PM, Sunil Mushran wrote:
> On 04/21/2011 10:46 AM, Josep Guerrero wrote:
>> Hello again,
>>
>> It just finished. The output file is almost 9 MB long, but compressed it is
>> less than 1 MB. I attach it to this message.
>>
>>> Do:
>>> # debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0
>>>
>>> Does this hang too? Redirect the output to a file. That will give us some
>>> clues.
>
> How long did the debugfs output take?
> Did fsck eventually finish?
> If so, how long did that take? Approximately.
>
> I have a theory as to why it is slow. But I would like some confirmation.

BTW, you said one of the cores was at 100%. What does top show?
Is fsck the main contributor or is some other process spinning?

My theory had fsck at a high wait%. I seem to be missing something.
On 04/22/2011 03:24 PM, Josep Guerrero wrote:
>> How long did the debugfs output take?
> I think about 30 minutes. No more than 50 for sure (just going by the times
> of the mails).
>
>> Did fsck eventually finish?
> No. I had to cancel it after it stayed 24 hours in the same state, showing
> the same message. It never moved beyond "Pass 0a", and was always using 100%
> CPU on one core. I don't know if it would have finished on its own.
>
>> BTW, you said one of the cores was at 100%. What does top show?
>> Is fsck the main contributor or is some other process spinning?
> It was fsck (I kept top open the whole time, and fsck was always at around
> 99% CPU usage).
>
>> I have a theory as to why it is slow. But I would like some confirmation.
>> My theory had fsck at a high wait%. I seem to be missing something.
> I didn't look at the wait%, but I checked the physical disk load with iotop
> and it was very low, so it didn't look like fsck was being slow because of
> the disk. On the filesystem I successfully "fscked" before (the 3 TB one that
> took less than 60 minutes), it started out doing something similar (very high
> CPU usage, low disk load), but after several minutes (when the rest of the
> messages after "Pass 0a" appeared) it did just the opposite: low CPU use,
> high disk load. Both filesystems are physically on the same set of disks (the
> 16TB logical volume is a striped LVM volume that fills about 75% of the 21
> physical disks, and the 3TB one is another striped LVM volume filling the
> remaining space on the same disks), so I don't think it's a problem with the
> physical devices (of course, I could be wrong).

File a bz. This will need some investigation.

BTW, how much memory does your box have?
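For anyone seeing the same symptom, a generic way to confirm whether
fsck.ocfs2 is burning CPU or waiting on the disks, and to answer the memory
question, is sketched below. It assumes a single fsck.ocfs2 process and a
sysstat package recent enough to provide pidstat -d; otherwise top's %wa
column together with iotop, which Josep already used, tells the same story.

pidstat -u -d -p "$(pidof fsck.ocfs2)" 5   # per-process CPU and I/O, sampled every 5 seconds
iostat -x 5                                # system-wide %iowait and per-device utilisation
free -m                                    # total memory and swap, to rule out swapping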
Hi,

On Sat, Apr 23, 2011 at 10:57 AM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> On 04/23/2011 07:56 AM, Tao Ma wrote:
>> So what is your version of fsck? I have met with some issues like that
>> when fsck is allocating a large amount of memory and it gets stuck for
>> quite a long time because of the swapping.
>
> It is not that issue. It is in pass0. I assumed there was a problem
> in the cluster allocation chains. But debugfs managed to scan the
> chain. No loops. Looks ok. So unsure where it could be spinning.
>
> Note it is a 16T, 4k/4k fs.

We had a similar problem which was fixed by
commit 2d741da9367b33f559802dfabe62d96f6adc7777

The version number would be helpful.

Regards,

--
Goldwyn
On 05/11/2011 11:14 AM, Goldwyn Rodrigues wrote:
> Hi,
>
> On Sat, Apr 23, 2011 at 10:57 AM, Sunil Mushran
> <sunil.mushran at oracle.com> wrote:
>> On 04/23/2011 07:56 AM, Tao Ma wrote:
>>> So what is your version of fsck? I have met with some issues like that
>>> when fsck is allocating a large amount of memory and it gets stuck for
>>> quite a long time because of the swapping.
>> It is not that issue. It is in pass0. I assumed there was a problem
>> in the cluster allocation chains. But debugfs managed to scan the
>> chain. No loops. Looks ok. So unsure where it could be spinning.
>>
>> Note it is a 16T, 4k/4k fs.
>
> We had a similar problem which was fixed by
> commit 2d741da9367b33f559802dfabe62d96f6adc7777
>
> The version number would be helpful.

Thanks for that. Josep was on 1.4.4. Fixed in 1.6.4.

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1323
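For anyone hitting this on a similar setup: on Debian it is easy to confirm
which ocfs2-tools build is in use, and since Lenny's packaged 1.4.1 and the
locally built 1.4.4 both predate the fix above, picking up the fixed pass 0
most likely means building 1.6.4 from source, the same way Josep built 1.4.4.

dpkg -l ocfs2-tools            # version of the packaged tools (1.4.1 on Lenny)
apt-cache policy ocfs2-tools   # what the configured repositories offer
fsck.ocfs2 -V                  # assumed version flag; the banner printed at startup shows the version either way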