Hello list,
I have a ~10TB ocfs2 filesystem in an 8-node cluster. It sits on a
logical volume (I know the LV is not cluster-aware, but I make sure no one
touches the LV while the cluster is running). The LV consists of 5x2TB
multipath devices.
I recently had errors like this on some nodes:
OCFS2: ERROR (device dm-7): ocfs2_check_group_descriptor: Group Descriptor # 0 has bad signature
File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted.
(kvm,12322,1):ocfs2_search_chain:1363 ERROR: status = -5
(kvm,12322,1):ocfs2_claim_suballoc_bits:1524 ERROR: status = -5
(kvm,12322,1):__ocfs2_claim_clusters:1806 ERROR: status = -5
(kvm,12322,1):ocfs2_local_alloc_new_window:1013 ERROR: status = -5
(kvm,12322,1):ocfs2_local_alloc_slide_window:1116 ERROR: status = -5
(kvm,12322,1):ocfs2_reserve_local_alloc_bits:537 ERROR: status = -5
(kvm,12322,1):__ocfs2_reserve_clusters:816 ERROR: status = -5
(kvm,12322,1):ocfs2_lock_allocators:677 ERROR: status = -5
(kvm,12322,1):ocfs2_write_begin_nolock:1750 ERROR: status = -5
(kvm,12322,1):ocfs2_write_begin:1860 ERROR: status = -5
(kvm,12322,1):ocfs2_file_buffered_write:2039 ERROR: status = -5
OCFS2: ERROR (device dm-7): ocfs2_check_group_descriptor: Group Descriptor # 0 has bad signature
So I ran fsck.ocfs2 -f, but it hangs (more than 12 hours so far) with this output:
fsck.ocfs2 1.4.4
Checking OCFS2 filesystem in /dev/mapper/lv0:
Label: <NONE>
UUID: F27D7B8F7127436981A2B5D1C93FB204
Number of blocks: 2684349440
Block size: 4096
Number of clusters: 2684349440
Cluster size: 4096
Number of slots: 16
/dev/mapper/lv0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
I attached strace to it to see what is going on.
Before it hangs I get:
write(1, "Pass 0a: Checking cluster alloca"..., 44) = 44
mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ad6a5001000
munmap(0x2ad6a5001000, 4198400) = 0
mmap(NULL, 4202496, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2ad6a5001000
pread(3, "INODE01\0;Q&\354\377\377\7\0\0\0\0\0\0\354\377\237\0\0\0\0\0\0\0\0"..., 4096, 45056) = 4096
mprotect(0x2ad640020000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0x2ad640021000, 4096, PROT_READ|PROT_WRITE) = 0
...
[a couple of hundred similar lines]
...
mprotect(0x2ad640803000, 4096, PROT_READ|PROT_WRITE) = 0
Then it hangs with 100% idle on one core.
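
In case it helps with reading the numbers above, here is a minimal Python
sketch of the arithmetic. It only uses values copied from the fsck.ocfs2 and
strace output; the function names are my own and not part of any ocfs2 tool.
It assumes the pread() offset is a plain byte offset into the device, so
dividing by the block size gives the block number of the last inode read
before the hang.

```python
# Values copied from the fsck.ocfs2 header and the strace excerpt above.
BLOCK_SIZE = 4096          # "Block size: 4096"
NUM_BLOCKS = 2684349440    # "Number of blocks: 2684349440"
PREAD_OFFSET = 45056       # byte offset of the last pread() before the hang


def fs_size_bytes(num_blocks: int, block_size: int) -> int:
    """Total filesystem size in bytes (hypothetical helper)."""
    return num_blocks * block_size


def block_of_offset(offset: int, block_size: int) -> int:
    """Block number containing a given byte offset (hypothetical helper)."""
    return offset // block_size


if __name__ == "__main__":
    size = fs_size_bytes(NUM_BLOCKS, BLOCK_SIZE)
    print("filesystem size in bytes:", size)
    print("filesystem size in TiB:  ", size / 2**40)
    print("last inode read at block:", block_of_offset(PREAD_OFFSET, BLOCK_SIZE))
```

So the size comes out just under 10 TiB, which matches the 5x2TB layout, and
the last inode read before the hang is at block 11.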
Regards,
Matthias