Hi!

We operate a 2-node cluster running OCFS2 on top of DRBD. df reports about 4.3 GB of free space on the OCFS2 filesystem on both nodes, but one node can't even write 10 MB:

df (output identical on both nodes)

$ df -k /cluster
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/drbd0            83883484  80071096   3812388  96% /cluster

$ df -i /cluster
Filesystem             Inodes    IUsed   IFree IUse% Mounted on
/dev/drbd0           20970871 20017778  953093   96% /cluster

dd test on CL1-N1 -- FAILING:

$ dd if=/dev/zero of=`hostname`.tst bs=1M count=10
dd: writing `cl1-n1.tst': No space left on device
1+0 records in
0+0 records out
1032192 bytes (1,0 MB) copied, 1,56907 s, 658 kB/s

same dd test on CL1-N2 -- OK:

$ dd if=/dev/zero of=`hostname`.tst bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB) copied, 1,58164 s, 6,6 MB/s

We are running Debian Linux. The problem first occurred while running Linux kernel 2.6.26, and according to <http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg03661.html> we hoped that it would be fixed by a newer kernel.

Therefore we upgraded to Linux kernel 2.6.32 (using the Debian package linux-image-2.6.32-trunk-amd64_2.6.32-5_amd64.deb from sid), upgraded the userland tools to ocfs2-tools 1.4.3-1 and ran fsck.ocfs2 -fy (which showed no errors), but the problem persists: one node can't write data while the other one has no problems ...

$ modinfo ocfs2
filename:       /lib/modules/2.6.32-trunk-amd64/kernel/fs/ocfs2/ocfs2.ko
license:        GPL
author:         Oracle
version:        1.5.0
description:    OCFS2 1.5.0
srcversion:     944B0B239B4DEBAF58A7FE1
depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
vermagic:       2.6.32-trunk-amd64 SMP mod_unload modversions

(Isn't the 1.5.0 version number a little bit strange here??)

"fsck.ocfs2 -f" doesn't show any errors at all.
No (kernel) messages are logged either.

I think this is similar to bug #1167 (http://oss.oracle.com/bugzilla/show_bug.cgi?id=1167), so I updated the information there as well and attached the output of the "stat_sysdir.sh" script run on the failing node.

Do you have any idea what goes wrong here?

Any workarounds?

Anything we can test to help debug this issue?

Thanks
Alex
Alexander Barton [26.01.2010 16:35]:
> Hi!
>
> We operate a 2-node cluster running OCFS2 on top of DRBD. It shows about 4.3 GB free space on the OCFS2 filesystem using df on both nodes, but one node can't even write 10 MB:
>
> df (output identical on both the nodes)
>
> $ df -k /cluster
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/drbd0            83883484  80071096   3812388  96% /cluster

Well, many filesystems reserve about 5% of the space for root. Are you trying to write the data as root or as a regular user? Maybe root can write, but a regular user can't. The "$" prompt seems to hint at a regular user...

Just my 2c...

Regards,
Werner
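The reservation idea can be checked against the df numbers already posted. This is only a rough sketch: it assumes GNU df semantics (Used is derived from all free blocks, Available from the blocks a non-root user may allocate), and the variable names are made up for illustration. The optional live check assumes GNU stat and the /cluster mount point from the original post.

```sh
# Figures copied from the "df -k /cluster" output in the original post (1K-blocks).
total_kb=83883484
used_kb=80071096
avail_kb=3812388

# Any gap between "total - used" and "available" would be space that
# only root may allocate.
reserved_kb=$(( total_kb - used_kb - avail_kb ))
echo "reserved for root: ${reserved_kb} KB"

# Optional live check on a node (GNU stat): %f = free blocks,
# %a = free blocks available to non-root, %S = block size.
if [ -d /cluster ]; then
    stat -f -c 'free=%f avail_nonroot=%a block_size=%S' /cluster
fi
```

With the posted figures the difference comes out as 0 KB (Used plus Available equals the total), so a root-only reserve does not look like the explanation for the failed write.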
You are running into bz#1189:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189

I'll be attaching a potential fix to that bugzilla soon.

In your case, you will be better off reducing the number of node slots from 4 to 3, or maybe even to 2, as DRBD supports a maximum of 2 nodes.

Alexander Barton wrote:
> Hi!
>
> We operate a 2-node cluster running OCFS2 on top of DRBD. It shows about 4.3 GB free space on the OCFS2 filesystem using df on both nodes, but one node can't even write 10 MB:
> [...]
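The node-slot reduction suggested in this reply could look roughly like the sketch below. It is not taken from the thread: the device name and the target of 2 slots come from the posts, while the `run` helper and the `DRY_RUN` switch are hypothetical additions so that nothing is changed by accident. `tunefs.ocfs2 -N` sets the number of node slots and `debugfs.ocfs2 -R "stats"` dumps the superblock; the assumption here is that the volume should be unmounted on both nodes before shrinking, so check the tunefs.ocfs2 man page for your ocfs2-tools version first.

```sh
# Values from the thread: the DRBD device and the two nodes it supports.
DEV=/dev/drbd0
SLOTS=2

# Safety switch (an assumption of this sketch, not an OCFS2 feature):
# with DRY_RUN=1 the commands are only printed, not executed.
DRY_RUN=1

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Dump the superblock; the current slot count is among the fields shown.
run debugfs.ocfs2 -R "stats" "$DEV"

# Change the number of node slots to $SLOTS.
plan=$(run tunefs.ocfs2 -N "$SLOTS" "$DEV")
echo "$plan"
```

Set DRY_RUN=0 only after reviewing the printed commands and with a current backup; running fsck.ocfs2 -f afterwards, as in the original post, is a cheap sanity check.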