thr3ads.net - Ocfs2 users - [Ocfs2-users] No space left on device in one node [Jan 2010]

Alexander Barton

2010-Jan-26 15:35 UTC

[Ocfs2-users] No space left on device in one node

Hi!

We operate a 2-node cluster running OCFS2 on top of DRBD. df reports about
4.3 GB of free space on the OCFS2 filesystem on both nodes, but one node
cannot even write 10 MB:

df (output identical on both nodes):

  $ df -k /cluster
  Filesystem           1K-blocks      Used Available Use% Mounted on
  /dev/drbd0            83883484  80071096   3812388  96% /cluster

  $ df -i /cluster
  Filesystem             Inodes    IUsed   IFree IUse% Mounted on
  /dev/drbd0           20970871 20017778  953093   96% /cluster
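
As a quick sanity check on the df figures above, the arithmetic can be sketched in shell. The numbers are copied from the output; the variable names are illustrative only. Used plus Available adds up exactly to the total, and the Available column works out to roughly 3.6 GiB, so by df's accounting there is plenty of room for a 10 MB file.

```shell
# Figures copied from the df -k output above (1K-blocks, i.e. KiB).
total_kib=83883484
used_kib=80071096
avail_kib=3812388

# Used + Available should add up to the total if df hides no reserved blocks.
echo "unaccounted KiB: $(( total_kib - used_kib - avail_kib ))"

# Convert the Available column to GiB and GB.
# LC_ALL=C keeps the decimal point a "." regardless of the system locale.
LC_ALL=C awk -v k="$avail_kib" 'BEGIN {
    printf "available: %.2f GiB (%.2f GB)\n", k / 1048576, k * 1024 / 1e9
}'
```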

dd test on CL1-N1 -- FAILING:

  $ dd if=/dev/zero of=`hostname`.tst bs=1M count=10
  dd: writing `cl1-n1.tst': No space left on device
  1+0 records in
  0+0 records out
  1032192 bytes (1,0 MB) copied, 1,56907 s, 658 kB/s

same dd test on CL1-N2 -- OK:

  $ dd if=/dev/zero of=`hostname`.tst bs=1M count=10
  10+0 records in
  10+0 records out
  10485760 bytes (10 MB) copied, 1,58164 s, 6,6 MB/s

We are running Debian Linux. The problems first occurred with Linux kernel
2.6.26, and according to
<http://www.mail-archive.com/ocfs2-users@oss.oracle.com/msg03661.html> we
hoped they would be fixed by a newer kernel.

Therefore we upgraded to Linux kernel 2.6.32 (using the Debian package
linux-image-2.6.32-trunk-amd64_2.6.32-5_amd64.deb from sid), upgraded the
userland tools to ocfs2-tools 1.4.3-1, and ran fsck.ocfs2 -fy (which showed
no errors), but the problem persists: one node can't write data while
the other one has no problems ...

  $ modinfo ocfs2
  filename:       /lib/modules/2.6.32-trunk-amd64/kernel/fs/ocfs2/ocfs2.ko
  license:        GPL
  author:         Oracle
  version:        1.5.0
  description:    OCFS2 1.5.0
  srcversion:     944B0B239B4DEBAF58A7FE1
  depends:        jbd2,ocfs2_stackglue,quota_tree,ocfs2_nodemanager
  vermagic:       2.6.32-trunk-amd64 SMP mod_unload modversions 

(Isn't the 1.5.0 version number a little bit strange here?)

"fsck.ocfs2 -f" doesn't show any errors at all.
No (kernel) messages are logged either.

I think this is similar to bug #1167
(http://oss.oracle.com/bugzilla/show_bug.cgi?id=1167), so I updated the
information there as well and attached the output of the "stat_sysdir.sh"
script run on the failing node.

Do you have any idea what is going wrong here?

Any workarounds?

Anything we can test to help debug this issue?

Thanks
Alex

Werner Flamme

2010-Jan-26 17:33 UTC

[Ocfs2-users] No space left on device in one node

Alexander Barton [26.01.2010 16:35]:
> Hi!
>
> We operate a 2-node cluster running OCFS2 on top of DRBD. It shows about
> 4.3 GB free space on the OCFS2 filesystem using df on both nodes, but one
> node can't even write 10 MB:
>
> df (output identical on both nodes)
>
>   $ df -k /cluster
>   Filesystem           1K-blocks      Used Available Use% Mounted on
>   /dev/drbd0            83883484  80071096   3812388  96% /cluster

Well, filesystems often reserve about 5% of their space for root. Are you
trying to write the data as root or as a regular user? Maybe root can write
but a regular user can't. The "$" prompt seems to hint at a regular user...
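
One way to check this would be to repeat the dd test from the original post under both accounts. The helper below is a minimal sketch: the function name write_test is made up for illustration, and it assumes a dd that accepts bs=1M, as in the original post.

```shell
# Hypothetical helper: try to write 10 MB into a directory and report
# whether it worked, together with the uid that attempted the write.
write_test() {
    dir=$1
    file="$dir/$(uname -n)-$(id -un).tst"
    if dd if=/dev/zero of="$file" bs=1M count=10 2>/dev/null; then
        echo "uid $(id -u): write OK"
    else
        echo "uid $(id -u): write FAILED"
    fi
    rm -f "$file"   # clean up the test file either way
}

# Run once as a regular user and once as root, then compare the results:
#   write_test /cluster
```

If the write fails for both accounts on the same node, a root-only reservation is unlikely to be the cause.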

Just my 2c...

Regards,
Werner

Sunil Mushran

2010-Jan-26 18:12 UTC

[Ocfs2-users] No space left on device in one node

You are running into bz#1189.

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189

I'll be attaching a potential fix to that bugzilla soon.

In your case, you will be better off reducing the number of node slots
from 4 to 3, or maybe even to 2, since DRBD supports at most 2 nodes.
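
The slot change suggested above could be planned roughly as follows. This is a sketch under assumptions, not a tested procedure: it assumes ocfs2-tools provides "debugfs.ocfs2 -R stats" (to show the current slot count) and "tunefs.ocfs2 -N" (to set it), and that the volume must be unmounted on all nodes before slots are removed. The device and mount point are taken from the original post; the function name is made up. It only prints the commands, so check the tunefs.ocfs2 man page for your version before running any of them.

```shell
# Hypothetical dry-run helper: prints the commands instead of running them,
# because changing the slot count modifies the filesystem.
plan_slot_reduction() {
    dev=$1      # block device, e.g. /dev/drbd0
    mnt=$2      # mount point, e.g. /cluster
    slots=$3    # desired number of node slots
    echo "debugfs.ocfs2 -R stats $dev | grep -i slots"   # show current slot count
    echo "umount $mnt"                                   # on every node (assumption)
    echo "tunefs.ocfs2 -N $slots $dev"                   # set the new slot count
    echo "fsck.ocfs2 -f $dev"                            # verify afterwards
}

plan_slot_reduction /dev/drbd0 /cluster 2
```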

