Simon Hargrave
2011-Jul-25 12:26 UTC
[Ocfs2-users] OCFS2 unmount problems after online resize
Hi

I'm doing some experimentation with OCFS2 (1.4 on RHEL5) with a view to using it as a 2-node clustered filesystem. I seem to be having issues with online resize (which the documentation suggests is supported under 1.4).

I'm creating a LUN on an HP EVA6400 storage array, publishing it to the 2 nodes, and creating a filesystem, which works fine. However, it appears that if I online-increase the size of the LUN and subsequently the filesystem, it hangs indefinitely on unmount. A full transcript of the issue is below:

/etc/ocfs2/cluster.conf (created via ocfs2console)
--------------------------------------------------
node:
        ip_port = 7777
        ip_address = 10.34.8.90
        number = 0
        name = ybsxlx45
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.34.8.91
        number = 1
        name = ybsxlx46
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2

/etc/sysconfig/o2cb (created via ocfs2console)
----------------------------------------------
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS

2GB LUN published to both nodes and appears as /dev/sdb
-------------------------------------------------------
# grep sdb /proc/partitions
   8    16    2097152 sdb

Operating System
----------------
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
Linux ybsxlx45 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

OCFS2 Packages
--------------
ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5

Create and exercise filesystem
------------------------------
# mkfs.ocfs2 -L "ocfstest" /dev/sdb
# mount -L ocfstest /ocfstest
# dd if=/dev/zero of=/ocfstest/file1 bs=1024k count=500    (on first node)
# dd if=/dev/zero of=/ocfstest/file2 bs=1024k count=500    (on second node)
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               2097152   1320836    776316  63% /ocfstest

Test unmount and remount
------------------------
# strace -f -o before.txt umount /ocfstest
# mount -L ocfstest /ocfstest

LUN resized to 3GB and rescan on each host
------------------------------------------
# echo "1" > /sys/block/sdb/device/rescan
# grep sdb /proc/partitions
   8    16    3145728 sdb
(new device size showing)

Online resize of filesystem
---------------------------
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               2097152   1312644    784508  63% /ocfstest
# tunefs.ocfs2 -S /dev/sdb
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               3145728   1312676   1833052  42% /ocfstest
(new filesystem size shows on both nodes)

Exercise filesystem
-------------------
# dd if=/dev/zero of=/ocfstest/file3 bs=1024k count=500    (on first node)
# dd if=/dev/zero of=/ocfstest/file4 bs=1024k count=500    (on second node)
# df -k /ocfstest
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdb               3145728   2340772    804956  75% /ocfstest
(filesystem continues to function and can be filled past old size)
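For clarity, the grow procedure in the sections above boils down to roughly the following. This is only a sketch of what I'm doing by hand; the ssh loop and the DEV/NODES variables are purely illustrative (I actually ran the commands individually on each node):

#!/bin/bash
# Sketch of the online grow sequence (illustrative only).
# Assumes the LUN has already been grown on the EVA.
DEV=sdb
NODES="ybsxlx45 ybsxlx46"

# 1. Have every node re-read the new LUN size before touching the filesystem.
for node in $NODES; do
    ssh $node "echo 1 > /sys/block/$DEV/device/rescan"
    ssh $node "grep $DEV /proc/partitions"    # confirm the new size is visible
done

# 2. Grow the filesystem to fill the device - run once, from one node only,
#    with the filesystem still mounted (online resize).
tunefs.ocfs2 -S /dev/$DEV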
Unmount filesystem
------------------
# strace -f -o after.txt umount /ocfstest

At this point the unmount hangs forever and only a reboot will clear it. Comparing the strace output, the second run hangs inside the umount() system call, after having checked that umount.ocfs2 doesn't exist. Whilst hung, the filesystem still "appears" in /etc/mtab and in df output, but it is not mounted according to the kernel (/proc/mounts) - the checks I'm using for this are sketched in the PS at the end of this mail. The other node continues to function whilst in this state; its filesystem does not hang.

So the question is, is this a bug, or am I doing something wrong? The OCFS2 1.4 user guide does state:

9. Online File system Resize
Users can now grow the file system without having to unmount it. This feature requires a compatible clustered logical volume manager. Compatible volumes managers will be announced when support is available.

However, since I'm using the raw device rather than LVM, this should work, provided the SCSI device rescan has been performed on all nodes prior to running tunefs.ocfs2?

I should finally point out that this is being performed on 2 VMware guests, but the LUN is published directly to the guests as a Raw Device Mapping in Physical Compatibility Mode (passthru), as per the various VMware whitepapers. I don't have 2 spare SAN-attached crash-and-burn hosts to test this on physically, but I don't believe this should be a factor.

Any help appreciated, as online resize is a must in a 24x7 clustered environment!

Thanks

Simon Hargrave                        szhargrave at ybs.co.uk
Enterprise Systems Team Leader        x2831
Yorkshire Building Society            01274 472831
http://wwwtech/sysint/tsgcore.asp
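PS - for reference, this is roughly how I'm observing the mtab/kernel mismatch while the umount is hung; nothing more sophisticated than comparing the two mount tables and looking at the stuck process:

# grep ocfstest /etc/mtab        (still listed - this is what df reads)
# grep ocfstest /proc/mounts     (no output - the kernel no longer has it mounted)
# ps -eo pid,stat,cmd | grep '[u]mount'    (the hung umount, presumably sat in D state)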
Simon Hargrave
2011-Jul-25 12:50 UTC
[Ocfs2-users] OCFS2 unmount problems after online resize
Further to this, I get the following in dmesg every 120 seconds after the attempted unmount:

INFO: task ocfs2_hb_ctl:3794 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ocfs2_hb_ctl  D ffff810003db6420     0  3794   3793                     (NOTLB)
 ffff8100b9d05cf8 0000000000000086 00000000f000020a ffffffff8002d0ee
 0000000000000000 0000000000000007 ffff8100d801e820 ffffffff80310b60
 000000887c712d88 000000000000791a ffff8100d801ea08 0000000080009852
Call Trace:
 [<ffffffff8002d0ee>] wake_up_bit+0x11/0x22
 [<ffffffff8006466c>] __down_read+0x7a/0x92
 [<ffffffff800e68aa>] get_super+0x48/0x95
 [<ffffffff800e387b>] fsync_bdev+0xe/0x3b
 [<ffffffff8014a6f8>] invalidate_partition+0x28/0x40
 [<ffffffff8010d6e7>] rescan_partitions+0x37/0x279
 [<ffffffff800e78ec>] do_open+0x231/0x30f
 [<ffffffff800e7c1e>] blkdev_open+0x0/0x4f
 [<ffffffff800e7c41>] blkdev_open+0x23/0x4f
 [<ffffffff8001eab6>] __dentry_open+0xd9/0x1dc
 [<ffffffff8002751f>] do_filp_open+0x2a/0x38
 [<ffffffff8002ae16>] iput+0x4b/0x84
 [<ffffffff800dddf3>] alternate_node_alloc+0x70/0x8c
 [<ffffffff80019f7e>] do_sys_open+0x44/0xbe
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Simon Hargrave                        szhargrave at ybs.co.uk
Enterprise Systems Team Leader        x2831
Yorkshire Building Society            01274 472831
http://wwwtech/sysint/tsgcore.asp
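PS - in case it's useful, the same blocked-task information can be pulled on demand, rather than waiting for the 120-second hung-task watchdog, by asking the kernel to dump all task states via sysrq (this is just the standard sysrq mechanism, nothing OCFS2-specific):

# echo 1 > /proc/sys/kernel/sysrq     (enable sysrq if it isn't already)
# echo t > /proc/sysrq-trigger        (dump every task and its kernel stack to dmesg)
# dmesg | grep -A 20 ocfs2_hb_ctl     (pick out the stuck ocfs2_hb_ctl entry)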