thr3ads.net - Gluster users - [Gluster-users] Problem with VM images when one node goes online (self-healing) on a 2 node replication gluster for VMware datastore [Oct 2011]

keith

2011-Oct-11 08:57 UTC

[Gluster-users] Problem with VM images when one node goes online (self-healing) on a 2 node replication gluster for VMware datastore

Hi all

I am testing gluster-3.2.4 on 2-node storage with replication as our
VMware datastore.

The setup runs replication on 2 nodes with ucarp for IP failover, and
VMware mounts the volume as a datastore over NFS served by gluster.

> Volume Name: GLVOL1
> Type: Replicate
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: t4-01.store:/EXPORT/GLVOL1
> Brick2: t4-03.store:/EXPORT/GLVOL1
> Options Reconfigured:
> performance.cache-size: 4096MB

High-availability testing went smoothly without any problems or data
corruption: when either node is down, all VM guests run normally.

The problem arises when I bring the failed node back up and it starts
self-healing. All my VM guests log kernel errors, and eventually they
hit "EXT3-fs error: ext3_journal_start_sb: detected aborted journal"
and remount their root filesystem read-only.

Below are some of the kernel errors a VM guest generated when I brought
the failed gluster node back up for self-healing:
> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c90c0
> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c9240
> Oct 11 15:57:58 testvm3 kernel: pvscsi: task abort on host 1, ffff8100221c93c0
> Oct 11 15:58:34 testvm3 kernel: INFO: task kjournald:2081 blocked for more than 120 seconds.
> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 11 15:58:34 testvm3 kernel: kjournald     D ffff810001736420     0  2081     14          2494  2060 (L-TLB)
> Oct 11 15:58:34 testvm3 kernel: ffff81003c087cf0 0000000000000046 ffff810030ef2288 ffff81003f5d6048
> Oct 11 15:58:34 testvm3 kernel: 00000000037685c8 000000000000000a ffff810037c53820 ffffffff80314b60
> Oct 11 15:58:34 testvm3 kernel: 00001883cb68d47d 0000000000002c4e ffff810037c53a08 000000003f5128b8
> Oct 11 15:58:34 testvm3 kernel: Call Trace:
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] do_gettimeofday+0x40/0x90
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] io_schedule+0x3f/0x67
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] sync_buffer+0x3b/0x3f
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639fa>] __wait_on_bit+0x40/0x6e
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063a94>] out_of_line_wait_on_bit+0x6c/0x78
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] wake_bit_function+0x0/0x23
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88033a41>] :jbd:journal_commit_transaction+0x553/0x10aa
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003d85b>] lock_timer_base+0x1b/0x3c
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8004ad98>] try_to_del_timer_sync+0x7f/0x88
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88037662>] :jbd:kjournald+0xc1/0x213
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2dfd>] autoremove_wake_function+0x0/0x2e
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff880375a1>] :jbd:kjournald+0x0/0x213
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032722>] kthread+0xfe/0x132
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2be5>] keventd_create_kthread+0x0/0xc4
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80032624>] kthread+0x0/0x132
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
> Oct 11 15:58:34 testvm3 kernel:
> Oct 11 15:58:34 testvm3 kernel: INFO: task crond:3418 blocked for more than 120 seconds.
> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 11 15:58:34 testvm3 kernel: crond         D ffff810001736420     0  3418      1          3436  3405 (NOTLB)
> Oct 11 15:58:34 testvm3 kernel: ffff810036c55ca8 0000000000000086 0000000000000000 ffffffff80019e3e
> Oct 11 15:58:34 testvm3 kernel: 0000000000065bf2 0000000000000007 ffff81003ce4b080 ffffffff80314b60
> Oct 11 15:58:34 testvm3 kernel: 000018899ae16270 0000000000023110 ffff81003ce4b268 000000008804ec00
> Oct 11 15:58:34 testvm3 kernel: Call Trace:
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8006ec8f>] do_gettimeofday+0x40/0x90
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800637ce>] io_schedule+0x3f/0x67
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8001560e>] sync_buffer+0x3b/0x3f
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80063912>] __wait_on_bit_lock+0x36/0x66
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800155d3>] sync_buffer+0x0/0x3f
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800639ae>] out_of_line_wait_on_bit_lock+0x6c/0x78
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800a2e2b>] wake_bit_function+0x0/0x23
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8803181e>] :jbd:do_get_write_access+0x54/0x522
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80019e3e>] __getblk+0x25/0x22c
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88031d0e>] :jbd:journal_get_write_access+0x22/0x33
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804dd37>] :ext3:ext3_reserve_inode_write+0x38/0x90
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8804ddb0>] :ext3:ext3_mark_inode_dirty+0x21/0x3c
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff88050d35>] :ext3:ext3_dirty_inode+0x63/0x7b
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80013d98>] __mark_inode_dirty+0x29/0x16e
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff80025a49>] filldir+0x0/0xb7
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8003516b>] vfs_readdir+0x8c/0xa9
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff800389db>] sys_getdents+0x75/0xbd
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d229>] tracesys+0x71/0xe0
> Oct 11 15:58:34 testvm3 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
> Oct 11 15:58:34 testvm3 kernel:
> Oct 11 15:58:34 testvm3 kernel: INFO: task httpd:3452 blocked for more than 120 seconds.
> Oct 11 15:58:34 testvm3 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 11 15:58:34 testvm3 kernel: httpd         D ffff810001736420     0  3452   3405          3453       (NOTLB)
> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9dc8 0000000000000086 0000000000000000 ffffffff80009a1c
> Oct 11 15:58:34 testvm3 kernel: ffff810035ea9e28 0000000000000009 ffff810037e52080 ffffffff80314b60
> Oct 11 15:58:34 testvm3 kernel: 000018839f75405c 000000000003363d ffff810037e52268 000000003f5e7150

Please note that although I am using ucarp for IP failover, and by
default ucarp will always have a preferred master, I have added code to
make sure that the ucarp master always becomes the slave when it goes
down and comes back up. This ensures that VMware will not connect back
to the failed node when it comes back up.
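Keith's actual ucarp changes are not shown, so the following is only a hypothetical Python model of the policy he describes: a node that recovers from a failure never reclaims the virtual IP from a live peer. It does not call ucarp or use its election logic; `Node`, `go_down` and `come_back_up` are made-up names for illustration, and the node names are borrowed from the brick list above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one storage node's claim on the ucarp virtual IP."""
    name: str
    up: bool = True
    master: bool = False  # True while this node answers on the virtual IP


def go_down(node: Node, peer: Node) -> None:
    """The node fails: it loses the virtual IP and a live peer takes it over."""
    node.up = False
    node.master = False
    if peer.up:
        peer.master = True


def come_back_up(node: Node, peer: Node) -> None:
    """The node recovers. Under the described policy it never preempts:
    it stays backup while a live peer holds the virtual IP, and only
    takes the IP itself when no live peer is master."""
    node.up = True
    node.master = not (peer.up and peer.master)


if __name__ == "__main__":
    a = Node("t4-01.store", master=True)  # assumed initial ucarp master
    b = Node("t4-03.store")
    go_down(a, b)
    come_back_up(a, b)
    # NFS traffic from VMware stays on t4-03.store, the node that never failed.
    print(a.name, "master:", a.master, "|", b.name, "master:", b.master)
```

As Keith observes, this kind of policy only decides which node answers on the virtual IP; it does not by itself stop the guest stalls during self-heal.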

However, this does not prevent the problem described above.

A lot of logs are generated during the self-healing process, and they
don't make any sense to me. I am attaching them; they are over 900k, so
I zipped them up. Hopefully the mailing list allows attachments.

Are there any best practices for setting up and running gluster with
replication as a VMware datastore, so that VM guests keep running
smoothly even when one node goes into self-healing?

Any advice is appreciated.

Keith



-------------- next part --------------
A non-text attachment was scrubbed...
Name: glusterfs-nfs-log.gz
Type: application/x-gzip
Size: 32187 bytes
Desc: not available
URL:
<http://supercolony.gluster.org/pipermail/gluster-users/attachments/20111011/bd8222ae/attachment.gz>

Peter Linder

2011-Oct-11 09:05 UTC

With 3.2.4, no operation on a file is allowed while that file is being
self-healed, so your VMs will stall and time out if the self-heal
doesn't finish quickly enough. Gluster 3.3 will fix this, but I don't
know when it will be released. There are betas to try out, though :).
Perhaps somebody else can say how stable 3.3-beta2 is compared to 3.2.4?
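A rough Python sketch of why this bites VM images in particular, assuming, as described above, that guest I/O to an image waits for the entire heal of that image, and that the heal copies the whole file. The image sizes and the 100 MiB/s heal throughput are hypothetical, not measured; the 120 s deadline is the hung-task threshold from Keith's guest log, and the earlier pvscsi task aborts suggest the guest's SCSI layer may give up even sooner. `heal_seconds` and `guest_stalls` are illustrative names only.

```python
# Assumption: guest I/O to an image blocks for the whole time that image is
# being self-healed, and the heal copies the entire file.
# Sizes and throughput are hypothetical inputs, not measurements.

GUEST_TIMEOUT_S = 120  # hung-task threshold seen in the guest log


def heal_seconds(image_gib: float, heal_mib_per_s: float) -> float:
    """Seconds to heal one image by copying all of it at the given rate."""
    return image_gib * 1024 / heal_mib_per_s


def guest_stalls(image_gib: float, heal_mib_per_s: float,
                 timeout_s: float = GUEST_TIMEOUT_S) -> bool:
    """True if the heal outlasts the guest's timeout, so its I/O would hang."""
    return heal_seconds(image_gib, heal_mib_per_s) >= timeout_s


if __name__ == "__main__":
    rate = 100.0  # hypothetical heal throughput in MiB/s
    for size in (4, 20, 100):
        t = heal_seconds(size, rate)
        verdict = "guest I/O times out" if guest_stalls(size, rate) else "ok"
        print(f"{size:>3} GiB image: ~{t:.0f} s to heal -> {verdict}")
```

With these assumed numbers a 4 GiB image heals in about 41 s, while a 20 GiB image takes about 205 s, well past the 120 s threshold, which is consistent with typical VM disk sizes stalling guests for the duration of the heal.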

On 10/11/2011 10:57 AM, keith wrote:
> Hi all
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users