Hello!

After upgrading to version 2.0 (now running 2.0.1), I'm experiencing problems with glusterfs stability. I'm running a 2-node setup with client-side AFR, and glusterfsd is also running on the same servers. From time to time glusterfs just hangs; I can reproduce this by running the iozone benchmarking tool. I'm using patched FUSE, but the result is the same with unpatched.

================================================================================
Version      : glusterfs 2.0.1 built on May 27 2009 16:04:01
TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
Starting Time: 2009-05-27 16:38:20
Command line : /usr/sbin/glusterfsd --volfile=/etc/glusterfs/glusterfs-server.vol --pid-file=/var/run/glusterfsd.pid --log-file=/var/log/glusterfsd.log
PID          : 31971
System name  : Linux
Nodename     : weeber.st-inst.lv
Kernel Release : 2.6.28-hardened-r7
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
  1: # file: /etc/glusterfs/glusterfs-server.vol
  2: volume posix
  3:   type storage/posix
  4:   option directory /home/export
  5: end-volume
  6:
  7: volume locks
  8:   type features/locks
  9:   option mandatory-locks on
 10:   subvolumes posix
 11: end-volume
 12:
 13: volume brick
 14:   type performance/io-threads
 15:   option autoscaling on
 16:   subvolumes locks
 17: end-volume
 18:
 19: volume server
 20:   type protocol/server
 21:   option transport-type tcp
 22:   option auth.addr.brick.allow 127.0.0.1,192.168.1.*
 23:   subvolumes brick
 24: end-volume
+------------------------------------------------------------------------------+
[2009-05-27 16:38:20] N [glusterfsd.c:1152:main] glusterfs: Successfully started
[2009-05-27 16:38:33] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.1.233:1021
[2009-05-27 16:38:33] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.1.233:1020
[2009-05-27 16:38:46] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.1.252:1021
[2009-05-27 16:38:46] N [server-protocol.c:7035:mop_setvolume] server: accepted client from 192.168.1.252:1020

================================================================================
Version      : glusterfs 2.0.1 built on May 27 2009 16:04:01
TLA Revision : 5c1d9108c1529a1155963cb1911f8870a674ab5b
Starting Time: 2009-05-27 16:38:46
Command line : /usr/sbin/glusterfs -N -f /etc/glusterfs/glusterfs-client.vol /mnt/gluster
PID          : 32161
System name  : Linux
Nodename     : weeber.st-inst.lv
Kernel Release : 2.6.28-hardened-r7
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
  1: volume xeon
  2:   type protocol/client
  3:   option transport-type tcp
  4:   option remote-host 192.168.1.233
  5:   option remote-subvolume brick
  6: end-volume
  7:
  8: volume weeber
  9:   type protocol/client
 10:   option transport-type tcp
 11:   option remote-host 192.168.1.252
 12:   option remote-subvolume brick
 13: end-volume
 14:
 15: volume replicate
 16:   type cluster/replicate
 17:   subvolumes xeon weeber
 18: end-volume
 20: volume readahead
 21:   type performance/read-ahead
 22:   option page-size 128kB
 23:   option page-count 16
 24:   option force-atime-update off
 25:   subvolumes replicate
 26: end-volume
 27:
 28: volume writebehind
 29:   type performance/write-behind
 30:   option aggregate-size 1MB
 31:   option window-size 3MB
 32:   option flush-behind on
 33:   option enable-O_SYNC on
 34:   subvolumes readahead
 35: end-volume
 36:
 37: volume iothreads
 38:   type performance/io-threads
 39:   option autoscaling on
 40:   subvolumes writebehind
 41: end-volume
 42:
 43:
 44:
 45: #volume bricks
 46: #type cluster/distribute
 47: #option lookup-unhashed yes
 48: #option min-free-disk 20%
 49: # subvolumes weeber xeon
 50: #end-volume
+------------------------------------------------------------------------------+
[2009-05-27 16:38:46] W [xlator.c:555:validate_xlator_volume_options] writebehind: option 'window-size' is deprecated, preferred is 'cache-size', continuing with correction
[2009-05-27 16:38:46] W [glusterfsd.c:455:_log_if_option_is_invalid] writebehind: option 'aggregate-size' is not recognized
[2009-05-27 16:38:46] W [glusterfsd.c:455:_log_if_option_is_invalid] readahead: option 'page-size' is not recognized
[2009-05-27 16:38:46] N [glusterfsd.c:1152:main] glusterfs: Successfully started
[2009-05-27 16:38:46] N [client-protocol.c:5557:client_setvolume_cbk] xeon: Connected to 192.168.1.233:6996, attached to remote volume 'brick'.
[2009-05-27 16:38:46] N [afr.c:2190:notify] replicate: Subvolume 'xeon' came back up; going online.
[2009-05-27 16:38:46] N [client-protocol.c:5557:client_setvolume_cbk] xeon: Connected to 192.168.1.233:6996, attached to remote volume 'brick'.
[2009-05-27 16:38:46] N [afr.c:2190:notify] replicate: Subvolume 'xeon' came back up; going online.
[2009-05-27 16:38:46] N [client-protocol.c:5557:client_setvolume_cbk] weeber: Connected to 192.168.1.252:6996, attached to remote volume 'brick'.
[2009-05-27 18:46:02] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 18:16:01. frame-timeout = 1800
[2009-05-27 19:16:09] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 18:46:02. frame-timeout = 1800
[2009-05-27 19:46:18] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPEN(12) frame sent = 2009-05-27 19:16:09. frame-timeout = 1800
[2009-05-27 20:16:25] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 19:46:18. frame-timeout = 1800
[2009-05-27 20:46:34] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 20:16:25. frame-timeout = 1800
[2009-05-27 21:16:41] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPEN(12) frame sent = 2009-05-27 20:46:34. frame-timeout = 1800
[2009-05-27 21:47:00] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 21:16:53. frame-timeout = 1800
[2009-05-27 22:17:07] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 21:47:00. frame-timeout = 1800
[2009-05-27 22:47:15] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPENDIR(21) frame sent = 2009-05-27 22:17:07. frame-timeout = 1800
[2009-05-27 23:17:23] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 22:47:15. frame-timeout = 1800
[2009-05-27 23:47:31] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPEN(12) frame sent = 2009-05-27 23:17:23. frame-timeout = 1800
[2009-05-28 00:17:39] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 23:47:32. frame-timeout = 1800
[2009-05-28 00:47:47] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 00:17:39. frame-timeout = 1800
[2009-05-28 01:17:55] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPENDIR(21) frame sent = 2009-05-28 00:47:47. frame-timeout = 1800
[2009-05-28 01:48:03] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 01:17:55. frame-timeout = 1800
[2009-05-28 02:18:11] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPEN(12) frame sent = 2009-05-28 01:48:03. frame-timeout = 1800
[2009-05-28 02:48:29] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 02:18:24. frame-timeout = 1800
[2009-05-28 03:18:37] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 02:48:29. frame-timeout = 1800
[2009-05-28 03:48:45] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 03:18:37. frame-timeout = 1800
[2009-05-28 04:18:53] E [client-protocol.c:292:call_bail] weeber: bailing out frame XATTROP(40) frame sent = 2009-05-28 03:48:45. frame-timeout = 1800
[2009-05-28 04:49:01] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 04:18:53. frame-timeout = 1800
[2009-05-28 05:19:09] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPENDIR(21) frame sent = 2009-05-28 04:49:01. frame-timeout = 1800
[2009-05-28 05:49:17] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 05:19:09. frame-timeout = 1800
[2009-05-28 06:19:25] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 05:49:17. frame-timeout = 1800
[2009-05-28 06:49:33] E [client-protocol.c:292:call_bail] weeber: bailing out frame XATTROP(40) frame sent = 2009-05-28 06:19:25. frame-timeout = 1800
[2009-05-28 07:19:40] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 06:49:33. frame-timeout = 1800
[2009-05-28 07:49:48] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 07:19:40. frame-timeout = 1800
[2009-05-28 08:19:56] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-28 07:49:48. frame-timeout = 1800
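The call_bail entries in the client log above each carry two timestamps (when the entry was logged and when the frame was sent) plus the frame-timeout value. The following is a minimal Python sketch, not part of glusterfs, that parses such lines and reports how long each frame waited before it was bailed out. The regular expression is an assumption based only on the log lines quoted in this message, and the names `parse_bailouts` and `SAMPLE_LOG` are hypothetical. The sample lines are copied from the log above.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern inferred from the call_bail lines quoted in this thread
# (an assumption, not an official glusterfs log grammar).
BAIL_RE = re.compile(
    r"\[(?P<logged>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+E\s+"
    r"\[client-protocol\.c:\d+:call_bail\]\s+(?P<subvol>[^:\s]+):\s+"
    r"bailing out frame\s+(?P<op>\w+)\(\d+\)\s+frame sent\s+=\s+"
    r"(?P<sent>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\s+"
    r"frame-timeout\s+=\s+(?P<timeout>\d+)"
)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_bailouts(log_text):
    """Return one dict per call_bail entry found in log_text."""
    entries = []
    for match in BAIL_RE.finditer(log_text):
        logged = datetime.strptime(match["logged"], TIME_FORMAT)
        sent = datetime.strptime(match["sent"], TIME_FORMAT)
        entries.append({
            "subvolume": match["subvol"],
            "operation": match["op"],
            # Seconds between sending the frame and giving up on it.
            "waited_s": int((logged - sent).total_seconds()),
            "timeout_s": int(match["timeout"]),
        })
    return entries


# Three entries copied from the client log in this message.
SAMPLE_LOG = """\
[2009-05-27 18:46:02] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 18:16:01. frame-timeout = 1800
[2009-05-27 19:46:18] E [client-protocol.c:292:call_bail] weeber: bailing out frame OPEN(12) frame sent = 2009-05-27 19:16:09. frame-timeout = 1800
[2009-05-28 00:17:39] E [client-protocol.c:292:call_bail] weeber: bailing out frame LOOKUP(32) frame sent = 2009-05-27 23:47:32. frame-timeout = 1800
"""

bailouts = parse_bailouts(SAMPLE_LOG)
for entry in bailouts:
    print(entry["subvolume"], entry["operation"],
          "waited", entry["waited_s"], "s, timeout", entry["timeout_s"], "s")
print(Counter(entry["operation"] for entry in bailouts))
```

Run against the full log, a summary like this makes it easy to see whether every bailed-out frame waited just over the configured frame-timeout and which subvolume and operations are affected.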
I have the same issue with the same config when both nodes are x64. The difference is that there are no bail-out messages in the logs.

Jasper van Wanrooy - Chatventure wrote:
> Hi Maris,
>
> I regret to hear that. I was also having problems with the stability
> on 32bit platforms. Possibly you should try it on a 64bit platform. Is
> that an option?
>
> Best Regards Jasper
Thank you for the reply!

As you can see from the config, ping-timeout is not set, so the default is assumed. I have now started glusterfs with 8 threads on both server and client (autoscaling switched off).

Hardware:

*server1:*

lspci
00:00.0 Host bridge: Intel Corporation E7505 Memory Controller Hub (rev 03)
00:00.1 Class ff00: Intel Corporation E7505/E7205 Series RAS Controller (rev 03)
00:01.0 PCI bridge: Intel Corporation E7505/E7205 PCI-to-AGP Bridge (rev 03)
00:02.0 PCI bridge: Intel Corporation E7505 Hub Interface B PCI-to-PCI Bridge (rev 03)
00:02.1 Class ff00: Intel Corporation E7505 Hub Interface B PCI-to-PCI Bridge RAS Controller (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) USB UHCI Controller #3 (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 82)
00:1f.0 ISA bridge: Intel Corporation 82801DB/DBL (ICH4/ICH4-L) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801DB (ICH4) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 02)
02:1c.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
02:1d.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
02:1e.0 PIC: Intel Corporation 82870P2 P64H2 I/OxAPIC (rev 04)
02:1f.0 PCI bridge: Intel Corporation 82870P2 P64H2 Hub PCI Bridge (rev 04)
03:01.0 RAID bus controller: Intel Corporation RAID Controller
04:02.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
05:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
05:03.0 Ethernet controller: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 (rev 0d)

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 5
cpu MHz : 2392.024
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 4784.04
clflush size : 64
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 5
cpu MHz : 2392.024
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pebs bts cid xtpr
bogomips : 4784.16
clflush size : 64
power management:

*server2:*

lspci
00:00.0 Host bridge: Intel Corporation E7520 Memory Controller Hub (rev 0c)
00:00.1 Class ff00: Intel Corporation E7525/E7520 Error Reporting Registers (rev 0c)
00:01.0 System peripheral: Intel Corporation E7520 DMA Controller (rev 0c)
00:02.0 PCI bridge: Intel Corporation E7525/E7520/E7320 PCI Express Port A (rev 0c)
00:04.0 PCI bridge: Intel Corporation E7525/E7520 PCI Express Port B (rev 0c)
00:05.0 PCI bridge: Intel Corporation E7520 PCI Express Port B1 (rev 0c)
00:06.0 PCI bridge: Intel Corporation E7520 PCI Express Port C (rev 0c)
00:1d.0 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB UHCI Controller #3 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801EB/ER (ICH5/ICH5R) USB2 EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev c2)
00:1f.0 ISA bridge: Intel Corporation 82801EB/ER (ICH5/ICH5R) LPC Interface Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801EB/ER (ICH5/ICH5R) IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
01:00.0 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge A (rev 09)
01:00.1 PIC: Intel Corporation 6700/6702PXH I/OxAPIC Interrupt Controller A (rev 09)
01:00.2 PCI bridge: Intel Corporation 6700PXH PCI Express-to-PCI Bridge B (rev 09)
01:00.3 PIC: Intel Corporation 6700PXH I/OxAPIC Interrupt Controller B (rev 09)
02:03.0 SCSI storage controller: LSI Logic / Symbios Logic SAS1068 PCI-X Fusion-MPT SAS (rev 01)
02:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
02:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
03:01.0 I2O: LSI Logic / Symbios Logic MegaRAID (rev 01)
05:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8050 PCI-E ASF Gigabit Ethernet Controller (rev 18)
07:04.0 Ethernet controller: Intel Corporation 82541GI Gigabit Ethernet Controller (rev 05)
07:0c.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)

cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 1
cpu MHz : 2792.955
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips : 5590.46
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 1
cpu MHz : 2792.955
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips : 5586.06
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 1
cpu MHz : 2792.955
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips : 5586.02
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 2.80GHz
stepping : 1
cpu MHz : 2792.955
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr
bogomips : 5586.05
clflush size : 64
cache_alignment : 128
address sizes : 36 bits physical, 48 bits virtual
power management:

jvanwanrooy at chatventure.nl wrote:
> Hi Maris,
>
> Can you tell me something more about the hardware you use? With our
> tests yesterday we had some troubles with very high load in conjunction
> with autoscaling. You can try a fixed limit of threads. What are the
> ping-timeout settings by the way?
>
> Best Regards Jasper
>
> Jasper van Wanrooy - Chatventure BV
> Technical Manager
> T: +31 (0) 6 47 248 722
> E: jvanwanrooy at chatventure.nl
> W: www.chatventure.nl
>
>
> ----- Original Message -----
> From: "Maris Ruskulis" <maris at chown.lv>
> To: gluster-users at gluster.org
> Sent: Friday, 29 May, 2009 10:11:45 GMT +01:00 Amsterdam / Berlin /
> Bern / Rome / Stockholm / Vienna
> Subject: Re: [Gluster-users] Glusterfs 2.0 hangs on high load
>
> Is there way to solve this issue?
frame-timeout = 1800 > [2009-05-28 03:18:37] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 02:48:29. frame-timeout = 1800 > [2009-05-28 03:48:45] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 03:18:37. frame-timeout = 1800 > [2009-05-28 04:18:53] E [client-protocol.c:292:call_bail] > weeber: bailing out frame XATTROP(40) frame sent > 2009-05-28 03:48:45. frame-timeout = 1800 > [2009-05-28 04:49:01] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 04:18:53. frame-timeout = 1800 > [2009-05-28 05:19:09] E [client-protocol.c:292:call_bail] > weeber: bailing out frame OPENDIR(21) frame sent > 2009-05-28 04:49:01. frame-timeout = 1800 > [2009-05-28 05:49:17] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 05:19:09. frame-timeout = 1800 > [2009-05-28 06:19:25] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 05:49:17. frame-timeout = 1800 > [2009-05-28 06:49:33] E [client-protocol.c:292:call_bail] > weeber: bailing out frame XATTROP(40) frame sent > 2009-05-28 06:19:25. frame-timeout = 1800 > [2009-05-28 07:19:40] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 06:49:33. frame-timeout = 1800 > [2009-05-28 07:49:48] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 07:19:40. frame-timeout = 1800 > [2009-05-28 08:19:56] E [client-protocol.c:292:call_bail] > weeber: bailing out frame LOOKUP(32) frame sent > 2009-05-28 07:49:48. 
frame-timeout = 1800 > > <maris.vcf>_______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org <mailto:Gluster-users at gluster.org> > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users > > > > _______________________________________________ > Gluster-users mailing list > Gluster-users at gluster.org > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users > > > > > _______________________________________________ Gluster-users mailing > list Gluster-users at gluster.org > http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20090529/72c5b384/attachment.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: maris.vcf Type: text/x-vcard Size: 216 bytes Desc: not available URL: <http://supercolony.gluster.org/pipermail/gluster-users/attachments/20090529/72c5b384/attachment.vcf>
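For anyone wading through the quoted client log, here is a small sketch that tallies the call_bail entries per subvolume and operation and reports the longest wait between "frame sent" and the bail. It is not part of GlusterFS: the regex, the function name summarize_bails and the assumption of one entry per line (as in the raw log file rather than the wrapped quote above) are mine, and the "frame sent = <time>. frame-timeout = <n>" layout is taken from the OPEN(12) entries in the quote.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout of a call_bail entry, one per line in the raw client log.
BAIL_RE = re.compile(
    r"^\[(?P<logged>[\d-]+ [\d:]+)\] E \[client-protocol\.c:\d+:call_bail\] "
    r"(?P<subvol>\S+): bailing out frame (?P<op>\w+)\(\d+\) "
    r"frame sent = (?P<sent>[\d-]+ [\d:]+)\. frame-timeout = (?P<timeout>\d+)"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize_bails(lines):
    """Count bailed frames per (subvolume, operation) and track the longest wait."""
    counts = Counter()
    max_wait = 0
    for line in lines:
        match = BAIL_RE.match(line.strip())
        if match is None:
            continue  # not a call_bail entry
        logged = datetime.strptime(match["logged"], TS_FORMAT)
        sent = datetime.strptime(match["sent"], TS_FORMAT)
        counts[(match["subvol"], match["op"])] += 1
        # Seconds between sending the frame and giving up on it.
        max_wait = max(max_wait, int((logged - sent).total_seconds()))
    return counts, max_wait


if __name__ == "__main__":
    # Two entries copied from the quoted log, joined back onto single lines.
    sample = [
        "[2009-05-27 18:46:02] E [client-protocol.c:292:call_bail] weeber: "
        "bailing out frame LOOKUP(32) frame sent = 2009-05-27 18:16:01. "
        "frame-timeout = 1800",
        "[2009-05-27 19:46:18] E [client-protocol.c:292:call_bail] weeber: "
        "bailing out frame OPEN(12) frame sent = 2009-05-27 19:16:09. "
        "frame-timeout = 1800",
    ]
    counts, max_wait = summarize_bails(sample)
    for (subvol, op), n in sorted(counts.items()):
        print(f"{subvol} {op}: {n} bailed frame(s)")
    print(f"longest wait before bail: {max_wait} s")
```

Run against the whole client log, this should make it easy to see whether only one subvolume is affected and whether each frame is bailed right at the 1800-second frame-timeout.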