Burnash, James
2011-Apr-27 16:49 UTC
[Gluster-users] OS crash of client using version 3.1.1
Hello.

Two of my 3.1.1 client machines running CentOS 5.2 have crashed. Nothing was logged in /var/log/messages, but this stanza appeared in /var/log/messages/etc-glusterfs-glusterd.vol.log:

[2011-04-27 11:12:00.350935] I [glusterd.c:275:init] management: Using /etc/glusterd as working directory
[2011-04-27 11:12:00.379320] E [rpc-transport.c:905:rpc_transport_load] rpc-transport: /usr/lib64/glusterfs/3.1.1/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
[2011-04-27 11:12:00.379340] E [rpc-transport.c:909:rpc_transport_load] rpc-transport: volume 'rdma.management': transport-type 'rdma' is not valid or not found on this machine
[2011-04-27 11:12:00.389775] I [glusterd.c:87:glusterd_uuid_init] glusterd: retrieved UUID: b03f0420-14cb-403b-86c5-bde8ef2d4a28
Given volfile:
+------------------------------------------------------------------------------+
1: volume management
2: type mgmt/glusterd
3: option working-directory /etc/glusterd
4: option transport-type socket,rdma
5: option transport.socket.keepalive-time 10
6: option transport.socket.keepalive-interval 2
7: end-volume
8:

I'm not running RDMA; I'm running TCP over 1 Gb Ethernet.
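One reading of the log above is that glusterd tries to load every transport named on the management volume's "option transport-type socket,rdma" line, so a missing rdma.so produces the two E lines even on a TCP-only host. The Python sketch below checks the transports listed in a volfile against the shared objects present in a transport directory. It assumes, as the rdma.so path in the log suggests, that each transport NAME is loaded from DIR/NAME.so; the default directory is copied from the log, and the helper names are hypothetical, not part of GlusterFS.

```python
import os
import re

# Directory taken from the rdma.so path in the log; other installs may differ.
DEFAULT_TRANSPORT_DIR = "/usr/lib64/glusterfs/3.1.1/rpc-transport"

# The management volume from the "Given volfile" dump above.
MANAGEMENT_VOLFILE = """\
volume management
    type mgmt/glusterd
    option working-directory /etc/glusterd
    option transport-type socket,rdma
    option transport.socket.keepalive-time 10
    option transport.socket.keepalive-interval 2
end-volume
"""

TRANSPORT_RE = re.compile(r"\s*option\s+transport-type\s+(\S+)")


def listed_transports(volfile_text):
    """Return the transport names from every 'option transport-type' line."""
    names = []
    for line in volfile_text.splitlines():
        m = TRANSPORT_RE.match(line)
        if m:
            names.extend(t for t in m.group(1).split(",") if t)
    return names


def available_transports(transport_dir=DEFAULT_TRANSPORT_DIR):
    """Return transport names that have a NAME.so file in transport_dir (assumed layout)."""
    try:
        entries = os.listdir(transport_dir)
    except FileNotFoundError:
        return set()
    return {e[:-3] for e in entries if e.endswith(".so")}


def missing_transports(volfile_text, available):
    """Return listed transports that have no matching shared object."""
    return [t for t in listed_transports(volfile_text) if t not in available]


if __name__ == "__main__":
    print(missing_transports(MANAGEMENT_VOLFILE, available_transports()))
```

On a host laid out like the one in the log, with socket.so present and rdma.so absent, this would report only "rdma". The sketch says nothing about whether that missing module is related to the crash itself.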
Volume created with:

gluster volume create pfs-ro1 replica 2 transport tcp <bricks ...>

Server info:
root@jc1letgfs18:/export/read-only# /usr/sbin/glusterfs -V
glusterfs 3.1.3 built on Mar 16 2011 01:01:54
Repository revision: v3.1.3

Client info:
rpm -qa "gluster*"
glusterfs-core-3.1.1-1.x86_64
glusterfs-fuse-3.1.1-1.x86_64
glusterfs-debuginfo-3.1.1-1.x86_64

Finally, this is (hopefully) the relevant section from the crashdump:

taps_linux64.os[10253]: segfault at 0000000000000000 rip 000000000042b057 rsp 0000000040c60f20 error 4
nfs: server pid3780@jc1lodin2:/net not responding, still trying
nfs: server pid3780@jc1lodin2:/net OK
Unable to handle kernel NULL pointer dereference at 00000000000000e8 RIP:
[<ffffffff800095c6>] __link_path_walk+0x54/0xf42
PGD 730763067 PUD 7db2e3067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 0
Modules linked in: fuse(U) mptctl mptbase sg ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) hpilo(U) nfs fscache nfsd exportfs lockd nfs_acl auth_rpcgss autofs4 sunrpc bonding dm_multipath video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport ata_piix libata ide_cd cdrom i5000_edac serio_raw edac_mc pcspkr shpchp bnx2 dm_snapshot dm_zero dm_mirror dm_mod cciss(U) sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 4004, comm: csh Tainted: G 2.6.18-92.el5 #1

Thanks,

James Burnash, Unix Engineering

DISCLAIMER:
This e-mail, and any attachments thereto, is intended only for use by the addressee(s) named herein and may contain legally privileged and/or confidential information. If you are not the intended recipient of this e-mail, you are hereby notified that any dissemination, distribution or copying of this e-mail, and any attachments thereto, is strictly prohibited. If you have received this in error, please immediately notify me and permanently delete the original and any copy of any e-mail and any printout thereof.
E-mail transmission cannot be guaranteed to be secure or error-free. The sender therefore does not accept liability for any errors or omissions in the contents of this message which arise as a result of e-mail transmission.
NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its discretion, monitor and review the content of all e-mail communications. http://www.knight.com
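Two values in the crashdump are easier to read once decoded. The "error 4" on the taps_linux64.os segfault line and the "Oops: 0000" value are x86 page-fault error codes, whose low bits record not-present vs. protection fault, read vs. write, and user vs. kernel mode. "__link_path_walk+0x54/0xf42" is the kernel's symbol+offset/size notation, so subtracting the offset from RIP gives the function's start address. The Python sketch below does both; decode_pf_error and symbol_range are hypothetical helper names, and the bit table covers only the five lowest error-code bits.

```python
import re

# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x1, "protection violation", "page not present"),
    (0x2, "write access", "read access"),
    (0x4, "user mode", "kernel mode"),
    (0x8, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]

# Matches the kernel's "name+0xOFFSET/0xSIZE" symbol notation.
SYM_RE = re.compile(r"(?P<name>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)")


def decode_pf_error(code):
    """Describe a page-fault error code such as the 'error 4' in a segfault line."""
    parts = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            parts.append(if_set)
        elif if_clear is not None:
            parts.append(if_clear)
    return parts


def symbol_range(rip, symbol):
    """Return (name, start, end) of the function containing rip, given 'name+0xOFF/0xSIZE'."""
    m = SYM_RE.fullmatch(symbol)
    if m is None:
        raise ValueError("expected name+0xOFF/0xSIZE, got %r" % symbol)
    start = rip - int(m.group("off"), 16)
    return m.group("name"), start, start + int(m.group("size"), 16)


if __name__ == "__main__":
    print(decode_pf_error(4))  # segfault line in taps_linux64.os
    print(decode_pf_error(0))  # "Oops: 0000" in the kernel trace
    name, start, end = symbol_range(0xffffffff800095c6, "__link_path_walk+0x54/0xf42")
    print(name, hex(start), hex(end))
```

Read this way, "error 4" is a user-mode read of a page that was not present, consistent with the segfault at address 0, while "Oops: 0000" is a kernel-mode read of a not-present page inside __link_path_walk, which starts at 0xffffffff80009572. A faulting address as small as 0xe8 often suggests a structure field read through a NULL pointer.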
I had an issue like this, and it ended up being the Chelsio driver.

-luis

Luis E. Cerezo
http://www.luiscerezo.org
http://twitter.com/luiscerezo
http://flickr.com/photos/luiscerezo
photos for sale: http://photos.luiscerezo.org
Voice: 412 223 7396