Hi everyone, I'm running a glusterfs 3.8.5 container with Kubernetes in a CoreOS 1185.5.0 environment. I have a 20-server glusterfs cluster, 7 bricks per server, 132 bricks in total forming a distributed, 3-way replicated volume. Here is the volume info:

Type: Distributed-Replicate
Volume ID: cc9f0101-0bc7-4a40-a813-a7e540593a2b
Status: Started
Snapshot Count: 0
Number of Bricks: 44 x 3 = 132
Transport-type: tcp
Bricks:
Brick1: 10.32.3.9:/mnt/brick1/vol
Brick2: 10.32.3.19:/mnt/brick1/vol
Brick3: 10.32.3.29:/mnt/brick1/vol
...
Brick132: 10.32.3.40:/mnt/brick7/vol
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
cluster.quorum-type: auto
features.quota-timeout: 10
features.bitrot: on
features.scrub: Active
performance.cache-size: 4GB
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

I was in the middle of upgrading k8s from 1.4.6 to 1.5.1: upgrade a server, reboot it, and of course the gluster pod restarted, one server at a time. After the 3rd server was upgraded, its status was not normal: some bricks were online, some were not. While I was handling this, another server rebooted, and the replica group containing these two servers entered read-only status. Then I checked that server; it had crashed with the following log:

Dec 27 14:22:29 ac07.pek.prod.com kernel: general protection fault: 0000 [#1] SMP
Dec 27 14:22:29 ac07.pek.prod.com kernel: Modules linked in: xt_physdev fuse nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc xfs ipt_REJECT nf_reject_ipv4 xt_statistic xt_nat xt_recent xt_mark ip6t_rpfilter xt_comment ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw nf_conntrack_netlink ip6table_filter ip6_tables xt_set ip_set nfnetlink ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc overlay bonding dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32c_generic coretemp nls_ascii nls_cp437 sb_edac edac_core vfat fat x86_pkg_temp_thermal kvm_intel ipmi_ssif i2c_core ipmi_devintf kvm mei_me ipmi_si irqbypass ses enclosure evdev dcdbas scsi_transport_sas
Dec 27 14:22:29 ac07.pek.prod.com kernel: mei ipmi_msghandler button sch_fq_codel ip_tables ext4 crc16 jbd2 mbcache mlx4_en sd_mod crc32c_intel jitterentropy_rng drbg ahci aesni_intel libahci aes_x86_64 ehci_pci glue_helper tg3 lrw ehci_hcd gf128mul ablk_helper hwmon megaraid_sas cryptd libata ptp mlx4_core usbcore scsi_mod pps_core usb_common libphy dm_mirror dm_region_hash dm_log dm_mod autofs4
Dec 27 14:22:29 ac07.pek.prod.com kernel: CPU: 15 PID: 40090 Comm: glusterfsd Not tainted 4.7.3-coreos-r3 #1
Dec 27 14:22:29 ac07.pek.prod.com kernel: Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.1.7 06/16/2016
Dec 27 14:22:29 ac07.pek.prod.com kernel: task: ffff885fa2f11d40 ti: ffff885f66988000 task.ti: ffff885f66988000
Dec 27 14:22:29 ac07.pek.prod.com kernel: RIP: 0010:[<ffffffffa11e517f>] [<ffffffffa11e517f>] do_iter_readv_writev+0xdf/0x110
Dec 27 14:22:29 ac07.pek.prod.com kernel: RSP: 0018:ffff885f6698bd58 EFLAGS: 00010246
Dec 27 14:22:29 ac07.pek.prod.com kernel: RAX: 0000000000000000 RBX: ffff885f6698bec8 RCX: ffffffffa1479020
Dec 27 14:22:29 ac07.pek.prod.com kernel: RDX: 73ff884e61ba1598 RSI: ffff885f6698bdb8 RDI: ffff885f7d0f3900
Dec 27 14:22:29 ac07.pek.prod.com kernel: RBP: ffff885f6698bd90 R08: 0000000000000000 R09: 0000000000000802
Dec 27 14:22:29 ac07.pek.prod.com kernel: R10: 0000000000000000 R11: ffff885f6698c000 R12: 0000000000000000
Dec 27 14:22:29 ac07.pek.prod.com kernel: R13: ffff885f7d0f3900 R14: ffff885f6698bec8 R15: 0000000000000000
Dec 27 14:22:29 ac07.pek.prod.com kernel: FS: 00007f00b5562700(0000) GS:ffff885fbe7c0000(0000) knlGS:0000000000000000
Dec 27 14:22:29 ac07.pek.prod.com kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 27 14:22:29 ac07.pek.prod.com kernel: CR2: 000000c821378000 CR3: 0000005eb41c3000 CR4: 00000000003406e0
Dec 27 14:22:29 ac07.pek.prod.com kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec 27 14:22:29 ac07.pek.prod.com kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Dec 27 14:22:29 ac07.pek.prod.com kernel: Stack:
Dec 27 14:22:29 ac07.pek.prod.com kernel: 0000000000000000 ffff885f7d0f3900 ffff885f6698bec8 ffff885f6698bd90
Dec 27 14:22:29 ac07.pek.prod.com kernel: ffffffffa11e58be 000000004c834e96 0000000000000000 ffff885f6698bea0
Dec 27 14:22:29 ac07.pek.prod.com kernel: ffffffffa11e5da2 0000000000000000 ffff885f6698bde8 0000000000000000
Dec 27 14:22:29 ac07.pek.prod.com kernel: Call Trace:
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa11e58be>] ? rw_verify_area+0x4e/0xb0
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa11e5da2>] do_readv_writev+0x1a2/0x240
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa122e749>] ? ep_poll+0x139/0x340
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa1115771>] ? __audit_syscall_entry+0xb1/0x100
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa11e5e79>] vfs_readv+0x39/0x50
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa11e5ef1>] do_readv+0x61/0xf0
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa11e7270>] SyS_readv+0x10/0x20
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa1003c6d>] do_syscall_64+0x5d/0x150
Dec 27 14:22:29 ac07.pek.prod.com kernel: [<ffffffffa159c7a1>] entry_SYSCALL64_slow_path+0x25/0x25
Dec 27 14:22:29 ac07.pek.prod.com kernel: Code: 54 48 8b 55 d0 48 89 13 48 8b 5d f0 65 48 33 1c 25 28 00 00 00 75 40 48 83 c4 30 5b 5d c3 83 4d e8 30 eb c8 48 8b 97 f8 00 00 00 <48> 8b 12 4c 8b 52 28 41 f6 42 50 10 0f 85 65 ff ff ff f6 42 0c
Dec 27 14:22:29 ac07.pek.prod.com kernel: RIP [<ffffffffa11e517f>] do_iter_readv_writev+0xdf/0x110

I don't have any hint about the cause of the crash; any help is appreciated.

likun
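For reference, before rebooting the next node in a replicated volume, brick and self-heal status can be verified with the standard gluster CLI. A minimal sketch (the volume name "myvol" below is a placeholder, not from the thread):

    # Check that all bricks on the rebooted node are back online;
    # the "Online" column should read "Y" for every brick.
    gluster volume status myvol

    # Check that self-heal has caught up before touching another
    # member of the same replica set; entry counts should reach zero.
    gluster volume heal myvol info

Rebooting a second replica member while heals are still pending is what typically drops a replica set below quorum and forces it read-only.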
Dmitry Melekhov
2016-Dec-27 10:42 UTC
[Gluster-users] glusterfsd Not tainted error, server crashed
27.12.2016 14:32, likun wrote:
> I don't have any hint about the cause of the crash; any help is
> appreciated.

AFAIK, gluster is completely userspace; what you have is a kernel crash, so it is not gluster-related at all.
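If you want to pin down where the kernel died, the symbol+offset from the oops (do_iter_readv_writev+0xdf) can be resolved against a vmlinux built with debug info. A minimal sketch; the vmlinux path is a placeholder, and this assumes you can obtain a debug build matching 4.7.3-coreos-r3 exactly:

    # Load the matching debug kernel image into gdb, then map the
    # faulting symbol+offset from the oops to a source file and line.
    gdb /path/to/vmlinux
    (gdb) list *(do_iter_readv_writev+0xdf)

Without symbols that match the running kernel build, the resolved line will be meaningless, so check the version string in the oops first.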