Hey folks,
I'm trying to track down rogue QEMU segfaults in my infrastructure that
appear to be caused by gluster. The clues I have so far: the process is in
disk sleep when it dies, the process is backed only by gluster, and the
segfault points to I/O subsystem issues. Unfortunately I haven't figured out
how to get a full crash dump so I can run it through apport-retrace and see
exactly what went wrong. The other interesting thing is that this happens
only when gluster is under heavy load. Any tips on debugging further or
getting this fixed would be appreciated.
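What I plan to try next for capturing a core (assuming the libvirt qemu.conf
max_core knob and the Ubuntu 14.04 service name; corrections welcome):

```shell
# Check that cores are piped to apport (the Ubuntu default core_pattern
# starts with |/usr/share/apport/apport):
cat /proc/sys/kernel/core_pattern

# libvirt launches QEMU with a core size limit of 0 by default; allow full
# cores by setting max_core in /etc/libvirt/qemu.conf and restarting libvirt.
# (sed pattern is a sketch -- it uncomments/overwrites any existing max_core line)
sudo sed -i 's/^#\?max_core.*/max_core = "unlimited"/' /etc/libvirt/qemu.conf
sudo service libvirt-bin restart   # Ubuntu 14.04 service name

# Verify the limit took effect on a (re)started guest, PID from the segfault line:
grep 'Max core' /proc/27730/limits
```

The limit change only applies to QEMU processes started after the restart,
so I'd still need the crash to reproduce once more.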
Segfault:
Dec 30 20:42:56 HFMHVR3 kernel: [5976247.820875] qemu-system-x86[27730]:
segfault at 128 ip 00007f891f0cc82c sp 00007f89376846a0 error 4 in
qemu-system-x86_64 (deleted)[7f891ed42000+4af000]
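Meanwhile, here's the faulting offset I computed from the kernel line above,
in case it rings a bell for anyone (the addr2line path is an assumption for
the Ubuntu QEMU package, and needs the matching debug symbols installed):

```shell
# Values copied from the kernel segfault line above.
ip=0x7f891f0cc82c      # faulting instruction pointer
base=0x7f891ed42000    # load address of qemu-system-x86_64 from the same line
printf 'offset into binary: 0x%x\n' $(( ip - base ))
# prints: offset into binary: 0x38a82c

# With debug symbols for the exact same build, the offset should resolve to a
# source line, e.g.:
# addr2line -f -e /usr/bin/qemu-system-x86_64 0x38a82c
```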
Brick log:
[2014-12-30 20:42:56.797946] I [server.c:520:server_rpc_notify]
0-VMARRAY-server: disconnecting connectionfrom
HFMHVR3-27726-2014/11/29-00:42:11:436294-VMARRAY-client-0-0-0
[2014-12-30 20:42:56.798244] W [inodelk.c:392:pl_inodelk_log_cleanup]
0-VMARRAY-server: releasing lock on 6e640448-aa4c-4faa-b7ad-33e68aca0d3a held by
{client=0x7fe130776740, pid=0 lk-owner=ecb80
[2014-12-30 20:42:56.798287] I [server-helpers.c:289:do_fd_cleanup]
0-VMARRAY-server: fd cleanup on /HFMPCI0.img
[2014-12-30 20:42:56.798384] I [client_t.c:417:gf_client_unref]
0-VMARRAY-server: Shutting down connection
HFMHVR3-27726-2014/11/29-00:42:11:436294-VMARRAY-client-0-0-0
Nothing interesting appears in the VM log or around the segfault event in the
hypervisor log.
Environment
Ubuntu 14.04 running stock QEMU 2.0.0, modified only for gfapi support from
https://launchpad.net/~josh-boon/+archive/ubuntu/qemu-glusterfs, running on top
of an SSD RAID0 array. The gluster volumes are connected over back-to-back 10G
fiber links running in a bond using balance-rr.
Config
Filesystem mount
/dev/mapper/VG0-VAR on /var type xfs (rw,noatime,nodiratime,nobarrier)
Gluster config
Volume Name: VMARRAY
Type: Replicate
Volume ID: c0947aea-d07f-4ca0-bfcf-3b1c97cec247
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.9.1.1:/var/lib/glusterfs
Brick2: 10.9.1.2:/var/lib/glusterfs
Options Reconfigured:
cluster.choose-local: true
storage.owner-gid: 112
storage.owner-uid: 107
cluster.server-quorum-type: none
cluster.quorum-type: none
network.remote-dio: enable
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
server.allow-insecure: on
network.ping-timeout: 7
Machine Disk XML
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source protocol='gluster' name='VMARRAY/HFMPCI0.img'>
<host name='10.9.1.2'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00'
slot='0x05' function='0x0'/>
</disk>
Thanks,
Josh