Hi,

I found the coredump file, but it's a 15 MB file (zipped), so I can't post
it on this mailing list. Here are some parts of the report:

> ProblemType: Crash
> Architecture: amd64
> Date: Sun Jun 26 11:27:44 2016
> DistroRelease: Ubuntu 14.04
> ExecutablePath: /usr/sbin/glusterfsd
> ExecutableTimestamp: 1460982898
> ProcCmdline: /usr/sbin/glusterfsd -s nfs05 --volfile-id
> cdn.nfs05.srv-cdn -p /var/lib/glusterd/vols/cdn/run/nfs05-srv-cdn.pid
> -S /var/run/gluster/d52ac3e6c0a3fa316a9e8360976f3af5.socket
> --brick-name /srv/cdn -l /var/log/glusterfs/bricks/srv-cdn.log
> --xlator-option
> *-posix.glusterd-uuid=6af63b78-a3da-459d-a909-c010e6c9072c
> --brick-port 49155 --xlator-option cdn-server.listen-port=49155
> ProcCwd: /
> ProcEnviron:
>  PATH=(custom, no user)
>  TERM=linux
> ProcMaps:
>  7f25f18d9000-7f25f18da000 ---p 00000000 00:00 0
>  7f25f18da000-7f25f19da000 rw-p 00000000 00:00 0    [stack:849]
>  7f25f19da000-7f25f19db000 ---p 00000000 00:00 0
> ...
> ProcStatus:
>  Name: glusterfsd
>  State: D (disk sleep)
>  Tgid: 7879
>  Ngid: 0
>  Pid: 7879
>  PPid: 1
>  TracerPid: 0
>  Uid: 0 0 0 0
>  Gid: 0 0 0 0
>  FDSize: 64
>  Groups: 0
>  VmPeak: 878404 kB
>  VmSize: 878404 kB
>  VmLck: 0 kB
>  VmPin: 0 kB
>  VmHWM: 96104 kB
>  VmRSS: 90652 kB
>  VmData: 792012 kB
>  VmStk: 276 kB
>  VmExe: 84 kB
>  VmLib: 7716 kB
>  VmPTE: 700 kB
>  VmSwap: 20688 kB
>  Threads: 22
>  SigQ: 0/30034
>  SigPnd: 0000000000000000
>  ShdPnd: 0000000000000000
>  SigBlk: 0000000000004a01
>  SigIgn: 0000000000001000
>  SigCgt: 00000001800000fa
>  CapInh: 0000000000000000
>  CapPrm: 0000001fffffffff
>  CapEff: 0000001fffffffff
>  CapBnd: 0000001fffffffff
>  Seccomp: 0
>  Cpus_allowed: 7fff
>  Cpus_allowed_list: 0-14
>  Mems_allowed: 00000000,00000001
>  Mems_allowed_list: 0
>  voluntary_ctxt_switches: 3
>  nonvoluntary_ctxt_switches: 1
> Signal: 11
> Uname: Linux 3.13.0-44-generic x86_64
> UserGroups:
> CoreDump: base64 ...

Yann

On 28/06/2016 09:31, Anoop C S wrote:
> On Mon, 2016-06-27 at 15:05 +0200, Yann LEMARIE wrote:
>> @Anoop,
>>
>> Where can I find the coredump file?
>>
> You will get hints about the crash from entries inside
> /var/log/messages (for example the pid of the process, the location
> of the coredump, etc.).
>
>> The crash occurred twice in the last 7 days, each time on a Sunday
>> morning for no apparent reason, no increase of traffic or anything
>> like that; the volume had been mounted for 15 days.
>>
>> The bricks are used as a kind of CDN, distributing small images and
>> CSS files through an nginx HTTPS service (with a load balancer and
>> 2 EC2 instances); on a Sunday morning there is not a lot of
>> activity ...
>>
> From the very minimal backtrace that we have from the brick logs I
> would assume that a truncate operation was being handled by the trash
> translator and it crashed.
>
>> Volume info:
>>> root at nfs05 /var/log/glusterfs # gluster volume info cdn
>>>
>>> Volume Name: cdn
>>> Type: Replicate
>>> Volume ID: c53b9bae-5e12-4f13-8217-53d8c96c302c
>>> Status: Started
>>> Number of Bricks: 1 x 2 = 2
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: nfs05:/srv/cdn
>>> Brick2: nfs06:/srv/cdn
>>> Options Reconfigured:
>>> performance.readdir-ahead: on
>>> features.trash: on
>>> features.trash-max-filesize: 20MB
>>
>> I don't know if there is a link with this crash problem, but I have
>> another problem with my 2 servers that makes GlusterFS clients
>> disconnect (from another volume):
>>> Jun 24 02:28:04 nfs05 kernel: [2039468.818617] xen_netfront:
>>> xennet: skb rides the rocket: 19 slots
>>> Jun 24 02:28:11 nfs05 kernel: [2039475.744086] net_ratelimit: 66
>>> callbacks suppressed
>> It seems to be a network interface problem:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811
>>
>> Yann
>>
>> On 27/06/2016 12:59, Anoop C S wrote:
>>> On Mon, 2016-06-27 at 09:47 +0200, Yann LEMARIE wrote:
>>>> Hi,
>>>>
>>>> I have been using GlusterFS for many years and have never seen
>>>> this problem, but this is the second time in one week ...
>>>>
>>>> I have 3 volumes with 2 bricks each, and 1 volume crashed for no
>>>> reason;
>>> Did you observe the crash while mounting the volume? Or can you be
>>> more specific on what you were doing just before you saw the
>>> crash? Can you please share the output of
>>> `gluster volume info <VOLNAME>`?
>>>
>>>> I just have to stop/start the volume to bring it up again.
>>>> The only logs I can find are in syslog:
>>>>
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: pending frames:
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: frame : type(0) op(10)
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: patchset:
>>>>> git://git.gluster.com/glusterfs.git
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: signal received: 11
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: time of crash:
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: 2016-06-26 09:27:44
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: configuration details:
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: argp 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: backtrace 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: dlfcn 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: libpthread 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: llistxattr 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: setfsid 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: spinlock 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: epoll.h 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: xattr.h 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: st_atim.tv_nsec 1
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: package-string: glusterfs
>>>>> 3.7.11
>>>>> Jun 26 11:27:44 nfs05 srv-cdn[7879]: ---------
>>>>>
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: pending frames:
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: frame : type(0) op(10)
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: patchset:
>>>>> git://git.gluster.com/glusterfs.git
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: signal received: 11
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: time of crash:
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: 2016-06-26 09:27:44
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: configuration details:
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: argp 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: backtrace 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: dlfcn 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: libpthread 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: llistxattr 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: setfsid 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: spinlock 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: epoll.h 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: xattr.h 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: st_atim.tv_nsec 1
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: package-string: glusterfs
>>>>> 3.7.11
>>>>> Jun 26 11:27:44 nfs06 srv-cdn[1787]: ---------
>>>>
>>>> Thanks for your help
>>>>
>>>> Regards
>>>> --
>>>> Yann Lemarié
>>>> iRaiser - Support Technique
>>>>
>>>> ylemarie at iraiser.eu
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users at gluster.org
>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Yann Lemarié
>> iRaiser - Support Technique
>>
>> ylemarie at iraiser.eu
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users

--
Yann Lemarié
iRaiser - Support Technique
<http://www.iraiser.eu>
ylemarie at iraiser.eu
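[Editor's note: the report quoted above is an Ubuntu apport crash record
("ProblemType: Crash" with a base64-encoded "CoreDump:" field), so the raw
core can be recovered from the .crash file under /var/crash instead of being
posted to the list. A minimal sketch, assuming the default apport location
on Ubuntu 14.04; the exact .crash filename below is an assumption and should
be replaced with whatever actually exists on the server:

# ls /var/crash
# apport-unpack /var/crash/_usr_sbin_glusterfsd.0.crash /tmp/glusterfsd-crash

apport-unpack decodes each field of the report into its own file, so the
usable core ends up at /tmp/glusterfsd-crash/CoreDump.]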
On Tue, 2016-06-28 at 10:49 +0200, Yann LEMARIE wrote:
> Hi,
>
> I found the coredump file, but it's a 15 MB file (zipped), so I can't
> post it on this mailing list.
>
Great. In order to pinpoint the exact crash location, can you please
attach gdb to the extracted coredump file and share the complete
backtrace with us by executing the `bt` command in the gdb shell?
Apart from gdb, you may be instructed to install some debug-info
packages to extract a useful backtrace when attaching gdb as follows:

# gdb /usr/sbin/glusterfsd <path-to-coredump-file>

If prompted, install the required packages and reattach the coredump
file. When you are inside the (gdb) prompt, type 'bt' and paste the
backtrace.

> Here are some parts of the report:
>
>> ProblemType: Crash
>> Architecture: amd64
>> Date: Sun Jun 26 11:27:44 2016
>> DistroRelease: Ubuntu 14.04
>> ExecutablePath: /usr/sbin/glusterfsd
>> [...]
>> Signal: 11
>> Uname: Linux 3.13.0-44-generic x86_64
>> UserGroups:
>> CoreDump: base64 ...
>
> Yann
>
> On 28/06/2016 09:31, Anoop C S wrote:
>> [...]
>
> --
> Yann Lemarié
> iRaiser - Support Technique
>
> ylemarie at iraiser.eu
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
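[Editor's note: a minimal sketch of the backtrace workflow described above,
for the Ubuntu 14.04 setup in the report. The glusterfs-dbg package name is
an assumption (gdb itself will name the exact debug packages it wants when
symbols are missing), and the core path follows from the hypothetical
apport-unpack step sketched earlier:

# apt-get install gdb glusterfs-dbg
# gdb /usr/sbin/glusterfsd /tmp/glusterfsd-crash/CoreDump \
      -batch -ex 'bt' -ex 'thread apply all bt' > /tmp/backtrace.txt 2>&1

Running gdb with -batch and -ex avoids the interactive prompt entirely and
captures the backtrace of the crashing thread plus all other threads into
/tmp/backtrace.txt, which is small enough to paste on the list.]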
Hi,

Another crash this morning, in another volume (srv-payment), again a
problem with a file in the ".trashcan" directory, but this time I don't
have any dump file in /var/crash, and something very strange: the volume
seems to come back up by itself 2 or 3 minutes later. How is that
possible? Did it not really crash?

> Jul 1 06:36:19 nfs05 srv-payment[15744]: pending frames:
> Jul 1 06:36:19 nfs05 srv-payment[15744]: frame : type(0) op(10)
> Jul 1 06:36:19 nfs05 srv-payment[15744]: patchset:
> git://git.gluster.com/glusterfs.git
> Jul 1 06:36:19 nfs05 srv-payment[15744]: signal received: 11
> Jul 1 06:36:19 nfs05 srv-payment[15744]: time of crash:
> Jul 1 06:36:19 nfs05 srv-payment[15744]: 2016-07-01 04:36:19
> Jul 1 06:36:19 nfs05 srv-payment[15744]: configuration details:
> Jul 1 06:36:19 nfs05 srv-payment[15744]: argp 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: backtrace 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: dlfcn 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: libpthread 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: llistxattr 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: setfsid 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: spinlock 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: epoll.h 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: xattr.h 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: st_atim.tv_nsec 1
> Jul 1 06:36:19 nfs05 srv-payment[15744]: package-string: glusterfs 3.7.11
> Jul 1 06:36:19 nfs05 srv-payment[15744]: ---------

And from the brick log:

> [2016-07-01 04:20:01.896593] E [MSGID: 113020]
> [posix.c:2651:posix_create] 0-payment-posix: setting gfid on
> /srv/payment/.trashcan//sites/dons.fondationdefrance.org/log/synchroNetful.log_2016-07-01_042001
> failed
> [2016-07-01 04:20:01.896715] E [posix.c:2996:_fill_writev_xdata]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_readv_cbk+0x16c)
> [0x7f95a35a1c4c]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/storage/posix.so(posix_writev+0x1dc)
> [0x7f95a3de037c]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/storage/posix.so(_fill_writev_xdata+0x1ff)
> [0x7f95a3de016f] ) 0-payment-posix: fd: 0x7f9598005a14 inode:
> 0x7f952ae0a6acgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
> [2016-07-01 04:20:01.896756] E [posix.c:2996:_fill_writev_xdata]
> (-->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/trash.so(trash_truncate_readv_cbk+0x16c)
> [0x7f95a35a1c4c]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/storage/posix.so(posix_writev+0x1dc)
> [0x7f95a3de037c]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/storage/posix.so(_fill_writev_xdata+0x1ff)
> [0x7f95a3de016f] ) 0-payment-posix: fd: 0x7f9598005a14 inode:
> 0x7f952ae0a6acgfid:00000000-0000-0000-0000-000000000000 [Invalid argument]
> [2016-07-01 04:25:53.012016] I [dict.c:473:dict_get]
> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xac)
> [0x7f95a97b417c]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)
> [0x7f95a2028877]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac)
> [0x7f95a97a491c] ) 0-dict: !this || key=() [Invalid argument]
> [2016-07-01 04:25:53.013409] E [MSGID: 113091]
> [posix.c:178:posix_lookup] 0-payment-posix: null gfid for path (null)
> [2016-07-01 04:25:53.013424] E [MSGID: 113018]
> [posix.c:196:posix_lookup] 0-payment-posix: lstat on null failed
> [Invalid argument]
> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-payment-posix: null gfid for path (null)" repeated 3 times between
> [2016-07-01 04:25:53.013409] and [2016-07-01 04:25:53.025339]
> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-payment-posix: lstat on null failed [Invalid argument]" repeated 3
> times between [2016-07-01 04:25:53.013424] and [2016-07-01
> 04:25:53.025340]
> [2016-07-01 04:35:54.017530] I [dict.c:473:dict_get]
> (-->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_getxattr_cbk+0xac)
> [0x7f95a97b417c]
> -->/usr/lib/x86_64-linux-gnu/glusterfs/3.7.11/xlator/features/marker.so(marker_getxattr_cbk+0xa7)
> [0x7f95a2028877]
> -->/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(dict_get+0xac)
> [0x7f95a97a491c] ) 0-dict: !this || key=() [Invalid argument]
> [2016-07-01 04:35:54.019695] E [MSGID: 113091]
> [posix.c:178:posix_lookup] 0-payment-posix: null gfid for path (null)
> [2016-07-01 04:35:54.019710] E [MSGID: 113018]
> [posix.c:196:posix_lookup] 0-payment-posix: lstat on null failed
> [Invalid argument]
> The message "E [MSGID: 113091] [posix.c:178:posix_lookup]
> 0-payment-posix: null gfid for path (null)" repeated 3 times between
> [2016-07-01 04:35:54.019695] and [2016-07-01 04:35:54.027856]
> The message "E [MSGID: 113018] [posix.c:196:posix_lookup]
> 0-payment-posix: lstat on null failed [Invalid argument]" repeated 3
> times between [2016-07-01 04:35:54.019710] and [2016-07-01
> 04:35:54.027857]
> pending frames:
> frame : type(0) op(10)
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 11
> time of crash:
> 2016-07-01 04:36:19

On 29/06/2016 08:39, Anoop C S wrote:
> On Tue, 2016-06-28 at 10:49 +0200, Yann LEMARIE wrote:
>> Hi,
>>
>> I found the coredump file, but it's a 15 MB file (zipped), so I can't
>> post it on this mailing list.
>>
> Great. In order to pinpoint the exact crash location, can you please
> attach gdb to the extracted coredump file and share the complete
> backtrace with us by executing the `bt` command in the gdb shell?
> Apart from gdb, you may be instructed to install some debug-info
> packages to extract a useful backtrace when attaching gdb as follows:
>
> # gdb /usr/sbin/glusterfsd <path-to-coredump-file>
>
> If prompted, install the required packages and reattach the coredump
> file. When you are inside the (gdb) prompt, type 'bt' and paste the
> backtrace.
>
> [...]

--
Yann Lemarié
iRaiser - Support Technique
<http://www.iraiser.eu>
ylemarie at iraiser.eu
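[Editor's note: both crashes implicate the trash translator handling a
truncate (features.trash is enabled on the affected volumes, and the brick
log above points straight at trash_truncate_readv_cbk). Until the segfault
is root-caused, a possible mitigation sketch, assuming losing the
trash/undelete protection is acceptable; features.trash is the same volume
option shown in the volume info earlier in the thread:

# gluster volume set payment features.trash off
# gluster volume info payment

The second command only confirms the option change; the same set command
would apply to the cdn volume, which has the same trash configuration.]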