...
> > client from
> > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > (version: 3.8.11)
> > [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > device or address]

The SEEK procedure translates to lseek() in the posix xlator. This can
return with "No such device or address" (ENXIO) in only one case:

    ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is
           beyond the end of the file.

This means that an lseek() was executed where the current offset of the
file descriptor was higher than the size of the file. I'm not sure how
that could happen... Sharding prevents using SEEK at all at the moment.

...
> > The strange part is that I cannot seem to find any other error.
> > If I restart the VM everything works as expected (it stopped at ~9.51
> > UTC and was started at ~10.01 UTC).
> >
> > This is not the first time that this happened, and I do not see any
> > problems with networking or the hosts.
> >
> > Gluster version is 3.8.11.
> > This is the incriminated volume (though it happened on a different one too):
> >
> > Volume Name: datastore2
> > Type: Replicate
> > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 1 x (2 + 1) = 3
> > Transport-type: tcp
> > Bricks:
> > Brick1: srvpve2g:/data/brick2/brick
> > Brick2: srvpve3g:/data/brick2/brick
> > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > Options Reconfigured:
> > nfs.disable: on
> > performance.readdir-ahead: on
> > transport.address-family: inet
> >
> > Any hint on how to dig more deeply into the reason would be greatly
> > appreciated.

The problem is probably with SEEK support in the arbiter functionality.
Just like a READ or a WRITE on the arbiter brick, SEEK can only succeed
on the bricks where the files with content are located. It does not look
like arbiter handles SEEK, so the offset in lseek() will likely be
higher than the size of the file on the arbiter brick (an empty, 0-size
file). I don't know how the replication xlator responds to an error
return from SEEK on one of the bricks, but I doubt it likes it.

We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
SEEK for sharding. I suggest you open a bug for getting SEEK into the
arbiter xlator as well.

HTH,
Niels
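For anyone who wants to see this ENXIO case outside of gluster, here is a
minimal standalone C sketch (not from the original thread; the path and
sizes are arbitrary, and it assumes Linux/glibc on a filesystem that
supports SEEK_DATA/SEEK_HOLE): an lseek() with SEEK_DATA at an offset
beyond the end of the file fails with "No such device or address".

/* seek_enxio.c - minimal sketch, illustration only:
 * lseek(fd, offset, SEEK_DATA) with offset past EOF fails with ENXIO. */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* The path and sizes below are arbitrary, chosen for the demo. */
    int fd = open("/tmp/seek-demo", O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Give the file 4 KiB of data. */
    char buf[4096];
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) < 0) {
        perror("write");
        return 1;
    }

    /* Ask for the next data region starting at 1 MiB, far beyond EOF:
     * lseek() returns -1 and sets errno to ENXIO. */
    off_t ret = lseek(fd, 1024 * 1024, SEEK_DATA);
    if (ret == (off_t)-1)
        printf("lseek(SEEK_DATA) failed: %s\n", strerror(errno));
    else
        printf("lseek(SEEK_DATA) returned offset %lld\n", (long long)ret);

    close(fd);
    return 0;
}

Built with "cc seek_enxio.c -o seek_enxio", running it on a local
filesystem should print the same ENXIO message that shows up in the
brick log above.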
On 09/05/2017 16:10, Niels de Vos wrote:
> ...
> The problem is probably with SEEK support in the arbiter functionality.
> Just like a READ or a WRITE on the arbiter brick, SEEK can only succeed
> on the bricks where the files with content are located. It does not look
> like arbiter handles SEEK, so the offset in lseek() will likely be
> higher than the size of the file on the arbiter brick (an empty, 0-size
> file). I don't know how the replication xlator responds to an error
> return from SEEK on one of the bricks, but I doubt it likes it.
>
> We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> SEEK for sharding. I suggest you open a bug for getting SEEK into the
> arbiter xlator as well.

Well, I'm not really clear on the internals of gluster, but the arbiter
is not the host where the VM is running. If gluster is aware of the
arbiter, it should not look for data on that brick besides metadata and
"quorum".

Also, the seek errors were there before, when there was no arbiter
(only 2 replicas).

And finally, the seek error is triggered when the VM is started (at
least the one in the logs).

Alessandro
On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos at redhat.com> wrote:
> ...
> The problem is probably with SEEK support in the arbiter functionality.
> Just like a READ or a WRITE on the arbiter brick, SEEK can only succeed
> on the bricks where the files with content are located. It does not look
> like arbiter handles SEEK, so the offset in lseek() will likely be
> higher than the size of the file on the arbiter brick (an empty, 0-size
> file). I don't know how the replication xlator responds to an error
> return from SEEK on one of the bricks, but I doubt it likes it.

inode-read fops don't get sent to the arbiter brick, so this won't
happen.

> We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> SEEK for sharding. I suggest you open a bug for getting SEEK into the
> arbiter xlator as well.
>
> HTH,
> Niels

--
Pranith
Niels,

Alessandro's configuration does not have shard enabled, so this
definitely has nothing to do with shard not supporting the seek fop.

Copy-pasting the volume-info output from the first mail:

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet

-Krutika

On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos at redhat.com> wrote:
> ...
> This means that an lseek() was executed where the current offset of the
> file descriptor was higher than the size of the file. I'm not sure how
> that could happen... Sharding prevents using SEEK at all at the moment.
> ...
> We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> SEEK for sharding. I suggest you open a bug for getting SEEK into the
> arbiter xlator as well.
>
> HTH,
> Niels
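As a quick cross-check (a sketch, assuming the gluster 3.8 CLI; the
volume name is the one from this thread), the shard translator's state
can be queried directly instead of inferring it from the "Options
Reconfigured" list:

    # prints the effective value of the sharding option (off by default)
    gluster volume get datastore2 features.shard

If sharding had ever been enabled, it would also appear as
"features.shard: on" under "Options Reconfigured" in the volume info
above.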