On 05/11/2017 05:49 PM, Niels de Vos wrote:
> On Wed, May 10, 2017 at 09:08:03PM +0530, Pranith Kumar Karampuri wrote:
>> On Wed, May 10, 2017 at 7:11 PM, Niels de Vos <ndevos at redhat.com> wrote:
>>
>>> On Wed, May 10, 2017 at 04:08:22PM +0530, Pranith Kumar Karampuri wrote:
>>>> On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos at redhat.com> wrote:
>>>>
>>>>> ...
>>>>>>> client from
>>>>>>> srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
>>>>>>> (version: 3.8.11)
>>>>>>> [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
>>>>>>> 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
>>>>>>> device or address]
>>>>> The SEEK procedure translates to lseek() in the posix xlator. This can
>>>>> return with "No such device or address" (ENXIO) in only one case:
>>>>>
>>>>>     ENXIO  whence is SEEK_DATA or SEEK_HOLE, and the file offset is
>>>>>            beyond the end of the file.
>>>>>
>>>>> This means that an lseek() was executed where the current offset of the
>>>>> file descriptor was higher than the size of the file. I'm not sure how
>>>>> that could happen... Sharding prevents using SEEK at all atm.
>>>>>
>>>>> ...
>>>>>>> The strange part is that I cannot seem to find any other error.
>>>>>>> If I restart the VM everything works as expected (it stopped at ~9.51
>>>>>>> UTC and was started at ~10.01 UTC).
>>>>>>>
>>>>>>> This is not the first time that this happened, and I do not see any
>>>>>>> problems with networking or the hosts.
>>>>>>>
>>>>>>> Gluster version is 3.8.11
>>>>>>> this is the incriminated volume (though it happened on a different one
>>>>>>> too)
>>>>>>> Volume Name: datastore2
>>>>>>> Type: Replicate
>>>>>>> Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
>>>>>>> Status: Started
>>>>>>> Snapshot Count: 0
>>>>>>> Number of Bricks: 1 x (2 + 1) = 3
>>>>>>> Transport-type: tcp
>>>>>>> Bricks:
>>>>>>> Brick1: srvpve2g:/data/brick2/brick
>>>>>>> Brick2: srvpve3g:/data/brick2/brick
>>>>>>> Brick3: srvpve1g:/data/brick2/brick (arbiter)
>>>>>>> Options Reconfigured:
>>>>>>> nfs.disable: on
>>>>>>> performance.readdir-ahead: on
>>>>>>> transport.address-family: inet
>>>>>>>
>>>>>>> Any hint on how to dig more deeply into the reason would be greatly
>>>>>>> appreciated.
>>>>> Probably the problem is with SEEK support in the arbiter functionality.
>>>>> Just like with a READ or a WRITE on the arbiter brick, SEEK can only
>>>>> succeed on bricks where the files with content are located. It does not
>>>>> look like arbiter handles SEEK, so the offset in lseek() will likely be
>>>>> higher than the size of the file on the brick (empty, 0 size file). I
>>>>> don't know how the replication xlator responds on an error return from
>>>>> SEEK on one of the bricks, but I doubt it likes it.
>>>>>
>>>> inode-read fops don't get sent to arbiter brick. So this won't happen.
>>> Yes, I see that the arbiter xlator returns on reads without going to the
>>> bricks. Should that not be done for seek as well? It's the first time I
>>> actually looked at the code of the arbiter xlator, so I might well be
>>> misunderstanding how it works :)
>>>
>> inode-read fops are the fops which read some information from the inode.
>> Like stat/getxattr/read. Even seek falls in that category. It is not sent
>> on arbiter brick...
> What confuses me is that the arbiter xlator defines the following FOPs
> in xlators/features/arbiter/src/arbiter.c:
AFR keeps a list of readable subvols, and all read-related FOPs are wound
only to those. For arbiter volumes, we mark the arbiter as non-readable
during the lookup cbk, so no read FOP is wound to the arbiter anymore. This
change was made at a later stage, after arbiter_readv had initially been
coded to return an error. So in the current code, arbiter_readv should never
get hit.
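
As a rough illustration of the readable-subvol idea described above, here is
a toy model in C. It is not the actual AFR code: the struct, the function
names and the "first readable child wins" policy are all made up for this
sketch, and the real read-child selection in AFR is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHILD_COUNT 3 /* matches the 1 x (2 + 1) volume in this thread */

/* Toy stand-in for per-inode replication state (hypothetical, not AFR's). */
struct toy_inode_ctx {
    bool readable[CHILD_COUNT];
};

/*
 * Toy lookup callback: every data brick is marked readable, the arbiter
 * (which only holds 0-byte files) is marked non-readable.
 */
static void toy_lookup_cbk(struct toy_inode_ctx *ctx, int arbiter_index)
{
    assert(ctx != NULL);
    for (int i = 0; i < CHILD_COUNT; i++)
        ctx->readable[i] = (i != arbiter_index);
}

/*
 * Pick the child a read-type fop (stat/getxattr/read/seek) would be wound
 * to. Returns the index of the first readable child, or -1 if none is left.
 */
static int toy_pick_read_child(const struct toy_inode_ctx *ctx)
{
    assert(ctx != NULL);
    for (int i = 0; i < CHILD_COUNT; i++)
        if (ctx->readable[i])
            return i;
    return -1;
}
```

With the arbiter at index 2, toy_pick_read_child() can only ever return 0 or
1 (or -1 when both data bricks are unreadable), so in this model a seek never
reaches the arbiter.
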
> struct xlator_fops fops = {
> .lookup = arbiter_lookup,
> .readv = arbiter_readv,
> .truncate = arbiter_truncate,
> .writev = arbiter_writev,
> .ftruncate = arbiter_ftruncate,
> .fallocate = arbiter_fallocate,
> .discard = arbiter_discard,
> .zerofill = arbiter_zerofill,
> };
>
>
> To go back to the error message:
>
> [posix.c:1079:posix_seek] 0-datastore2-posix: seek failed on fd 18
> length 42957209600 [No such device or address]
>
> We need to know on which brick this occurs to confirm that it was not
> sent on the arbiter brick somehow.
This is what Alessandro said earlier in the thread:
"Also the seek errors were there before when there was no arbiter (only
2 replica)."
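
For reference, the ENXIO case from the lseek() description quoted earlier
can be reproduced outside Gluster on any local file. The snippet below is
only a sketch under a few assumptions: a Linux kernel and glibc that support
SEEK_DATA, and a writable /tmp. seek_data_errno() is a hypothetical helper
written for this mail, not something from the GlusterFS tree.

```c
#define _GNU_SOURCE /* exposes SEEK_DATA on glibc */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file holding `size` bytes of data, call
 * lseek(fd, offset, SEEK_DATA) on it, and return the errno of that call
 * (0 if the seek succeeded).
 */
static int seek_data_errno(off_t size, off_t offset)
{
    char path[] = "/tmp/seek-demo-XXXXXX";
    char block[4096];
    int fd, err = 0;

    assert(size >= 0);

    fd = mkstemp(path);
    if (fd < 0)
        return errno;
    unlink(path); /* the file disappears once fd is closed */

    /* Fill the file with real data so that it contains no holes. */
    memset(block, 'x', sizeof(block));
    for (off_t left = size; left > 0;) {
        size_t chunk = left < (off_t)sizeof(block) ? (size_t)left
                                                   : sizeof(block);
        ssize_t n = write(fd, block, chunk);
        if (n <= 0) {
            err = errno ? errno : EIO;
            close(fd);
            return err;
        }
        left -= n;
    }

    /* The call under test: fails with ENXIO when offset >= file size. */
    if (lseek(fd, offset, SEEK_DATA) == (off_t)-1)
        err = errno;

    close(fd);
    return err;
}
```

On a reasonably recent Linux kernel, seek_data_errno(4096, 8192) and
seek_data_errno(0, 0) should both return ENXIO, while
seek_data_errno(4096, 0) should return 0. The second case is the one that
matters for a 0-byte file such as an arbiter copy; the first one is the
case where the file on the brick is simply shorter than the requested
offset.
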
> Thanks,
> Niels