On 12/05/2017 11:36, Niels de Vos wrote:
> On Thu, May 11, 2017 at 03:49:27PM +0200, Alessandro Briosi wrote:
>> On 11/05/2017 14:09, Niels de Vos wrote:
>>> On Thu, May 11, 2017 at 12:35:42PM +0530, Krutika Dhananjay wrote:
>>>> Niels,
>>>>
>>>> Alessandro's configuration does not have shard enabled. So it has
>>>> definitely not got anything to do with shard not supporting seek fop.
>>> Yes, but if sharding had been enabled, the seek FOP would be
>>> handled correctly (detected as not supported at all).
>>>
>>> I'm still not sure how arbiter prevents doing shards though. We normally
>>> advise using sharding *and* (optionally) an arbiter for VM workloads;
>>> arbiter without sharding has not been tested much. In addition, the seek
>>> functionality is only available in recent kernels, so there has been
>>> little testing on CentOS or similar enterprise Linux distributions.
>> Where is it stated that arbiter should be used with sharding?
>> Or that arbiter functionality without sharding is still in a "testing"
>> phase?
>> I thought that having replica 3 on a 3-node cluster would have been a
>> waste of space. (I only need to tolerate losing 1 host at a time, and
>> that's fine.)
> There is no "arbiter should be used with sharding"; our recommendations
> are to use sharding for VM workloads, with an optional arbiter. But we
> still expect VMs on non-sharded volumes to work just fine, with or
> without arbiter.
Sure, and I'd like to use it. Though as there were corruption bugs
recently, I preferred not to use it yet.
>> Anyway, this also happened before with the same VM when there was no
>> arbiter, and I thought it was for some strange reason a "quorum" thing
>> which made the file unavailable in gluster, though there were no clues
>> in the logs.
>> So I added the arbiter brick, but it happened again last week.
> If it is always the same VM, I wonder if there could be a small
> filesystem corruption in that VM? Were there any actions done on the
> storage of that VM, like resizing the block-device (VM image) or
> something like that? Systems can sometimes try to access data outside of
> the block device when it was resized, but the filesystem on the block
> device was not. This would 'trick' the filesystem into thinking it has
> more space to access than the block device has. If the filesystem access
> in the VM is 'past the block device', and this gets through to Gluster,
> which does a seek with that too large offset, the log you posted would
> be a result.
>
The problem was only on 1 VM, but now it has extended to another one;
that's why I started reporting it.
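Just to rule out the resized-device scenario you describe, I guess a rough
consistency check inside the guest would be to compare the size the kernel
reports for the block device with the size the filesystem believes it has.
A minimal sketch in C (purely illustrative; it assumes the filesystem sits
directly on /dev/vda1 and is mounted at /, so the device and mountpoint
would need adjusting to the real layout):

#include <fcntl.h>
#include <linux/fs.h>      /* BLKGETSIZE64 */
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/statvfs.h>
#include <unistd.h>

int main(void)
{
    /* Size of the block device as the kernel sees it. */
    int fd = open("/dev/vda1", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/vda1");
        return 1;
    }
    uint64_t dev_bytes = 0;
    if (ioctl(fd, BLKGETSIZE64, &dev_bytes) != 0) {
        perror("BLKGETSIZE64");
        return 1;
    }
    close(fd);

    /* Size the filesystem believes it has (block count * fragment size). */
    struct statvfs sv;
    if (statvfs("/", &sv) != 0) {
        perror("statvfs /");
        return 1;
    }
    uint64_t fs_bytes = (uint64_t)sv.f_blocks * sv.f_frsize;

    printf("block device : %llu bytes\n", (unsigned long long)dev_bytes);
    printf("filesystem   : %llu bytes\n", (unsigned long long)fs_bytes);
    if (fs_bytes > dev_bytes)
        printf("filesystem claims more space than the device provides!\n");
    return 0;
}

If the filesystem reports more bytes than the device provides, the guest
could legitimately issue accesses past the end of the image, which would
show up as the out-of-range seek on the brick.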
>> The first VM I reported as going down was created on a volume with
>> arbiter enabled from the start, so I doubt it has something to do with
>> arbiter.
>>
>> I think it might have something to do with a load problem? Though the
>> hosts are really not being used that much.
>>
>> Anyway this is a brief description of my setup.
>>
>> 3 Dell servers with RAID 10 SAS disks.
>> Each server has 2 bonded 1 Gbps Ethernet ports dedicated to gluster (2
>> dedicated to the Proxmox cluster and 2 for communication with the hosts
>> on the LAN), each on its own VLAN in the switch.
>> Jumbo frames are also enabled on the NICs and switches.
>>
>> Each server is a Proxmox host which has gluster installed and
>> configured as both server and client.
> Do you know how proxmox accesses the VM images? Does it use QEMU+gfapi
> or is it all over a FUSE mount? New versions of QEMU+gfapi have seek
> support, and only new versions of the Linux kernel support seek over
> FUSE. In order to track where the problem may be, we need to look into
> the client (QEMU or FUSE) that does the seek with an invalid offset.
It uses QEMU+gfapi, AFAIK. These are the relevant QEMU arguments:
-drive
file=gluster://srvpve1g/datastore1/images/101/vm-101-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,cache=none,aio=native,detect-zeroes=on
-device
virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100
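To narrow down whether the gfapi path itself misbehaves, a small standalone
probe could open the same image over libgfapi and issue the kind of seek
QEMU does. This is only a sketch: the volume, server and path are copied
from the -drive line above, and it assumes the glusterfs-api development
headers are installed and that the installed gluster version supports
SEEK_DATA in glfs_lseek (3.8 or later). Build with something like
'gcc probe.c -o probe -lgfapi'.

/* Hypothetical probe: open the same image over libgfapi and try a seek,
 * mirroring what QEMU does through its gluster:// block driver. */
#define _GNU_SOURCE            /* for SEEK_DATA */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("datastore1");                 /* volume name */
    if (!fs)
        return 1;
    glfs_set_volfile_server(fs, "tcp", "srvpve1g", 24007);
    glfs_set_logging(fs, "/tmp/gfapi-probe.log", 7);     /* verbose client log */
    if (glfs_init(fs) != 0) {
        fprintf(stderr, "glfs_init: %s\n", strerror(errno));
        return 1;
    }

    glfs_fd_t *fd = glfs_open(fs, "/images/101/vm-101-disk-1.qcow2", O_RDONLY);
    if (!fd) {
        fprintf(stderr, "glfs_open: %s\n", strerror(errno));
        glfs_fini(fs);
        return 1;
    }

    struct stat st;
    if (glfs_fstat(fd, &st) == 0)
        printf("size seen over gfapi: %lld bytes\n", (long long)st.st_size);

    /* A SEEK_DATA at offset 0 should succeed; one past st_size would be
     * expected to fail with ENXIO, the error the brick log complains about. */
    off_t off = glfs_lseek(fd, 0, SEEK_DATA);
    printf("SEEK_DATA at 0 -> %lld (%s)\n", (long long)off,
           off < 0 ? strerror(errno) : "ok");

    glfs_close(fd);
    glfs_fini(fs);
    return 0;
}

Running something like this right after a crash might show whether a plain
SEEK_DATA on that file already fails, and the client-side log it writes
could be compared with the messages in the brick log.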
>> The RAID has an LVM thin pool which is divided into 3 bricks (2
>> big ones for the data and 1 small one for the arbiter).
>> Each thin LV is XFS formatted and mounted as a brick.
>> There are 3 volumes configured as replica 3 with arbiter (so 2
>> bricks really holding the data).
>> Volumes are:
>> datastore1: data on srv1 and srv2, arbiter srv3
>> datastore2: data on srv2 and srv3, arbiter srv1
>> datastore3: data on srv1 and srv3, arbiter srv2
>>
>> On each datastore there is basically one main VM (plus some others
>> which are not so important). (3 VMs are the important ones.)
>>
>> datastore1 was converted from replica 2 to replica 3 with arbiter; the
>> other 2 were created as described.
>>
>> The VM on the first datastore crashed several times (even when there
>> was no arbiter; I thought for some reason there was a split brain
>> which gluster could not handle).
>>
>> Last week the 2nd VM (on datastore2) also crashed, and that's when I
>> started the thread (before, as there were no special errors logged, I
>> thought it could have been caused by something in the VM).
>>
>> So far the 3rd VM has never crashed.
>>
>> Still any help on this would be really appreciated.
>>
>> I know it could also be a problem somewhere else, but I have other
>> setups without gluster which simply work.
>> That's why I want to start the VM with gdb, to check next time why the
>> kvm process shuts down.
> If the problem in the log from the brick is any clue, I would say that
> QEMU aborts when the seek fails. Somehow the seek got executed with a
> too high offset (past the size of the file), and that returned an
> error.
>
> We'll need to find out what makes QEMU (or FUSE) think the file is
> larger than it actually is on the brick. If you have a way of reproducing
> it, you could enable more verbose logging on the client side
> (diagnostics.client-log-level volume option), but if you run many VMs,
> that may accumulate a lot of logs.
>
> You probably should open a bug so that we have all the troubleshooting
> and debugging details in one location. Once we find the problem we can
> move the bug to the right component.
> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
>
> HTH,
> Niels
The thing is that when the VM is down and I check the logs, there's nothing.
Then when I start the VM, the logs get populated with the seek error.
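For what it's worth, that seek error matches what lseek(2) itself does
locally when asked for data past the end of a file: with SEEK_DATA and an
offset beyond the file size it fails with ENXIO, which is presumably what
the brick side propagates back to the client. A small local demonstration
(hypothetical scratch file; needs a kernel and filesystem with SEEK_DATA
support):

#define _GNU_SOURCE            /* for SEEK_DATA */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Create a small scratch file with 4 KiB of real data. */
    int fd = open("/tmp/seek-demo", O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return 1;
    char buf[4096];
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        return 1;

    /* Seeking for data inside the file works... */
    off_t ok = lseek(fd, 0, SEEK_DATA);
    printf("SEEK_DATA at 0     -> %lld\n", (long long)ok);

    /* ...but an offset past end-of-file fails with ENXIO, which is the
     * kind of error the brick log reports for the VM image. */
    off_t bad = lseek(fd, 1 << 20, SEEK_DATA);
    printf("SEEK_DATA at 1 MiB -> %lld (%s)\n", (long long)bad,
           bad < 0 ? strerror(errno) : "ok");

    close(fd);
    unlink("/tmp/seek-demo");
    return 0;
}

So the open question is still why the client thinks the image is larger
than it is on the brick.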
Anyway I'll open a bug for this.
Alessandro