Hi Strahil,
So over the last two weeks, the system has been relatively stable. I
have powered off both servers at least once, for about 5 minutes each
time. Each server came back up and auto-healed what it needed to, so all
of that part is working as expected.
I will answer things inline and follow with more questions:
>>> Hm... OK. I guess you can try 7.7 whenever it's possible.
>>
>> Acknowledged.
Still on my list.
> It could be a bad firmware also. If you get the opportunity, flash the
> firmware and bump the OS to the max.
The datacenter says everything was up to date as of installation, and I
don't really want them to take the servers offline long enough to redo
all the hardware.
>>>> more number of CPU cycles than needed, increasing the event thread
>>>> count would enhance the performance of the Red Hat Storage Server."
>>>> which is why I had it at 8.
>>> Yeah, but you got only 6 cores and they are not dedicated for
>>> gluster only. I think that you need to test with lower values.
I figured out that my magic number for client/server event threads
should be 5. I set it to 5, observed no change I could attribute to it,
then tried 4 and got the same thing: no visible effect.
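(For reference, I am adjusting these with the usual gluster CLI, along
these lines, the values being the ones I tested:
gluster volume set webisms client.event-threads 4
gluster volume set webisms server.event-threads 4
)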
>>>> right now the only suggested parameter I haven't played with is the
>>>> performance.io-thread-count, which I currently have at 64.
>> not really sure what would be a reasonable value for my system.
> I guess you can try to increase it a little bit and check how is it going.
It turns out that if you try to set this higher than 64, you get an
error saying 64 is the max.
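(For example, something like the following gets rejected, though the
exact error text may vary by version, while 64 itself is accepted:
gluster volume set webisms performance.io-thread-count 128
)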
>>> What I/O scheduler are you using for the SSDs (you can check via
>>> 'cat /sys/block/sdX/queue/scheduler')?
>>
>> # cat /sys/block/vda/queue/scheduler
>> [mq-deadline] none
>
> Deadline prioritizes reads in a 2:1 ratio /default tunings/. You can
> consider testing 'none' if your SSDs are good.
I switched to 'none'. I would say it had a positive effect, but a
minimal one.
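(The switch itself was just the usual echo into sysfs, vda being the
device shown above; note it does not persist across reboots without a
udev rule or similar:
echo none > /sys/block/vda/queue/scheduler
)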
> I see vda, please share details on the infra as this is very important.
> Virtual disks have their limitations and if you are on a VM, then there
> might be a chance to increase the CPU count.
> If you are on a VM, I would recommend you to use more (in numbers) and
> smaller disks in stripe sets (either raid0 via mdadm, or pure striped LV).
> Also, if you are on a VM -> there is no reason to reorder your I/O
> requests in the VM, just to do it again on the hypervisor. In such case
> 'none' can bring better performance, but this varies on the workload.
Hm, this is a good question, one I have been asking the datacenter for a
while, but they are a little slippery about what exactly they have going
on there. They advertise the servers as metal with a virtual layer. The
virtual layer is so you can log into a site and power the server down or
up, mount an ISO to boot from, access a console, and some other nifty
things. You can't any more, but when they first introduced the system
you could even access the BIOS of the server. But apparently, and they
swear up and down by this, it is a physical server, with real dedicated
SSDs and real sticks of RAM. I have found virtio and qemu among the
loaded kernel modules, so certainly there is something virtual involved,
but other than that and their nifty little tools, it has always acted
and worked like a metal server to me.
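(The virtio/qemu observation just comes from the module list, i.e.:
lsmod | grep -iE 'virtio|kvm'
And if it does turn out to be a VM and I go the stripe-set route, I am
assuming something along these lines, with purely hypothetical device
and VG names:
mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/vdb /dev/vdc /dev/vdd /dev/vde
or:
lvcreate -L 400G -i 4 -I 256k -n bricklv vg_bricks
)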
> All necessary data is in the file attributes on the brick. I doubt you
> will need to have access times on the brick itself. Another possibility
> is to use 'relatime'.
I remounted all bricks with noatime; no significant difference.
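(That was a live remount plus the matching fstab change, e.g. on one of
the brick filesystems, the mountpoint here being my guess at the layout:
mount -o remount,noatime /var/GlusterBrick/replset-0
)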
>> cache unless flush-behind is on. So seems that is a way to throw ram
>> to
>> it? I put performance.write-behind-window-size: 512MB and
>> performance.flush-behind: on and the whole system calmed down pretty
>> much immediately. could be just timing, though, will have to see
>> tomorrow during business hours whether the system stays at a reasonable
I tried increasing this to its max of 1GB; no noticeable change from 512MB.
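(Set the obvious way, for anyone searching the archives later:
gluster volume set webisms performance.write-behind-window-size 1GB
gluster volume set webisms performance.flush-behind on
)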
The second server is not acting in line with the first server.
glusterfsd processes are running at 50-80% of a core each, with one
brick often going over 200%, whereas they usually stick to 30-45% on the
first server. Apache processes consume as much as 90% of a core, whereas
they rarely go over 15% on the first server, and they frequently stack
up to more than 100 running at once, which drives the load average up to
40-60. It's very much like the first server was before I found the
flush-behind setting, but not as bad; at least it isn't going completely
non-responsive.
Additionally, it is still taking an excessive amount of time to load the
first page of most sites. I am guessing I need to increase read speeds
to fix this, so I have played with
performance.io-cache/cache-max-file-size (slight positive change),
read-ahead/read-ahead-page-count (negative change until the page count
was set to its max of 16, then no noticeable difference), and
rda-cache-limit/rda-request-size (minimal positive effect). I still have
RAM to spare, so it would be nice if I could use it to improve things on
the read side, but I have found no magic bullet like flush-behind was.
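(The read-side settings I have been poking at look like this; the
current values are in the volume info at the bottom:
gluster volume set webisms performance.io-cache on
gluster volume set webisms performance.cache-max-file-size 5MB
gluster volume set webisms performance.read-ahead on
gluster volume set webisms performance.read-ahead-page-count 16
gluster volume set webisms performance.rda-cache-limit 1GB
gluster volume set webisms performance.rda-request-size 128KB
)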
I found a good number of additional options to try and have been going a
little crazy with them; I will post them at the bottom. I also found a
post suggesting that mount options are important:
https://lists.gluster.org/pipermail/gluster-users/2018-September/034937.html
I confirmed these are in the man pages, so I tried unmounting and
re-mounting with the -o option to include them, thusly:
mount -t glusterfs moogle:webisms /Computerisms/ -o
negative-timeout=10,attribute-timeout=30,fopen-keep-cache,direct-io-mode=enable,fetch-attempts=5
But I don't think they are working:
/# mount | grep glus
moogle:webisms on /Computerisms type fuse.glusterfs
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
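(If I understand the fuse client correctly, the mount output only ever
shows the generic fuse options, so the place I would expect to see these
is on the glusterfs client process itself, e.g.:
ps ax | grep glusterfs
looking for flags like --negative-timeout=10 and --attribute-timeout=30;
if they are not there, mount.glusterfs presumably never passed them on.)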
I would be grateful for any other suggestions anyone can think of.
root@moogle:/# gluster v info
Volume Name: webisms
Type: Distributed-Replicate
Volume ID: 261901e7-60b4-4760-897d-0163beed356e
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: mooglian:/var/GlusterBrick/replset-0/webisms-replset-0
Brick2: moogle:/var/GlusterBrick/replset-0/webisms-replset-0
Brick3: moogle:/var/GlusterBrick/replset-0-arb/webisms-replset-0-arb (arbiter)
Brick4: moogle:/var/GlusterBrick/replset-1/webisms-replset-1
Brick5: mooglian:/var/GlusterBrick/replset-1/webisms-replset-1
Brick6: mooglian:/var/GlusterBrick/replset-1-arb/webisms-replset-1-arb (arbiter)
Options Reconfigured:
performance.rda-cache-limit: 1GB
performance.client-io-threads: off
nfs.disable: on
storage.fips-mode-rchecksum: off
transport.address-family: inet
performance.stat-prefetch: on
network.inode-lru-limit: 200000
performance.write-behind-window-size: 1073741824
performance.readdir-ahead: on
performance.io-thread-count: 64
performance.cache-size: 12GB
server.event-threads: 4
client.event-threads: 4
performance.nl-cache-timeout: 600
auth.allow: xxxxxx
performance.open-behind: off
performance.quick-read: off
cluster.lookup-optimize: off
cluster.rebal-throttle: lazy
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-invalidation: on
performance.md-cache-timeout: 600
performance.flush-behind: on
cluster.read-hash-mode: 0
performance.strict-o-direct: on
cluster.readdir-optimize: on
cluster.lookup-unhashed: off
performance.cache-refresh-timeout: 30
performance.enable-least-priority: off
cluster.choose-local: on
performance.rda-request-size: 128KB
performance.read-ahead: on
performance.read-ahead-page-count: 16
performance.cache-max-file-size: 5MB
performance.io-cache: on