On 09.09.2020 at 17:52, John Stoffel wrote:
> Miloslav> There is one PCIe RAID controller in the chassis, an AVAGO
> Miloslav> MegaRAID SAS 9361-8i, and 16x SAS 15k drives connected to
> Miloslav> it. Because the controller does not support pass-through for
> Miloslav> the drives, we use 16x RAID-0 on the controller. So we get
> Miloslav> /dev/sda ... /dev/sdp (roughly) in the OS. And over that we
> Miloslav> have a single btrfs RAID-10, composed of 16 devices, mounted
> Miloslav> as /data.
>
> I will bet that this is one of your bottlenecks as well. Get a second
> or third controller and split your disks across them evenly.

That's the plan for a next step.

> Miloslav> We run 'rsync' to a remote NAS daily. It takes about 6.5
> Miloslav> hours to finish, 12'265'387 files last night.
>>>
>>> That's.... sucky. So basically you're hitting the drives hard with
>>> random IOPs and you're probably running out of performance. How much
>>> space are you using on the filesystem?
>
> Miloslav> It's not as sucky as it seems. rsync runs during the night.
> Miloslav> And even when reading is high, server load stays low. We
> Miloslav> have problems with writes.
>
> Ok. So putting in an SSD pair to cache things should help.
>
>>> And why not use btrfs send to ship off snapshots instead of using
>>> rsync? I'm sure that would be an improvement...
>
> Miloslav> We run backups to an external NAS (NetApp) for a disaster
> Miloslav> recovery scenario. Moreover, the NAS is spread across
> Miloslav> multiple locations. Then we create NAS snapshots, going tens
> Miloslav> of days back. All snapshots are easily available via NFS
> Miloslav> mount. And NAS capacity is cheaper.
>
> So why not run the backend storage on the Netapp, and just keep the
> indexes and such local to the system? I've run Netapps for many years
> and they work really well. And then you'd get automatic backups using
> scheduled snapshots.
>
> Keep the index files local on disk/SSDs and put the maildirs out to
> NFSv3 volume(s) on the Netapp(s). Should do wonders. And you'll stop
> needing to do rsync at night.

It's the option we have in mind. As you wrote, NetApp is very solid.
The main reason for local storage is that the IMAP server is completely
isolated from the network. But maybe one day we will use it.

> Miloslav> In the last half year, we have encountered performance
> Miloslav> troubles. Server load grows up to 30 in rush hours, due to
> Miloslav> IO waits. We tried to attach additional hard drives (the
> Miloslav> 838G ones in the list below) and increase free space by
> Miloslav> rebalancing. I think it helped a little bit, but not
> Miloslav> dramatically.
>
>>> If you're IOPs bound, but not space bound, then you *really* want to
>>> get an SSD in there for the indexes and such. Basically the stuff
>>> that gets written/read from all the time no matter what, but which
>>> isn't large in terms of space.
>
> Miloslav> Yes. We are now at 66% capacity. Adding SSDs for the indexes
> Miloslav> is our next step.
>
> This *should* give you a boost in performance. But finding a way to
> take before and after latency/performance measurements is key. I
> would look into using 'fio' to test your latency numbers. You might
> also want to try using XFS or even ext4 as your filesystem. I
> understand not wanting to 'fsck', so that might be right out.

Unfortunately, to quickly fix the problem and make the server usable
again, we already added SSDs and moved the indexes onto them. So we
have no measurements of the old state. The situation is better, but I
guess the problem still exists. It takes some time for the load to
grow. We will see.
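For anyone curious, the move amounts to pointing Dovecot's index path
at the SSD mount and reindexing; the paths below are purely
illustrative, not our exact layout:

    # dovecot.conf -- keep mail on the spinning disks, indexes on SSD
    # (example paths only)
    mail_location = maildir:~/Maildir:INDEX=/ssd/indexes/%u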
Thank you for the fio tip. Definitely I'll try that.

Kind regards
Milo
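PS: As a first sketch, I have in mind an fio job roughly like the one
below, run against a scratch directory on the data volume. The path,
sizes and runtime are only placeholders, nothing is settled yet:

    fio --name=mail-rand-write --directory=/data/fio-scratch \
        --rw=randwrite --bs=4k --size=1g --numjobs=4 \
        --iodepth=16 --ioengine=libaio --direct=1 \
        --runtime=60 --time_based --group_reporting

The latency percentiles in the output should give us the before/after
numbers we are missing.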
>>>>> "Miloslav" == Miloslav Hůla <miloslav.hula at gmail.com> writes:

Miloslav> On 09.09.2020 at 17:52, John Stoffel wrote:
Miloslav> There is one PCIe RAID controller in the chassis, an AVAGO
Miloslav> MegaRAID SAS 9361-8i, and 16x SAS 15k drives connected to
Miloslav> it. Because the controller does not support pass-through for
Miloslav> the drives, we use 16x RAID-0 on the controller. So we get
Miloslav> /dev/sda ... /dev/sdp (roughly) in the OS. And over that we
Miloslav> have a single btrfs RAID-10, composed of 16 devices, mounted
Miloslav> as /data.
>>
>> I will bet that this is one of your bottlenecks as well. Get a second
>> or third controller and split your disks across them evenly.

Miloslav> That's the plan for a next step.

Miloslav> We run 'rsync' to a remote NAS daily. It takes about 6.5
Miloslav> hours to finish, 12'265'387 files last night.
>>>>
>>>> That's.... sucky. So basically you're hitting the drives hard with
>>>> random IOPs and you're probably running out of performance. How much
>>>> space are you using on the filesystem?
>>
Miloslav> It's not as sucky as it seems. rsync runs during the night.
Miloslav> And even when reading is high, server load stays low. We
Miloslav> have problems with writes.
>>
>> Ok. So putting in an SSD pair to cache things should help.
>>
>>>> And why not use btrfs send to ship off snapshots instead of using
>>>> rsync? I'm sure that would be an improvement...
>>
Miloslav> We run backups to an external NAS (NetApp) for a disaster
Miloslav> recovery scenario. Moreover, the NAS is spread across
Miloslav> multiple locations. Then we create NAS snapshots, going tens
Miloslav> of days back. All snapshots are easily available via NFS
Miloslav> mount. And NAS capacity is cheaper.
>>
>> So why not run the backend storage on the Netapp, and just keep the
>> indexes and such local to the system? I've run Netapps for many years
>> and they work really well. And then you'd get automatic backups using
>> scheduled snapshots.
>>
>> Keep the index files local on disk/SSDs and put the maildirs out to
>> NFSv3 volume(s) on the Netapp(s). Should do wonders. And you'll stop
>> needing to do rsync at night.

Miloslav> It's the option we have in mind. As you wrote, NetApp is very
Miloslav> solid. The main reason for local storage is that the IMAP
Miloslav> server is completely isolated from the network. But maybe one
Miloslav> day we will use it.

It's not completely isolated, it can rsync data to another host that
has access to the Netapp. *grin*

Miloslav> Unfortunately, to quickly fix the problem and make the server
Miloslav> usable again, we already added SSDs and moved the indexes
Miloslav> onto them. So we have no measurements of the old state.

That's ok, if it's better, then it's better. How is the load now?
Looking at the output of 'iostat -x 30' might be a good thing.

Miloslav> The situation is better, but I guess the problem still
Miloslav> exists. It takes some time for the load to grow. We will see.

Hmm... how did you set up the new indexes volume? Did you just use
btrfs again? Did you mirror your SSDs as well?

Do the indexes fill the SSD, or is there 20-30% free space? When an
SSD gets fragmented, its performance can drop quite a bit. Did you
put the SSDs onto a separate controller? Probably not. So now you've
just increased the load on the single controller, when you really
should be spreading it out more to improve things.

Another possible hack would be to move some stuff to a RAM disk,
assuming your server is on a UPS/generator in case of power loss. But
that's an unsafe hack.

Also, do you have quotas turned on?
That's a performance hit for sure.

Miloslav> Thank you for the fio tip. Definitely I'll try that.

It's a good way to test and measure how the system will react.
Unfortunately, you will need to do your testing outside of normal work
hours so as to not impact your users too much.

Good luck! Please post some numbers if you get them. If you see that
only a few disks are 75% or more busy, then *maybe* you have a bad
disk in the system, and moving off that disk or replacing it might
help. Again, hard to know.

Rebalancing btrfs might also help, especially now that you've moved
the indexes off that volume.

John
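PS: By "rebalancing" I just mean the usual filtered balance, something
along these lines (the usage threshold is only an example, tune it to
your situation):

    btrfs balance start -dusage=50 -musage=50 /data

Starting with a low threshold and raising it in steps keeps the impact
on a busy box smaller.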
On 10.09.2020 at 17:40, John Stoffel wrote:
>>> So why not run the backend storage on the Netapp, and just keep the
>>> indexes and such local to the system? I've run Netapps for many years
>>> and they work really well. And then you'd get automatic backups using
>>> scheduled snapshots.
>>>
>>> Keep the index files local on disk/SSDs and put the maildirs out to
>>> NFSv3 volume(s) on the Netapp(s). Should do wonders. And you'll stop
>>> needing to do rsync at night.
>
> Miloslav> It's the option we have in mind. As you wrote, NetApp is very
> Miloslav> solid. The main reason for local storage is that the IMAP
> Miloslav> server is completely isolated from the network. But maybe one
> Miloslav> day we will use it.
>
> It's not completely isolated, it can rsync data to another host that
> has access to the Netapp. *grin*

:o)

> Miloslav> Unfortunately, to quickly fix the problem and make the server
> Miloslav> usable again, we already added SSDs and moved the indexes
> Miloslav> onto them. So we have no measurements of the old state.
>
> That's ok, if it's better, then it's better. How is the load now?
> Looking at the output of 'iostat -x 30' might be a good thing.

Load is between 1 and 2. We can live with that for now.

> Miloslav> The situation is better, but I guess the problem still
> Miloslav> exists. It takes some time for the load to grow. We will see.
>
> Hmm... how did you set up the new indexes volume? Did you just use
> btrfs again? Did you mirror your SSDs as well?

Yes. Just two SSDs into free slots, exported to the OS as two
single-drive RAID-0 volumes, with btrfs RAID-1 on top. It is nasty, I
know, but it needed no outage. It is just a quick attempt to improve
the situation. Our next plan is to buy more controllers, schedule an
outage on a weekend and do it properly.

> Do the indexes fill the SSD, or is there 20-30% free space? When an
> SSD gets fragmented, its performance can drop quite a bit. Did you
> put the SSDs onto a separate controller? Probably not. So now you've
> just increased the load on the single controller, when you really
> should be spreading it out more to improve things.

The SSDs are almost empty; 2.4GB of 93GB is used after 'doveadm index'
on all mailboxes.

> Another possible hack would be to move some stuff to a RAM disk,
> assuming your server is on a UPS/generator in case of power loss. But
> that's an unsafe hack.
>
> Also, do you have quotas turned on? That's a performance hit for
> sure.

No, we are running without quotas.

> Miloslav> Thank you for the fio tip. Definitely I'll try that.
>
> It's a good way to test and measure how the system will react.
> Unfortunately, you will need to do your testing outside of normal work
> hours so as to not impact your users too much.
>
> Good luck! Please post some numbers if you get them. If you see that
> only a few disks are 75% or more busy, then *maybe* you have a bad
> disk in the system, and moving off that disk or replacing it might
> help. Again, hard to know.
>
> Rebalancing btrfs might also help, especially now that you've moved
> the indexes off that volume.
>
> John

Thank you
Milo
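PS: In case it is useful to anyone on the list, the SSD pair was simply
made into a plain btrfs RAID-1 over the two exported devices and then
reindexed. Device names and the mount point below are illustrative
only, not our real ones:

    mkfs.btrfs -m raid1 -d raid1 /dev/sdq /dev/sdr
    mount /dev/sdq /ssd/indexes
    doveadm index -A '*'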