thr3ads.net - freebsd stable - Spectre/Meltdown mitigation in 11.1-p10 bogging down zfs send/receive? [May 2018]


Patrick M. Hausen

2018-May-14 15:35 UTC

Spectre/Meltdown mitigation in 11.1-p10 bogging down zfs send/receive?

Hey guys,

as some might know we run our hosting products in ZFS and iocage based
jails. The backup concept relies on recurring local snapshots and a copy of
these on one (more planned) central storage server. The storage server
does essentially nothing but run zfs receive for each dataset on each
hosting node. 12x spinning rust and 128G of RAM. Lots of space ;-)

In preparation for rolling out (among other patches) the Meltdown and Spectre
mitigation fixes and microcode updates we already ran benchmarks that
measured our primary applications - the TYPO3 and Neos CMS. We did not
see much of an impact.

We updated that central storage system last Friday.

Today we provisioned a new server, meaning new hosting hardware and
a couple of jails on it. The new system already has all the latest
patches.
Part of the provisioning process is creating an initial snapshot of every
dataset and sending an initial copy to the storage server, so we can send
nightly incrementals.
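
For readers unfamiliar with this replication pattern, here is a minimal sketch of the commands involved: a snapshot, a full `zfs send` for the initial copy, and `zfs send -i` for the nightly incrementals, piped over ssh into `zfs receive`. It only builds the command lines and does not run them. The dataset, snapshot, and host names are hypothetical, and this is not the actual Ansible task described above.

```python
import shlex


def snapshot_cmd(dataset, snap):
    """argv for creating the snapshot dataset@snap."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def send_cmd(dataset, snap, prev=None):
    """argv for a full send, or an incremental send if prev is given."""
    cmd = ["zfs", "send"]
    if prev is not None:
        # -i sends only the changes between the previous and the new snapshot
        cmd += ["-i", f"{dataset}@{prev}"]
    cmd.append(f"{dataset}@{snap}")
    return cmd


def replicate_pipeline(dataset, snap, host, target, prev=None):
    """Shell pipeline string: zfs send ... | ssh host 'zfs receive target'."""
    send = " ".join(shlex.quote(arg) for arg in send_cmd(dataset, snap, prev))
    recv = " ".join(shlex.quote(arg) for arg in ["zfs", "receive", target])
    return f"{send} | ssh {shlex.quote(host)} {shlex.quote(recv)}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    ds, host, target = "zroot/iocage/jails/web1", "storage.example.com", "backup/node1/web1"
    print(" ".join(snapshot_cmd(ds, "initial")))
    print(replicate_pipeline(ds, "initial", host, target))          # initial full copy
    print(replicate_pipeline(ds, "night2", host, target, "initial"))  # nightly incremental
```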

That step took surprisingly long for the first of the new jails.

By at least an order of magnitude. I cannot provide exact measurements yet,
because this is all part of a rather complex Ansible task and it really
caught us by surprise.

We already received a couple of warnings from the Icinga service monitoring
the nightly replication runs - we still need to investigate this. We suspect
they ran slower than usual, too.

To narrow down the cause of the problem we tried this in chronological order:

1. storage server (receiving end):

Disable microcode update and hw.ibrs_active       still slow
Disable vm.pmap.pti                               still slow

2. new jail host (sending end):

Disable both                                      fast
Re-enable microcode update and hw.ibrs_active     still fast
Re-enable vm.pmap.pti                             still fast

Reboot as necessary, of course. And we double checked the current value
of the respective sysctls before running the tests.
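
The double check of the sysctls before each test run could be scripted roughly as follows. This is only a sketch: it assumes the usual `name: value` output format of `sysctl(8)`, the sample text in the demo is hand-written for illustration and not captured from a real host, and the helper names are made up.

```python
import subprocess

# The two sysctls named in the test matrix above.
MITIGATION_SYSCTLS = ("hw.ibrs_active", "vm.pmap.pti")


def parse_sysctl(text):
    """Parse 'name: value' lines as printed by sysctl into a dict of strings."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not in 'name: value' form
            values[name.strip()] = value.strip()
    return values


def read_mitigation_state():
    """Query the live values; only meaningful on a FreeBSD host that has these sysctls."""
    result = subprocess.run(
        ["sysctl", *MITIGATION_SYSCTLS],
        capture_output=True, text=True, check=True,
    )
    return parse_sysctl(result.stdout)


if __name__ == "__main__":
    # Hand-written sample, NOT real output from the hosts in this thread.
    sample = "hw.ibrs_active: 1\nvm.pmap.pti: 1\n"
    print(parse_sysctl(sample))
```

Logging the parsed values next to each timing would make it easier to tell later which combination a given run actually used.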

That last step is *quite* unexpected, because it just does not make sense to me.


Does anybody know what impact the fixes, both PTI and IBRS, are *expected*
to have on bulk zfs send/receive operations between two different hosts?

Possibly we are on the wrong track altogether. We suspected the CPU fixes
because of the general "what did you change last" approach ...


Thank you very much
Patrick
-- 
punkt.de GmbH			Internet - Dienstleistungen - Beratung
Kaiserallee 13a			Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe			info at punkt.de	http://punkt.de
AG Mannheim 108285		Gf: Juergen Egeling

Patrick M. Hausen

2018-May-14 15:48 UTC


Spectre/Meltdown mitigation in 11.1-p10 bogging down zfs send/receive?

Hi!
> On 14.05.2018 at 17:35, Patrick M. Hausen <hausen at punkt.de> wrote:
> Possibly we are on the wrong track altogether.
We were - please just forget it ...

ZFS scrub running during our activity ... everybody who already put
more than five minutes of thought into this deserves a beer at the next
EuroBSDCon ;-)
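
A check for a running scrub before timing a transfer could look roughly like this. It is a sketch that scans the text printed by `zpool status` for a `scan:` line reporting a scrub in progress. The sample text is hand-written for illustration (pool names and date are made up), and the function name is hypothetical.

```python
def scrubbing_pools(status_text):
    """Return the names of pools whose 'scan:' line reports a scrub in progress."""
    pools = []
    current = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            # Remember which pool the following lines belong to.
            current = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("scan:") and "scrub in progress" in stripped:
            if current is not None:
                pools.append(current)
    return pools


# Hand-written sample in the style of `zpool status`, NOT real output.
SAMPLE = """\
  pool: backup
 state: ONLINE
  scan: scrub in progress since Mon May 14 14:00:00 2018

  pool: zroot
 state: ONLINE
  scan: none requested
"""

if __name__ == "__main__":
    print(scrubbing_pools(SAMPLE))
```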

Patrick
-- 
punkt.de GmbH			Internet - Dienstleistungen - Beratung
Kaiserallee 13a			Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe			info at punkt.de	http://punkt.de
AG Mannheim 108285		Gf: Juergen Egeling
