Pete French
2016-Nov-21 17:47 UTC
Help! two machines ran out of swap and corrupted their zpools!
So, I am off sick and my colleagues decided to load test our set of five servers excessively. All ran out of swap. So far so irritating, but what has happened is that two of them now will not boot, as it appears the ZFS pool they are booting from has become corrupted.

One starts to boot, then crashes importing the root pool. The other doesn't even get that far, with gptzfsboot saying it can't find the pool to boot from!

Now I can recover these, but I am a bit worried that it got like this at all, as I haven't ever seen ZFS corrupt a pool like this. Anyone got any insights, or suggestions as to how to stop it happening again?

We are swapping to a separate partition, not to the pool, by the way.

-pete.
Jan Bramkamp
2016-Nov-21 17:50 UTC
Help! two machines ran out of swap and corrupted their zpools!
On 21/11/2016 18:47, Pete French wrote:
> So, I am off sick and my colleagues decided to load test our set of five servers excessively. All ran out of swap. So far so irritating, but what has happened is that two of them now will not boot, as it appears the ZFS pool they are booting from has become corrupted.
>
> One starts to boot, then crashes importing the root pool. The other doesn't even get that far, with gptzfsboot saying it can't find the pool to boot from!
>
> Now I can recover these, but I am a bit worried that it got like this at all, as I haven't ever seen ZFS corrupt a pool like this. Anyone got any insights, or suggestions as to how to stop it happening again?
>
> We are swapping to a separate partition, not to the pool, by the way.

How much trust do you put in your hardware? Have you ever put the hardware under full load for extended periods before, e.g. by running poudriere to build pkg repos?

-- Jan Bramkamp
Volodymyr Kostyrko
2016-Nov-21 18:16 UTC
Help! two machines ran out of swap and corrupted their zpools!
Pete French wrote:
> So, I am off sick and my colleagues decided to load test our set of five servers excessively. All ran out of swap. So far so irritating, but what has happened is that two of them now will not boot, as it appears the ZFS pool they are booting from has become corrupted.
>
> One starts to boot, then crashes importing the root pool. The other doesn't even get that far, with gptzfsboot saying it can't find the pool to boot from!
>
> Now I can recover these, but I am a bit worried that it got like this at all, as I haven't ever seen ZFS corrupt a pool like this. Anyone got any insights, or suggestions as to how to stop it happening again?
>
> We are swapping to a separate partition, not to the pool, by the way.

Good. Try downloading a live disc or mfsBSD and importing the pool read-only from there:

    zpool import -N -o readonly=on -f -R /mnt somezpool

If that doesn't help, try:

    zpool import -N -o readonly=on -f -R /mnt -Fn somezpool

Drop us a line with your configuration and the ZFS features in use, such as dedup, snapshots, and external log and cache (L2ARC) devices.

-- Sphinx of black quartz judge my vow.
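The read-only import suggested in this reply can be sketched as a small dry-run helper. This is only an illustration, not the poster's own script: the function name `ro_import_cmd`, the pool name `zroot`, and the alternate root `/mnt` are all assumptions. It uses lowercase `-o`, which is how `zpool import` accepts pool properties such as `readonly=on`, and it puts the pool name last. The helper prints the commands rather than running them, so they can be reviewed before being typed on the rescue system.

```shell
#!/bin/sh
# Sketch only: prints the import commands instead of running them.
# "zroot" and "/mnt" are placeholder values, not taken from the thread.

# ro_import_cmd POOL [ALTROOT] [EXTRA_FLAGS]
ro_import_cmd() {
    pool=$1
    altroot=${2:-/mnt}
    extra=${3:-}
    # -N: do not mount any datasets
    # -o readonly=on: import the pool read-only
    # -f: force the import even if the pool looks active elsewhere
    # -R: use an alternate root for mount points
    printf '%s\n' "zpool import -N -o readonly=on -f -R ${altroot}${extra:+ ${extra}} ${pool}"
}

ro_import_cmd zroot /mnt        # plain read-only import
ro_import_cmd zroot /mnt -Fn    # recovery mode, dry run only (-n)
```

With `-Fn`, `zpool import` only reports whether discarding the last few transactions would make the pool importable; it does not perform the rewind.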
Gary Palmer
2016-Nov-21 18:29 UTC
Help! two machines ran out of swap and corrupted their zpools!
On Mon, Nov 21, 2016 at 05:47:29PM +0000, Pete French wrote:
> So, I am off sick and my colleagues decided to load test our set of five servers excessively. All ran out of swap. So far so irritating, but what has happened is that two of them now will not boot, as it appears the ZFS pool they are booting from has become corrupted.
>
> One starts to boot, then crashes importing the root pool. The other doesn't even get that far, with gptzfsboot saying it can't find the pool to boot from!
>
> Now I can recover these, but I am a bit worried that it got like this at all, as I haven't ever seen ZFS corrupt a pool like this. Anyone got any insights, or suggestions as to how to stop it happening again?
>
> We are swapping to a separate partition, not to the pool, by the way.

Silly question: have you checked that the swap partition does not overlap your boot pool partition? It could well be that the end of the swap partition intrudes into the affected ZFS pool.

Gary
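The overlap check suggested here comes down to comparing two sector ranges. A minimal sketch follows, assuming the start and size of each partition are read by hand from the partition table (for example from `gpart show` output). The function name `ranges_overlap` and all sector numbers are hypothetical and do not come from the affected machines.

```shell
#!/bin/sh
# Sketch only: sector numbers below are made-up example values.

# ranges_overlap START_A SIZE_A START_B SIZE_B
# Returns 0 (true) if the two sector ranges share at least one sector.
ranges_overlap() {
    a_end=$(( $1 + $2 - 1 ))    # last sector of partition A
    b_end=$(( $3 + $4 - 1 ))    # last sector of partition B
    [ "$1" -le "$b_end" ] && [ "$3" -le "$a_end" ]
}

# Hypothetical layout: swap starts at sector 1064 and is 4194304 sectors
# long; the ZFS partition starts at sector 4195368.
if ranges_overlap 1064 4194304 4195368 100000000; then
    echo "swap and ZFS partitions overlap"
else
    echo "no overlap"
fi
```

In this example the swap partition ends at sector 4195367, one sector before the ZFS partition begins, so the script reports no overlap.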
Steven Hartland
2016-Nov-22 09:40 UTC
Help! two machines ran out of swap and corrupted their zpools!
When you say corrupt, what do you mean? Specifically, what's the output from zpool status?

One thing that springs to mind if zpool status doesn't show any issues, and:

1. You have large disks
2. You have performed an update and not rebooted since

You may be in the scenario where there's enough data on the pool that the kernel / loader are out of range of the BIOS.

It all depends on exactly what you're seeing.

On 21/11/2016 17:47, Pete French wrote:
> So, I am off sick and my colleagues decided to load test our set of five servers excessively. All ran out of swap. So far so irritating, but what has happened is that two of them now will not boot, as it appears the ZFS pool they are booting from has become corrupted.
>
> One starts to boot, then crashes importing the root pool. The other doesn't even get that far, with gptzfsboot saying it can't find the pool to boot from!
>
> Now I can recover these, but I am a bit worried that it got like this at all, as I haven't ever seen ZFS corrupt a pool like this. Anyone got any insights, or suggestions as to how to stop it happening again?
>
> We are swapping to a separate partition, not to the pool, by the way.
>
> -pete.