Hi.

I installed FreeBSD 7 a few days ago and upgraded to the latest stable
release using the GENERIC kernel. I also added these entries to
/boot/loader.conf:

vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.prefetch_disable=1

Initially prefetch was enabled and I would experience hangs, but after
disabling prefetch, copying large amounts of data would go along without
problems. To see if FreeBSD 8 (current) had better (copy) performance I
upgraded to current as of yesterday. After upgrading and rebooting the
server responded fine.

The server is a Supermicro with a quad-core Harpertown E5405, two
internal SATA drives and 8 GB of RAM. I installed an Areca ARC-1680
SAS controller and configured it in JBOD mode. I attached an external
SAS cabinet with 16 SAS disks at 1 TB (931 binary GB) each.

I created a raidz2 pool with 10 disks and added one spare. I copied
approx. 1 TB of small files (each approx. 1 MB) and during the copy I
simulated a disk crash by pulling one of the disks out of the cabinet.
ZFS did not activate the spare and the copying stopped until I rebooted
after 5-10 minutes. When I performed a 'zpool status' the command would
not complete. I did not see any messages in /var/log/messages. State in
top showed 'ufs-'.

A similar test on Solaris Express Developer Edition b79 activated the
spare after ZFS tried to write to the missing disk enough times and then
marked it as faulted. Has anyone else tried to simulate a disk crash in
raidz(2) and succeeded?

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
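For reference, a raidz2 pool with one hot spare like the one described
above would typically be created along these lines. This is only a
sketch: the pool name 'tank' and the da0 through da10 device names are
placeholders, not the actual devices behind the ARC-1680.

    zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 spare da10
    zpool status tank    # shows the raidz2 vdev plus a separate "spares" section
    zpool list tank      # usable space is roughly (10 - 2) x 931 GB before overhead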
Claus Guttesen wrote:
> Hi.
>
> I installed FreeBSD 7 a few days ago and upgraded to the latest stable
> release using the GENERIC kernel. [...]
>
> I created a raidz2 pool with 10 disks and added one spare. I copied
> approx. 1 TB of small files (each approx. 1 MB) and during the copy I
> simulated a disk crash by pulling one of the disks out of the cabinet.
> ZFS did not activate the spare and the copying stopped until I
> rebooted after 5-10 minutes. When I performed a 'zpool status' the
> command would not complete. I did not see any messages in
> /var/log/messages. State in top showed 'ufs-'.

That means that it was UFS that hung, not ZFS. What was the process
backtrace, and what role does UFS play on this system?

Kris

> A similar test on Solaris Express Developer Edition b79 activated the
> spare after ZFS tried to write to the missing disk enough times and
> then marked it as faulted. Has anyone else tried to simulate a
> disk crash in raidz(2) and succeeded?
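The information being asked for here is the wait channel and kernel
stack of the stuck process. Pressing Ctrl+T in the terminal running the
hung cp or 'zpool status' prints its state and wait channel (the 'ufs'
seen in top), and on a reasonably recent CURRENT the kernel stack can be
dumped with procstat, assuming the pid of the hung process is known; the
pid 12345 below is a placeholder.

    procstat -t 12345     # per-thread state and wait channel
    procstat -k 12345     # kernel thread stacks, if -k is supported on this build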
On Fri, Jul 25, 2008 at 09:46:34AM +0200, Claus Guttesen wrote:
> Hi.
>
> I installed FreeBSD 7 a few days ago and upgraded to the latest stable
> release using the GENERIC kernel. I also added these entries to
> /boot/loader.conf:
>
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"
> vfs.zfs.prefetch_disable=1
>
> Initially prefetch was enabled and I would experience hangs, but after
> disabling prefetch, copying large amounts of data would go along
> without problems. To see if FreeBSD 8 (current) had better (copy)
> performance I upgraded to current as of yesterday. After upgrading and
> rebooting the server responded fine.

With regards to RELENG_7, I completely agree with disabling prefetch.
The overall performance (of the system and disk I/O) is significantly
"smoother", i.e. there are fewer hard lock-ups and stalls, when prefetch
is disabled.

I have not tried CURRENT. I'm told the ZFS code in CURRENT is the same
as in RELENG_7, so I'm not sure what you were trying to test by
switching from RELENG_7 to CURRENT.

> The server is a Supermicro with a quad-core Harpertown E5405, two
> internal SATA drives and 8 GB of RAM. I installed an Areca ARC-1680
> SAS controller and configured it in JBOD mode. I attached an external
> SAS cabinet with 16 SAS disks at 1 TB (931 binary GB) each.
>
> I created a raidz2 pool with 10 disks and added one spare. [...]
> Has anyone else tried to simulate a disk crash in raidz(2) and
> succeeded?

Is there any way to confirm the behaviour is specific to raidz2, or
would it affect raidz1 as well? I have a raidz1 pool at home which I can
pull a disk from (only 3 disks though, so pulling one will probably
result in bad things), but it's off of an ICHx controller.

I have no experience with Areca controllers or their driver, but I do
have experience with standard onboard Intel ICHx chips. With those
chips, "pulling disks" without administratively downing the ATA channel
first will cause a kernel panic. If the Areca controller/driver handles
things better, great. I'm trying to say that I can offer to help with
raidz1, but not on Areca controllers.

The hardware is similar to yours: Supermicro PDSMi+, Intel E6600 (C2D),
4GB RAM, running RELENG_7 amd64. The system contains 4 disks; ad6, ad8
and ad10 are in a ZFS pool, ad4 is the OS disk:

ad4: 190782MB <WDC WD2000JD-00HBB0 08.02D08> at ata2-master SATA150
ad6: 476940MB <WDC WD5000AAKS-00YGA0 12.01C02> at ata3-master SATA300
ad8: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata4-master SATA300
ad10: 476940MB <WDC WD5000AAKS-00TMA0 12.01C01> at ata5-master SATA300

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            ad6     ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
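As a sketch of what "administratively downing the ATA channel" means on
the ata(4) setup shown above: detach the channel the disk hangs off
before pulling it, then reattach after it is plugged back in. The
channel name (ata5 for ad10 here) must match the disk being tested, and
exactly how zpool reports the missing device (REMOVED or UNAVAIL) may
vary.

    atacontrol detach ata5        # drop ata5 (ad10) out from under the pool
    zpool status storage          # ad10 should now show as REMOVED or UNAVAIL
    atacontrol attach ata5        # rescan the channel once the disk is back
    zpool online storage ad10     # or 'zpool replace' if the disk was swapped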
Jeremy Chadwick wrote:
> On Fri, Jul 25, 2008 at 09:46:34AM +0200, Claus Guttesen wrote:
>> Initially prefetch was enabled and I would experience hangs, but after
>> disabling prefetch, copying large amounts of data would go along
>> without problems. [...]
>
> With regards to RELENG_7, I completely agree with disabling prefetch.
> The overall performance (of the system and disk I/O) is significantly
> "smoother", i.e. there are fewer hard lock-ups and stalls, when
> prefetch is disabled.

FYI, I do not get "lock-ups" when running with prefetch. It is supposed
to just affect performance: if you have few disks, or they have low
bandwidth or high seek times (e.g. crappy ATA), then prefetch can
saturate them and you will have poor response times. However, if your
hardware is more capable then it is a performance optimization.

Someone needs to obtain the usual debugging information.

Kris
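On FreeBSD, "the usual debugging information" for a hang like this
generally means a kernel built with the standard debugging options and
the output of the deadlock-related DDB commands once the machine has
been broken into the debugger. The fragment below is a sketch using the
stock option names, nothing specific to this report:

    options KDB                 # kernel debugger framework
    options DDB                 # interactive kernel debugger
    options INVARIANTS          # kernel consistency checks
    options INVARIANT_SUPPORT
    options WITNESS             # lock order verification
    options DEBUG_LOCKS
    options DEBUG_VFS_LOCKS

With such a kernel, break into the debugger when the hang occurs (for
example with 'sysctl debug.kdb.enter=1' from another session, or the
console break sequence) and collect the output of ps, show lockedvnods,
show alllocks and alltrace at the db> prompt.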
> I installed FreeBSD 7 a few days ago and upgraded to the latest stable
> release using the GENERIC kernel. I also added these entries to
> /boot/loader.conf:
>
> vm.kmem_size="1536M"
> vm.kmem_size_max="1536M"
> vfs.zfs.prefetch_disable=1
>
> [...]
>
> The server is a Supermicro with a quad-core Harpertown E5405, two
> internal SATA drives and 8 GB of RAM. I installed an Areca ARC-1680
> SAS controller and configured it in JBOD mode. I attached an external
> SAS cabinet with 16 SAS disks at 1 TB (931 binary GB) each.

Replying to my own mail! :-)

I believe I have found at least a work-around that alleviates this
issue. First of all I upgraded to zfs ver. 11 using Pawel's patch. The
upgrade itself does not solve the issue. Then I upgraded the firmware
as advised by Areca support. The firmware upgrade has minor changes
related to disk temperature reading and will probably not change
anything either.

I changed the configuration of each disk on the Areca ARC-1680
controller to passthrough mode and rebooted. After re-creating the
zpool I have not had any errors while copying 718 GB of small files
from my Solaris NFS server.

I'm performing a local copy at the moment and have 'zpool offline'd one
disk and 'zpool replace'd it with the spare. While the replace is in
progress write performance naturally takes a hit, but the resilver is
progressing steadily and will take approx. 2 hours and 45 minutes in
total to complete.

--
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare
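The offline-and-replace test described above boils down to a sequence
along these lines; the pool name 'tank' and the daN device names are
placeholders, not the actual devices in this setup:

    zpool offline tank da3        # take the "failed" disk out of service
    zpool replace tank da3 da10   # resilver its data onto the spare disk
    zpool status tank             # shows resilver progress and a time estimate

Once the original disk is repaired or swapped, 'zpool detach' of the old
device returns the spare to the pool's spare list.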