Arnaud Brand
2010-Jan-08 15:54 UTC
[zfs-discuss] ZFS partially hangs when removing an rpool mirrored disk while having some IO on another pool on another partition of the same disk
Hello,

Sorry for the (very) long subject, but I've pinpointed the problem to this exact situation. I know about the other threads related to hangs, but in my case there was no <zfs destroy> involved, nor any compression or deduplication.

To make a long story short, when:
- a disk contains 2 partitions (p1 = 32 GB, p2 = 1800 GB), and
- p1 is used as part of a ZFS mirror of rpool, and
- p2 is used as part of a raidz (tested raidz1 and raidz2) of tank, and
- some serious work is underway on tank (tested write, copy, scrub),
then, if you physically remove the disk, ZFS partially hangs. Putting the physical disk back does not help.

For the long story:

About the hardware: 1 x Intel X25-E (64 GB SSD), 15 x 2 TB SATA drives (7 x WD, 8 x Hitachi), 2 x quad-core Xeon, 12 GB RAM, 2 x Areca ARC-1680 (8-port SAS controllers), Tyan S7002 mainboard.

About the software/firmware: OpenSolaris b130 installed on the SSD drive, on the first 32 GB. The Areca cards are configured as JBOD and are running the latest release firmware.

Initial setup: We created a 32 GB partition on all of the 2 TB drives and mirrored the system partition, giving us a 16-way rpool mirror. The rest of each 2 TB drive's space was put in a second partition and used for a raidz2 pool (named tank). A sketch of the commands is included further down.

Problem: Whenever we physically removed a disk from its tray while doing some speed testing on the tank pool, the system hung. At that time I hadn't read all the threads about ZFS hangs and couldn't determine whether the whole system was hung or just ZFS.

In order to pinpoint the problem, we made another setup.

Second setup: I reduced the number of partitions in the rpool mirror down to 3 (p1 from the SSD, p1 from a 2 TB drive on the same controller as the SSD, and p1 from a 2 TB drive on the other controller).

Problem: When the system is quiet, I am able to physically remove any disk, plug it back and resilver it. When I am putting some load on the tank pool, I can remove any disk that does *not* carry part of the rpool mirror (I can plug it back and resilver it while the load keeps running, without noticeable performance impact). But if, under the same load, I physically remove a disk that also carries an rpool mirror, ZFS partially hangs (more on "partially" after the two command sketches below).
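For reference, here is roughly how the pools were laid out. This is a sketch, not an exact transcript, and the cXtYdZ device names are illustrative placeholders (I'm assuming the SSD's system slice is c3t0d0s0 here):

  # One 32GB slice (s0, inside p1) per 2TB drive for the rpool mirror,
  # the remaining ~1800GB (p2) for tank.
  zpool attach rpool c3t0d0s0 c4t0d0s0    # adds one rpool mirror leg;
                                          # repeated for each of the 15 drives
  zpool create tank raidz2 c4t0d0p2 c4t0d1p2 c4t0d2p2 c4t0d3p2 \
                           c4t0d4p2 c4t0d5p2 c4t0d6p2 c4t0d7p2
                                          # (shortened: all fifteen p2
                                          # partitions went into the raidz2)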
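As for the resilvering mentioned just above, bringing a re-inserted disk back goes roughly like this (again with a placeholder device name; a sketch of the usual procedure, not an exact transcript):

  # after physically re-inserting the disk (placeholder name c4t0d7)
  zpool online tank c4t0d7p2      # bring the tank leg back online; the
                                  # resilver starts automatically
  zpool clear rpool c4t0d7s0      # clear the errors on the rpool mirror leg
  zpool status tank               # watch the resilver progress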
When I say partially, I mean that:
- zpool iostat -v tank 5 freezes
- any zpool command related to rpool gets stuck (zpool clear rpool c4t0d7s0, for example, or zpool status rpool)
- I can't launch new programs, but already-launched programs continue to run (at least in an ssh session; gnome becomes more and more frozen as you move from window to window).

From ssh sessions:
- prstat shows that only gnome-system-monitor, Xorg, ssh, bash and various *stat utilities (prstat, fsstat, iostat, mpstat) are consuming some CPU.
- zpool iostat -v tank 5 is frozen (it freezes the moment I issue zpool clear rpool c4t0d7s0 in another session).
- iostat -xn is not stuck, but has shown all zeroes since the very moment zpool iostat froze (which is quite strange if you look at the fsstat output below). NB: when I say all zeroes, I really mean it; it's not zero dot something, it's zero dot zero.
- mpstat shows normal activity (almost nothing, since this is a test machine, so only a few percent are used, but it still shows some activity and refreshes correctly):

  CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
    0    0   0  125   428  109  113    1    4    0    0   251    2   0   0  98
    1    0   0   20    56   16   44    2    2    1    0   277   11   1   0  88
    2    0   0  163   152   13  309    1    3    0    0  1370    4   0   0  96
    3    0   0   19   111   41   90    0    4    0    0    80    0   0   0 100
    4    0   0   69   192   17   66    0    3    0    0    20    0   0   0 100
    5    0   0   10    61    7   92    0    4    0    0   167    0   0   0 100
    6    0   0   96   191   25   74    0    4    1    0     5    0   0   0 100
    7    0   0   16    58    6   63    0    3    1    0    59    0   0   0 100

- fsstat -F 5 shows all zeroes except on the zfs line (the figures below stay almost the same over time):

   new  name   name   attr  attr lookup rddir  read   read write  write
  file remov   chng    get   set    ops   ops   ops  bytes   ops  bytes
     0     0      0  1,25K     0  2,51K     0   803  11,0M   473  11,0M zfs

- disk LEDs show no activity
- I cannot run any other command (neither from ssh nor from gnome)
- I cannot open another ssh session (I don't even get the login prompt in PuTTY)
- I can successfully ping the machine
- I cannot establish a new CIFS session (the login prompt should not appear since the machine is in an Active Directory domain, but when it's stuck the prompt appears and I cannot authenticate; I guess something related to LDAP or Kerberos cannot be read from rpool). An already-active session stays open, though: last time I even managed to create a text file with a few lines in it that was still there after I hard-rebooted.
- after some time (an hour or so), my ssh sessions eventually get stuck too.

Having worked for quite some time with OpenSolaris/ZFS (though not with so many disks), I doubted the problem came from OpenSolaris, and I had already opened a case with Areca's tech support, who are trying (at least so they told me) to reproduce the problem. That was until I read up on the ZFS hang issues.

We've found a workaround: we're going to put in one internal 2.5" disk to mirror rpool and dedicate the whole of the 2 TB disks to tank. But:
- I thought this might somehow be related to the other hang issues (and so it might help the developers to hear about other, similar cases), and
- I would really like to rule out an OpenSolaris bug, so that I can bring proof to Areca that their driver has a problem and either request that they correct it or ask my supplier to replace the cards with working ones.

I think their driver is at fault because I found a message in /var/adm/messages saying "WARNING: arcmsr duplicate scsi_hba_pkt_comp(9F) on same scsi_pkt(9S)" and, immediately after that, "WARNING: kstat rcnt == 0 when exiting runq, please check". Then the system was hung. The comments in the code that introduced these warnings say that drivers behaving this way can panic the system (or at least that's how I understood those comments).

I've reached my limits here. If anyone has ideas on how to determine what's going on, on other information I should publish, or on other things to run or check, I'll be happy to try them. If I can be of any help in troubleshooting the ZFS hang problems that others are experiencing, I'd be happy to give a hand.

Thanks and have a nice day,
Regards,
Arnaud Brand