Hey folks!

We're using ZFS-based file servers for our backups, and lately certain situations have been causing zfs/zpool commands to hang. Right now it appears that raid3155 is in this broken state:

root@homiebackup10:~# ps auxwww | grep zfs
root 15873 0.0 0.0 4216 1236 pts/2 S 15:56:54 0:00 grep zfs
root 13678 0.0 0.1 7516 2176 ? S 14:18:00 0:00 zfs list -t filesystem raid3155/angels
root 13691 0.0 0.1 7516 2188 ? S 14:18:04 0:00 zfs list -t filesystem raid3155/blazers
root 13731 0.0 0.1 7516 2200 ? S 14:18:20 0:00 zfs list -t filesystem raid3155/broncos
root 13792 0.0 0.1 7516 2220 ? S 14:18:51 0:00 zfs list -t filesystem raid3155/diamondbacks
root 13910 0.0 0.1 7516 2216 ? S 14:19:52 0:00 zfs list -t filesystem raid3155/knicks
root 13911 0.0 0.1 7516 2196 ? S 14:19:53 0:00 zfs list -t filesystem raid3155/lions
root 13916 0.0 0.1 7516 2220 ? S 14:19:55 0:00 zfs list -t filesystem raid3155/magic
root 13933 0.0 0.1 7516 2232 ? S 14:20:01 0:00 zfs list -t filesystem raid3155/mariners
root 13966 0.0 0.1 7516 2212 ? S 14:20:11 0:00 zfs list -t filesystem raid3155/mets
root 13971 0.0 0.1 7516 2208 ? S 14:20:21 0:00 zfs list -t filesystem raid3155/niners
root 13982 0.0 0.1 7516 2220 ? S 14:20:32 0:00 zfs list -t filesystem raid3155/padres
root 14064 0.0 0.1 7516 2220 ? S 14:21:03 0:00 zfs list -t filesystem raid3155/redwings
root 14123 0.0 0.1 7516 2212 ? S 14:21:20 0:00 zfs list -t filesystem raid3155/seahawks
root 14323 0.0 0.1 7420 2184 ? S 14:22:51 0:00 zfs allow zfsrcv create,mount,receive,share raid3155
root 15245 0.0 0.1 7468 2256 ? S 15:17:59 0:00 zfs create raid3155/angels
root 15250 0.0 0.1 7468 2244 ? S 15:18:03 0:00 zfs create raid3155/blazers
root 15256 0.0 0.1 7468 2248 ? S 15:18:19 0:00 zfs create raid3155/broncos
root 15284 0.0 0.1 7468 2256 ? S 15:18:51 0:00 zfs create raid3155/diamondbacks
root 15322 0.0 0.1 7468 2260 ? S 15:19:51 0:00 zfs create raid3155/knicks
root 15332 0.0 0.1 7468 2260 ? S 15:19:53 0:00 zfs create raid3155/magic
root 15333 0.0 0.1 7468 2236 ? S 15:19:53 0:00 zfs create raid3155/lions
root 15345 0.0 0.1 7468 2264 ? S 15:20:01 0:00 zfs create raid3155/mariners
root 15355 0.0 0.1 7468 2260 ? S 15:20:10 0:00 zfs create raid3155/mets
root 15363 0.0 0.1 7468 2252 ? S 15:20:20 0:00 zfs create raid3155/niners
root 15368 0.0 0.1 7468 2256 ? S 15:20:33 0:00 zfs create raid3155/padres
root 15384 0.0 0.1 7468 2256 ? S 15:21:01 0:00 zfs create raid3155/redwings
root 15389 0.0 0.1 7468 2264 ? S 15:21:20 0:00 zfs create raid3155/seahawks

Attempting a zpool list hangs, as does zpool status raid3155. Rebooting the system (forcefully) seems to 'fix' the problem, but once it comes back up, zpool list and zpool status show no issues with any of the drives. (after a reboot):

root@homiebackup10:~# zpool list
NAME       SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
raid3066  32.5T  18.1T  14.4T  55%  ONLINE  -
raid3154  32.5T  18.2T  14.3T  55%  ONLINE  -
raid3155  32.5T  18.7T  13.8T  57%  ONLINE  -
raid3156  32.5T  22.0T  10.5T  67%  ONLINE  -
rpool     59.5G  14.1G  45.4G  23%  ONLINE  -

We are using silmech storform iserv r505 machines with 3x silmech storform D55J JBOD SAS expanders connected to LSI Logic SAS1068E B3 eSAS cards, all containing 1.5TB Seagate 7200.11 SATA hard drives. We make a single striped raidz2 pool out of each chassis, giving us ~29TB of storage out of each 'brick', and we use rsync to copy the data from the machines to be backed up.
They're currently running OpenSolaris 2009.06 (snv_111b). We have had issues with the backplanes on these machines, but this particular machine has been up and running for nearly a year without any problems. It's currently at about 50% capacity on all pools.

I'm not really sure how to proceed from here as far as getting debug information while it's hung like this. I saw someone with similar issues post a few days ago but don't see any replies; the thread title is [zfs-discuss] Problem with resilvering and faulty disk. We've been seeing that issue as well while rebuilding these drives.

Any assistance with this would be greatly appreciated, and I can provide any information you folks might need to help troubleshoot this issue; just let me know what you need!

-Jeremy
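For readers trying to picture the layout described above, here is a minimal sketch of how one such 'brick' might be created; the disk names and the vdev widths are purely illustrative assumptions, not taken from these machines, and only the dataset/user names come from the thread:

  # one pool per chassis, striped across several raidz2 vdevs (hypothetical devices)
  zpool create raid3155 \
      raidz2 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 \
      raidz2 c6t8d0 c6t9d0 c6t10d0 c6t11d0 c6t12d0 c6t13d0 c6t14d0 c6t15d0 \
      raidz2 c6t16d0 c6t17d0 c6t18d0 c6t19d0 c6t20d0 c6t21d0 c6t22d0 c6t23d0

  # one filesystem per backed-up host, with permissions delegated to the backup user
  zfs create raid3155/angels
  zfs allow zfsrcv create,mount,receive,share raid3155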
Jeremy Kitchen wrote:
> Hey folks!
>
> We're using zfs-based file servers for our backups and we've been having
> some issues as of late with certain situations causing zfs/zpool
> commands to hang.

anyone? this is happening right now, and because we're doing a restore I can't reboot the machine, so it's a prime opportunity to get debugging information if it'll help.

Thanks!

-Jeremy
Hi Jeremy,

Can you use the command below and send me the output, please?

Thanks,

Cindy

# mdb -k
> ::stacks -m zfs

On 10/26/09 11:58, Jeremy Kitchen wrote:
> Jeremy Kitchen wrote:
>> Hey folks!
>>
>> We're using zfs-based file servers for our backups and we've been having
>> some issues as of late with certain situations causing zfs/zpool
>> commands to hang.
>
> anyone? this is happening right now and because we're doing a restore I
> can't reboot the machine, so it's a prime opportunity to get debugging
> information if it'll help.
>
> Thanks!
>
> -Jeremy
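The same dcmd can also be captured non-interactively, which can be handy on a box that is too wedged for an interactive mdb session; a small sketch, with an arbitrary output file name:

  # dump the kernel stacks of all threads currently in zfs functions
  echo "::stacks -m zfs" | mdb -k > /var/tmp/zfs-stacks.txt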
Cindy Swearingen wrote:
> Hi Jeremy,
>
> Can you use the command below and send me the output, please?
>
> Thanks,
>
> Cindy
>
> # mdb -k
> > ::stacks -m zfs

ack! it *just* fully died. I've had our noc folks reset the machine and I will get this info to you as soon as it happens again (I'm fairly certain it will, if not on this specific machine, one of our other machines!)

-Jeremy
Jeremy,

I generally suspect device failures in this case. If possible, review the contents of /var/adm/messages and the output of fmdump -eV to see whether the pool hang could be attributed to failed or failing devices.

Cindy

On 10/26/09 17:28, Jeremy Kitchen wrote:
> Cindy Swearingen wrote:
>> Hi Jeremy,
>>
>> Can you use the command below and send me the output, please?
>>
>> Thanks,
>>
>> Cindy
>>
>> # mdb -k
>> > ::stacks -m zfs
>
> ack! it *just* fully died. I've had our noc folks reset the machine
> and I will get this info to you as soon as it happens again (I'm fairly
> certain it will, if not on this specific machine, one of our other
> machines!)
>
> -Jeremy
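A rough sketch of that triage; the grep pattern and line counts here are just examples, not anything prescribed in the thread:

  # recent disk/controller complaints from the mpt driver
  egrep -i "mpt|scsi" /var/adm/messages | tail -50

  # summary of FMA error telemetry, then full detail for anything interesting
  fmdump -e | tail -20
  fmdump -eV | less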
Cindy Swearingen wrote:
> Jeremy,
>
> I generally suspect device failures in this case and if possible,
> review the contents of /var/adm/messages and fmdump -eV to see
> if the pool hang could be attributed to failed or failing devices.

perusing /var/adm/messages, I see:

Oct 22 05:06:11 homiebackup10 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4021@1/pci1000,3140@0 (mpt1):
Oct 22 05:06:11 homiebackup10   Log info 0x31080000 received for target 5.
Oct 22 05:06:11 homiebackup10   scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 22 05:06:19 homiebackup10 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4021@1/pci1000,3140@0 (mpt1):
Oct 22 05:06:19 homiebackup10   Log info 0x31080000 received for target 5.
Oct 22 05:06:19 homiebackup10   scsi_status=0x0, ioc_status=0x804b, scsi_state=0x1
Oct 22 05:06:19 homiebackup10 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4021@1/pci1000,3140@0 (mpt1):
Oct 22 05:06:19 homiebackup10   Log info 0x31080000 received for target 5.
Oct 22 05:06:19 homiebackup10   scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0

lots of messages like that just prior to rsync warnings:

Oct 22 05:55:29 homiebackup10 rsyncd[29746]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
Oct 22 05:55:29 homiebackup10 rsyncd[29746]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at io.c(453) [receiver=2.6.9]
Oct 22 06:10:29 homiebackup10 rsyncd[178]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
Oct 22 06:10:29 homiebackup10 rsyncd[178]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at io.c(453) [receiver=2.6.9]
Oct 22 06:25:27 homiebackup10 rsyncd[776]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]

I think the rsync warnings are indicative of the pool being hung. So it would seem that the bus is freaking out, and then the pool dies, and that's that? The strange thing is that this machine is way underloaded compared to another one we have (which has 5 shelves, so ~150TB of storage attached) which hasn't really had any problems like this. We had issues with that one when rebuilding drives, but it's been pretty stable since.

looking at fmdump -eV, I see lots and lots of these:

Oct 24 2009 05:02:54.098815545 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x882108543f200401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,4025@5/pci1000,3140@0/sd@30,0
        (end detector)

        driver-assessment = retry
        op-code = 0x28
        cdb = 0x28 0x0 0x51 0x9c 0xa5 0x80 0x0 0x0 0x80 0x0
        pkt-reason = 0x4
        pkt-state = 0x0
        pkt-stats = 0x10
        __ttl = 0x1
        __tod = 0x4ae2ecee 0x5e3ce39

always with the same device name. So, it would appear that the drive at that location is probably broken, and zfs just isn't detecting it properly?

Also, I'm wondering if this is related to the recent thread titled [zfs-discuss] SNV_125 MPT warning in logfile, as we're using the same controller that person mentions.

We're going to order some beefier controllers with the next shipment; any suggestions on what to get? If we find that the new controllers work much better, we may even go as far as replacing the ones in the existing machines (or at least any machines experiencing these issues).
We're not married to LSI, but we use LSI controllers in our webservers for the most part and they're pretty solid there (though admittedly those are hardware RAID, rather than JBOD).

Thanks so much for your help!

-Jeremy
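One common way to map an FMA device-path like the sd@30,0 one above back to a cNtNdN disk name is to match it against the /dev/rdsk symlinks; a hedged sketch, reusing the path fragment from the ereport:

  # /dev/rdsk entries are symlinks into /devices; grep for the path from the ereport
  ls -l /dev/rdsk/*s0 | grep "pci1000,3140@0/sd@30,0"

  # cross-check serial numbers and per-device error counters
  iostat -En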
Jeremy Kitchen wrote:
> Cindy Swearingen wrote:
>> I generally suspect device failures in this case and if possible,
>> review the contents of /var/adm/messages and fmdump -eV to see
>> if the pool hang could be attributed to failed or failing devices.
>
> [snip: the /var/adm/messages mpt errors, rsync warnings, and the
> ereport.io.scsi.cmd.disk.tran excerpt quoted above]

so doing some more reading here on the list and mucking about a bit more, I've come across this in the fmdump log:

Oct 22 2009 05:03:56.687818542 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0x99eb889c6fe00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x90ed10dfd0191c3b
                vdev = 0xf41193d6d1deedc2
        (end detector)

        pool = raid3155
        pool_guid = 0x90ed10dfd0191c3b
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0xf41193d6d1deedc2
        vdev_type = disk
        vdev_path = /dev/dsk/c6t5d0s0
        vdev_devid = id1,sd@n5000c50010a7666b/a
        parent_guid = 0xcbaa8ea60a3c133
        parent_type = raidz
        zio_err = 5
        zio_offset = 0xab2901da00
        zio_size = 0x200
        zio_objset = 0x4b
        zio_object = 0xa26ef4
        zio_level = 0
        zio_blkid = 0xf
        __ttl = 0x1
        __tod = 0x4ae04a2c 0x28ff472e

c6t5d0 is in the problem pool (raid3155), so I've gone ahead and offlined the drive and will be replacing it shortly. Hopefully that will take care of the problem!

If this doesn't solve the problem, do you have any suggestions on what more I can look at to try to figure out what's wrong? Is there some sort of setting I can set which will prevent the zpool from hanging up the entire system in the event of a single drive failure like this? It's really annoying to not be able to log into the machine (and having to forcefully reboot it) when this happens.

Thanks again for your help!

-Jeremy
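For reference, the offline-and-replace step described above usually comes down to something like the following; a sketch that assumes the replacement drive goes back into the same slot:

  # take the suspect disk out of service
  zpool offline raid3155 c6t5d0

  # after physically swapping the drive in the same bay, resilver onto it
  zpool replace raid3155 c6t5d0

  # watch the resilver
  zpool status raid3155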
Hi Jeremy,

The ereport.io.scsi.cmd.disk.tran is describing connection problems to the /pci@0,0/pci8086,4025@5/pci1000,3140@0/sd@30,0 device. I think the .tran suffix is for transient.

ZFS might be reporting problems with the device as well, but if the zpool/zfs commands are hanging, then it might be difficult to get this confirmation. The zpool status command will report device problems.

When a device in a pool fails, I/O to the pool is blocked, though reads might still be successful. See the failmode property description in zpool.1m.

Is this pool redundant? If so, you can attempt to offline this device until it is replaced. If you have another device available, you might replace the suspect drive and see if that solves the pool hang problem.

Cindy

On 10/27/09 12:04, Jeremy Kitchen wrote:
> Cindy Swearingen wrote:
>> Jeremy,
>>
>> I generally suspect device failures in this case and if possible,
>> review the contents of /var/adm/messages and fmdump -eV to see
>> if the pool hang could be attributed to failed or failing devices.
>
> [snip: log excerpts, fmdump output, and controller questions quoted above]
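A short sketch of the checks Cindy refers to, assuming the zpool commands are responding at the time; nothing here is specific to this system beyond the pool name:

  # quick health summary: only pools with problems are listed
  zpool status -x

  # per-device error counters and any resilver/scrub state for the suspect pool
  zpool status -v raid3155

  # the failmode property governs how the pool behaves when a device fails
  zpool get failmode raid3155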
Jeremy,

I can't comment on your hardware because I'm not familiar with it.

If you have a storage pool with ZFS redundancy and one device fails or begins failing, then the pool keeps going in a degraded mode but is generally available. You can try setting the failmode property to continue, which would allow reads to continue in case of a device failure and might prevent the pool from hanging.

If offlining the disk or replacing the disk doesn't help, let us know.

Cindy

On 10/27/09 13:13, Jeremy Kitchen wrote:
> Jeremy Kitchen wrote:
>> Cindy Swearingen wrote:
>>> I generally suspect device failures in this case and if possible,
>>> review the contents of /var/adm/messages and fmdump -eV to see
>>> if the pool hang could be attributed to failed or failing devices.
>
> [snip: the ereport.fs.zfs.io excerpt for c6t5d0 and the follow-up
> questions quoted above]
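The property change itself is a one-liner; a sketch, noting that continue only changes how the pool reacts to a failed device rather than fixing anything:

  # default is wait: I/O to the pool blocks until the device recovers or is replaced
  zpool get failmode raid3155

  # allow reads from healthy devices (new writes return EIO) instead of blocking
  zpool set failmode=continue raid3155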
Hi Jeremy,

I had a loosely similar problem with my 2009.06 box. In my case (which may not be yours), working with support we found a bug that was causing my pool to hang. I also got erroneous errors when I did a scrub (3 x 5-disk raidz). I am using the same LSI controller.

A sure-fire way to kill the box was to set up a file system as an iSCSI target and write a lot of data to it, around 1-2MB/s. It would usually die inside of a few hours. NFS writing was not as bad, but within a day it would panic there too.

The solution for me was to upgrade to build 124. Since the upgrade three weeks ago, I have had no problems. Again, I don't know if this would fix your problem, but it may be worth a try. Just don't upgrade your ZFS version, and you will be able to roll back to 2009.06 at any time.

-Scott
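A rough sketch of the kind of upgrade Scott describes on a 2009.06 image; the publisher name and dev repository URL are assumptions about that setup, and the last two commands are listed only as the ones to avoid so the pool stays usable from the old boot environment:

  # point the image at the dev repository and pull in the newer build
  pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
  pkg image-update

  # after booting the new build, deliberately do NOT run these, so rollback
  # to the 2009.06 boot environment remains possible:
  #   zpool upgrade <pool>
  #   zfs upgrade <filesystem>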
On Oct 26, 2009, at 4:00 PM, Cindy Swearingen wrote:
> Hi Jeremy,
>
> Can you use the command below and send me the output, please?
>
> Thanks,
>
> Cindy
>
> # mdb -k
> > ::stacks -m zfs

Ok, it did it again. I replaced the drive and it's currently resilvering (13.66% done, 135h33m to go it says) and the output of that command is this:

> ::stacks -m zfs
THREAD STATE SOBJ COUNT
ffffff02f0078a80 SLEEP MUTEX 31 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zget+0x47 zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02efa33500 SLEEP CV 18 swtch+0x147 cv_wait+0x61 dbuf_read+0x237 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a91e0 SLEEP CV 9 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c46ac0 SLEEP CV 7 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a2720 SLEEP CV 6 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 dmu_tx_assign+0x4b zfs_inactive+0xa8 fop_inactive+0xaf vn_rele+0x5f closef+0x75 closeandsetf+0x44a close+0x18 _sys_sysenter_post_swapgs+0x14b
ffffff000f61bc60 SLEEP CV 5 swtch+0x147 cv_wait+0x61 txg_thread_wait+0x5f txg_quiesce_thread+0x94 thread_start+8
ffffff02d8514aa0 SLEEP CV 5 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02d89b4c20 SLEEP CV 3 swtch+0x147 cv_wait+0x61 txg_wait_synced+0x7f dmu_tx_wait+0xcd zfs_create+0x44d fop_create+0xfc vn_createat+0x5e1 vn_openat+0x1fb copen+0x418 open64+0x34 _sys_sysenter_post_swapgs+0x14b
ffffff02dc0a5400 SLEEP CV 3 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_get_leaf_byblk+0x56 zap_deref_leaf+0x78 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02efc771a0 SLEEP CV 3 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dmu_buf_hold_array_by_dnode+0x220 dmu_buf_hold_array+0x73 dmu_read_uio+0x4d zfs_read+0x19a fop_read+0x6b rfs3_read+0x393 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff001076dc60 SLEEP CV 3 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dsl_pool_sync+0xe1 spa_sync+0x32a txg_sync_thread+0x265 thread_start+8
ffffff02d8c4d900 SLEEP CV 2 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 zfs_write+0x3bd fop_write+0x6b write+0x2e2 write32+0x22 _sys_sysenter_post_swapgs+0x14b
ffffff02efa98e80 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zil_commit+0x62 zfs_putpage+0x2c9 fop_putpage+0x74 rfs3_commit+0x180 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02ef89a3c0 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02efb24b00 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_cursor_retrieve+0x74 zfs_readdir+0x29e fop_readdir+0xab getdents64+0xbc _sys_sysenter_post_swapgs+0x14b
ffffff02efce3c40 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_vget+0x25e fsop_vget+0x67 nfs3_fhtovp+0x47 rfs3_getattr+0x40 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efc763a0 SLEEP MUTEX 2 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zget+0x47 zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff000f531c60 SLEEP CV 1 swtch+0x147 cv_timedwait+0xba arc_reclaim_thread+0x17b thread_start+8
ffffff000f537c60 SLEEP CV 1 swtch+0x147 cv_timedwait+0xba l2arc_feed_thread+0xa5 thread_start+8
ffffff000f621c60 SLEEP CV 1 swtch+0x147 cv_timedwait+0xba txg_thread_wait+0x7b txg_sync_thread+0x114 thread_start+8
ffffff02d8c4ee60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 dmu_buf_hold_array_by_dnode+0x2b7 dmu_buf_hold_array+0x73 dmu_read_uio+0x4d zfs_read+0x19a fop_read+0x6b rfs3_read+0x393 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02dc0a6c80 SLEEP CV 1 swtch+0x147 cv_wait+0x61 spa_config_enter+0x7d spa_vdev_enter+0x2f spa_vdev_setpath+0x32 zfs_ioc_vdev_setpath+0x48 zfsdev_ioctl+0x10b cdev_ioctl+0x45 spec_ioctl+0x83 fop_ioctl+0x7b ioctl+0x18e _sys_sysenter_post_swapgs+0x14b
ffffff052a2551c0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 zfs_setattr+0xca8 fop_setattr+0xad vpsetattr+0x12a fdsetattr+0x30 fchmod+0x3a _sys_sysenter_post_swapgs+0x14b
ffffff02d89b7a80 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 zfs_write+0x3bd fop_write+0x6b rfs3_write+0x507 common_dispatch+0x4a7 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02dc0a5780 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_synced+0x7f dmu_tx_wait+0xcd zfs_rename+0x553 fop_rename+0xc5 vn_renameat+0x2ff vn_rename+0x2f rename+0x17 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c34720 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_synced+0x7f spa_vdev_state_exit+0x4e vdev_fault+0xde zfs_ioc_vdev_set_state+0xa6 zfsdev_ioctl+0x10b cdev_ioctl+0x45 spec_ioctl+0x83 fop_ioctl+0x7b ioctl+0x18e _sys_sysenter_post_swapgs+0x14b
ffffff02efa3f540 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zil_commit+0x62 zfs_fsync+0xd3 fop_fsync+0x5a rfs3_create+0x767 common_dispatch+0x4a7 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efa3e3c0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zil_commit+0x62 zfs_fsync+0xd3 fop_fsync+0x5a rfs3_setattr+0x447 common_dispatch+0x4a7 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efc77520 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02ef791c60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c40b00 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_vget+0x25e fsop_vget+0x67 nfs3_fhtovp+0x47 rfs3_getattr+0x40 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efb398e0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_prefetch+0x93 dmu_zfetch_fetch+0x62 dmu_zfetch_dofetch+0xb8 dmu_zfetch_find+0x394 dmu_zfetch+0xac dbuf_read+0x170 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_dirent_lock+0x3fc zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff0372bdac60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_get_leaf_byblk+0x56 zap_deref_leaf+0x78 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a4380 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02efce3540 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed rfs3_lookup+0x395 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c44e60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_table_load+0x76 zap_idx_to_blk+0x56 zap_deref_leaf+0x60 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c45c60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_dirent_lock+0x3fc zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed rfs3_lookup+0x395 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a7c80 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c50ac0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_object_info+0x30 bplist_open+0x36 dsl_dataset_get_ref+0x139 dsl_dataset_hold+0x100 dmu_objset_prefetch+0x24 findfunc+0x23 dmu_objset_find_spa+0x30d dmu_objset_find+0x40 zfs_ioc_snapshot_list_next+0x59 zfsdev_ioctl+0x10b cdev_ioctl+0x45 spec_ioctl+0x83 fop_ioctl+0x7b ioctl+0x18e _sys_sysenter_post_swapgs+0x14b
ffffff02efa3fc40 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dmu_tx_hold_free+0x19f zfs_remove+0x2b9 fop_remove+0xa5 vn_removeat+0x2f0 vn_remove+0x2c unlink+0x19 _sys_sysenter_post_swapgs+0x14b
ffffff001073dc60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dsl_pool_sync+0x1f6 spa_sync+0x3cd txg_sync_thread+0x265 thread_start+8
ffffff02d8c40780 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d zil_commit_writer+0x2ac zil_commit+0x8c zfs_putpage+0x2c9 fop_putpage+0x74 rfs3_commit+0x180 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02d89bac80 SLEEP MUTEX 1 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zget+0x47 zfs_vget+0x25e fsop_vget+0x67 nfs3_fhtovp+0x47 rfs3_access+0x4a common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff0010adfc60 SLEEP MUTEX 1 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zinactive+0x3c zfs_inactive+0xee fop_inactive+0xaf vn_rele_dnlc+0x6c do_dnlc_reduce_cache+0x16a taskq_d_thread+0xb1 thread_start+8
ffffff02da197560 SLEEP CV 1 zfs_lookup+0xb1 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff000fc92c60 SLEEP CV 1 zio_destroy+0x5c zio_done+0xc4 zio_execute+0xa0 zio_notify_parent+0xa6 zio_done+0x2c9 zio_execute+0xa0 taskq_thread+0x193 thread_start+8
ffffff001006ac60 SLEEP CV 1 zio_vdev_io_done+0x62 zio_execute+0xa0 taskq_thread+0x193 thread_start+8
ffffff052a254e40 ONPROC <NONE> 1 apic_intr_exit+0x32 hilevel_intr_epilog+0x123 do_interrupt+0xe9 _sys_rtt_ints_disabled+8 dmu_zfetch_colinear+0x87 dmu_zfetch+0xd3 dbuf_read+0x272 dnode_hold_impl+0x22b dnode_hold+0x2b dmu_buf_hold+0x75 zap_get_leaf_byblk+0x56 zap_deref_leaf+0x78 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b

-Jeremy
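Since the pool is mid-resilver at this point, progress can be watched with a plain status query, assuming the command responds:

  # shows resilver percent complete and estimated time remaining
  zpool status raid3155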