Hey folks!

We're using ZFS-based file servers for our backups, and lately certain situations have been causing zfs/zpool commands to hang. Right now it appears that raid3155 is in this broken state:

root@homiebackup10:~# ps auxwww | grep zfs
root 15873 0.0 0.0 4216 1236 pts/2 S 15:56:54 0:00 grep zfs
root 13678 0.0 0.1 7516 2176 ? S 14:18:00 0:00 zfs list -t filesystem raid3155/angels
root 13691 0.0 0.1 7516 2188 ? S 14:18:04 0:00 zfs list -t filesystem raid3155/blazers
root 13731 0.0 0.1 7516 2200 ? S 14:18:20 0:00 zfs list -t filesystem raid3155/broncos
root 13792 0.0 0.1 7516 2220 ? S 14:18:51 0:00 zfs list -t filesystem raid3155/diamondbacks
root 13910 0.0 0.1 7516 2216 ? S 14:19:52 0:00 zfs list -t filesystem raid3155/knicks
root 13911 0.0 0.1 7516 2196 ? S 14:19:53 0:00 zfs list -t filesystem raid3155/lions
root 13916 0.0 0.1 7516 2220 ? S 14:19:55 0:00 zfs list -t filesystem raid3155/magic
root 13933 0.0 0.1 7516 2232 ? S 14:20:01 0:00 zfs list -t filesystem raid3155/mariners
root 13966 0.0 0.1 7516 2212 ? S 14:20:11 0:00 zfs list -t filesystem raid3155/mets
root 13971 0.0 0.1 7516 2208 ? S 14:20:21 0:00 zfs list -t filesystem raid3155/niners
root 13982 0.0 0.1 7516 2220 ? S 14:20:32 0:00 zfs list -t filesystem raid3155/padres
root 14064 0.0 0.1 7516 2220 ? S 14:21:03 0:00 zfs list -t filesystem raid3155/redwings
root 14123 0.0 0.1 7516 2212 ? S 14:21:20 0:00 zfs list -t filesystem raid3155/seahawks
root 14323 0.0 0.1 7420 2184 ? S 14:22:51 0:00 zfs allow zfsrcv create,mount,receive,share raid3155
root 15245 0.0 0.1 7468 2256 ? S 15:17:59 0:00 zfs create raid3155/angels
root 15250 0.0 0.1 7468 2244 ? S 15:18:03 0:00 zfs create raid3155/blazers
root 15256 0.0 0.1 7468 2248 ? S 15:18:19 0:00 zfs create raid3155/broncos
root 15284 0.0 0.1 7468 2256 ? S 15:18:51 0:00 zfs create raid3155/diamondbacks
root 15322 0.0 0.1 7468 2260 ? S 15:19:51 0:00 zfs create raid3155/knicks
root 15332 0.0 0.1 7468 2260 ? S 15:19:53 0:00 zfs create raid3155/magic
root 15333 0.0 0.1 7468 2236 ? S 15:19:53 0:00 zfs create raid3155/lions
root 15345 0.0 0.1 7468 2264 ? S 15:20:01 0:00 zfs create raid3155/mariners
root 15355 0.0 0.1 7468 2260 ? S 15:20:10 0:00 zfs create raid3155/mets
root 15363 0.0 0.1 7468 2252 ? S 15:20:20 0:00 zfs create raid3155/niners
root 15368 0.0 0.1 7468 2256 ? S 15:20:33 0:00 zfs create raid3155/padres
root 15384 0.0 0.1 7468 2256 ? S 15:21:01 0:00 zfs create raid3155/redwings
root 15389 0.0 0.1 7468 2264 ? S 15:21:20 0:00 zfs create raid3155/seahawks

Attempting a zpool list hangs, as does zpool status raid3155. Rebooting the system (forcefully) seems to 'fix' the problem, but once it comes back up, zpool list and zpool status show no issues with any of the drives. (after a reboot):

root@homiebackup10:~# zpool list
NAME       SIZE   USED  AVAIL  CAP  HEALTH  ALTROOT
raid3066  32.5T  18.1T  14.4T  55%  ONLINE  -
raid3154  32.5T  18.2T  14.3T  55%  ONLINE  -
raid3155  32.5T  18.7T  13.8T  57%  ONLINE  -
raid3156  32.5T  22.0T  10.5T  67%  ONLINE  -
rpool     59.5G  14.1G  45.4G  23%  ONLINE  -

We are using silmech storform iserv r505 machines with 3x silmech storform D55J JBOD SAS expanders connected to LSI Logic SAS1068E B3 eSAS cards, all containing 1.5TB Seagate 7200.11 SATA hard drives. We make a single striped raidz2 pool out of each chassis, giving us ~29TB of storage out of each 'brick', and we use rsync to copy the data from the machines to be backed up.
They're currently running OpenSolaris 2009.06 (snv_111b). We have had issues with the backplanes on these machines, but this particular machine has been up and running for nearly a year without any problems. It's currently at about 50% capacity on all pools.

I'm not really sure how to proceed from here as far as getting debug information while it's hung like this. I saw someone with similar issues post a few days ago but don't see any replies; the thread title is [zfs-discuss] Problem with resilvering and faulty disk. We've been seeing that issue as well while rebuilding these drives.

Any assistance with this would be greatly appreciated, and I can provide any information you folks might need to help troubleshoot this issue; just let me know what you need!

-Jeremy
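For readers trying to picture the layout described above, here is a minimal sketch of how one such 'brick' might be created; the disk names and the vdev widths are purely illustrative assumptions, not taken from these machines, and only the dataset/user names come from the thread:

  # one pool per chassis, striped across several raidz2 vdevs (hypothetical devices)
  zpool create raid3155 \
      raidz2 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0 c6t6d0 c6t7d0 \
      raidz2 c6t8d0 c6t9d0 c6t10d0 c6t11d0 c6t12d0 c6t13d0 c6t14d0 c6t15d0 \
      raidz2 c6t16d0 c6t17d0 c6t18d0 c6t19d0 c6t20d0 c6t21d0 c6t22d0 c6t23d0

  # one filesystem per backed-up host, with permissions delegated to the backup user
  zfs create raid3155/angels
  zfs allow zfsrcv create,mount,receive,share raid3155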
Jeremy Kitchen wrote:
> Hey folks!
>
> We're using zfs-based file servers for our backups and we've been having
> some issues as of late with certain situations causing zfs/zpool
> commands to hang.

anyone? this is happening right now, and because we're doing a restore I can't reboot the machine, so it's a prime opportunity to get debugging information if it'll help.

Thanks!

-Jeremy
Hi Jeremy,

Can you use the command below and send me the output, please?

Thanks,

Cindy

# mdb -k
> ::stacks -m zfs

On 10/26/09 11:58, Jeremy Kitchen wrote:
> Jeremy Kitchen wrote:
>> Hey folks!
>>
>> We're using zfs-based file servers for our backups and we've been having
>> some issues as of late with certain situations causing zfs/zpool
>> commands to hang.
>
> anyone? this is happening right now and because we're doing a restore I
> can't reboot the machine, so it's a prime opportunity to get debugging
> information if it'll help.
>
> Thanks!
>
> -Jeremy
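The same dcmd can also be captured non-interactively, which can be handy on a box that is too wedged for an interactive mdb session; a small sketch, with an arbitrary output file name:

  # dump the kernel stacks of all threads currently in zfs functions
  echo "::stacks -m zfs" | mdb -k > /var/tmp/zfs-stacks.txt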
Cindy Swearingen wrote:
> Hi Jeremy,
>
> Can you use the command below and send me the output, please?
>
> Thanks,
>
> Cindy
>
> # mdb -k
> > ::stacks -m zfs

ack! it *just* fully died. I've had our noc folks reset the machine and I will get this info to you as soon as it happens again (I'm fairly certain it will, if not on this specific machine, one of our other machines!)

-Jeremy
Jeremy,

I generally suspect device failures in this case. If possible, review the contents of /var/adm/messages and the output of fmdump -eV to see whether the pool hang could be attributed to failed or failing devices.

Cindy

On 10/26/09 17:28, Jeremy Kitchen wrote:
> Cindy Swearingen wrote:
>> Hi Jeremy,
>>
>> Can you use the command below and send me the output, please?
>>
>> Thanks,
>>
>> Cindy
>>
>> # mdb -k
>> > ::stacks -m zfs
>
> ack! it *just* fully died. I've had our noc folks reset the machine
> and I will get this info to you as soon as it happens again (I'm fairly
> certain it will, if not on this specific machine, one of our other
> machines!)
>
> -Jeremy
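A rough sketch of that triage; the grep pattern and line counts here are just examples, not anything prescribed in the thread:

  # recent disk/controller complaints from the mpt driver
  egrep -i "mpt|scsi" /var/adm/messages | tail -50

  # summary of FMA error telemetry, then full detail for anything interesting
  fmdump -e | tail -20
  fmdump -eV | less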
Cindy Swearingen wrote:
> Jeremy,
>
> I generally suspect device failures in this case and if possible,
> review the contents of /var/adm/messages and fmdump -eV to see
> if the pool hang could be attributed to failed or failing devices.

perusing /var/adm/messages, I see:

Oct 22 05:06:11 homiebackup10 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4021@1/pci1000,3140@0 (mpt1):
Oct 22 05:06:11 homiebackup10   Log info 0x31080000 received for target 5.
Oct 22 05:06:11 homiebackup10   scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Oct 22 05:06:19 homiebackup10 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4021@1/pci1000,3140@0 (mpt1):
Oct 22 05:06:19 homiebackup10   Log info 0x31080000 received for target 5.
Oct 22 05:06:19 homiebackup10   scsi_status=0x0, ioc_status=0x804b, scsi_state=0x1
Oct 22 05:06:19 homiebackup10 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,4021@1/pci1000,3140@0 (mpt1):
Oct 22 05:06:19 homiebackup10   Log info 0x31080000 received for target 5.
Oct 22 05:06:19 homiebackup10   scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0

lots of messages like that just prior to rsync warnings:

Oct 22 05:55:29 homiebackup10 rsyncd[29746]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
Oct 22 05:55:29 homiebackup10 rsyncd[29746]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at io.c(453) [receiver=2.6.9]
Oct 22 06:10:29 homiebackup10 rsyncd[178]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
Oct 22 06:10:29 homiebackup10 rsyncd[178]: [ID 702911 daemon.warning] rsync error: error in rsync protocol data stream (code 12) at io.c(453) [receiver=2.6.9]
Oct 22 06:25:27 homiebackup10 rsyncd[776]: [ID 702911 daemon.warning] rsync: connection unexpectedly closed (0 bytes received so far) [receiver]

I think the rsync warnings are indicative of the pool being hung. So it would seem that the bus is freaking out, and then the pool dies, and that's that? The strange thing is that this machine is way underloaded compared to another one we have (which has 5 shelves, so ~150TB of storage attached) which hasn't really had any problems like this. We had issues with that one when rebuilding drives, but it's been pretty stable since.

looking at fmdump -eV, I see lots and lots of these:

Oct 24 2009 05:02:54.098815545 ereport.io.scsi.cmd.disk.tran
nvlist version: 0
        class = ereport.io.scsi.cmd.disk.tran
        ena = 0x882108543f200401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@0,0/pci8086,4025@5/pci1000,3140@0/sd@30,0
        (end detector)

        driver-assessment = retry
        op-code = 0x28
        cdb = 0x28 0x0 0x51 0x9c 0xa5 0x80 0x0 0x0 0x80 0x0
        pkt-reason = 0x4
        pkt-state = 0x0
        pkt-stats = 0x10
        __ttl = 0x1
        __tod = 0x4ae2ecee 0x5e3ce39

always with the same device name. So, it would appear that the drive at that location is probably broken, and zfs just isn't detecting it properly?

Also, I'm wondering if this is related to the recent thread titled [zfs-discuss] SNV_125 MPT warning in logfile, as we're using the same controller that person mentions.

We're going to order some beefier controllers with the next shipment; any suggestions on what to get? If we find that the new controllers work much better, we may even go as far as replacing the ones in the existing machines (or at least any machines experiencing these issues).
We're not married to LSI, but we use LSI controllers in our webservers for the most part and they're pretty solid there (though admittedly those are hardware RAID, rather than JBOD).

Thanks so much for your help!

-Jeremy
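One common way to map an FMA device-path like the sd@30,0 one above back to a cNtNdN disk name is to match it against the /dev/rdsk symlinks; a hedged sketch, reusing the path fragment from the ereport:

  # /dev/rdsk entries are symlinks into /devices; grep for the path from the ereport
  ls -l /dev/rdsk/*s0 | grep "pci1000,3140@0/sd@30,0"

  # cross-check serial numbers and per-device error counters
  iostat -En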
Jeremy Kitchen wrote:
> Cindy Swearingen wrote:
>> I generally suspect device failures in this case and if possible,
>> review the contents of /var/adm/messages and fmdump -eV to see
>> if the pool hang could be attributed to failed or failing devices.
>
> [snip: the /var/adm/messages mpt errors, rsync warnings, and the
> ereport.io.scsi.cmd.disk.tran excerpt quoted above]

so doing some more reading here on the list and mucking about a bit more, I've come across this in the fmdump log:

Oct 22 2009 05:03:56.687818542 ereport.fs.zfs.io
nvlist version: 0
        class = ereport.fs.zfs.io
        ena = 0x99eb889c6fe00001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x90ed10dfd0191c3b
                vdev = 0xf41193d6d1deedc2
        (end detector)

        pool = raid3155
        pool_guid = 0x90ed10dfd0191c3b
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0xf41193d6d1deedc2
        vdev_type = disk
        vdev_path = /dev/dsk/c6t5d0s0
        vdev_devid = id1,sd@n5000c50010a7666b/a
        parent_guid = 0xcbaa8ea60a3c133
        parent_type = raidz
        zio_err = 5
        zio_offset = 0xab2901da00
        zio_size = 0x200
        zio_objset = 0x4b
        zio_object = 0xa26ef4
        zio_level = 0
        zio_blkid = 0xf
        __ttl = 0x1
        __tod = 0x4ae04a2c 0x28ff472e

c6t5d0 is in the problem pool (raid3155), so I've gone ahead and offlined the drive and will be replacing it shortly. Hopefully that will take care of the problem!

If this doesn't solve the problem, do you have any suggestions on what more I can look at to try to figure out what's wrong? Is there some sort of setting I can set which will prevent the zpool from hanging up the entire system in the event of a single drive failure like this? It's really annoying to not be able to log into the machine (and having to forcefully reboot it) when this happens.

Thanks again for your help!

-Jeremy
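For reference, the offline-and-replace step described above usually comes down to something like the following; a sketch that assumes the replacement drive goes back into the same slot:

  # take the suspect disk out of service
  zpool offline raid3155 c6t5d0

  # after physically swapping the drive in the same bay, resilver onto it
  zpool replace raid3155 c6t5d0

  # watch the resilver
  zpool status raid3155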
Hi Jeremy,

The ereport.io.scsi.cmd.disk.tran is describing connection problems to the /pci@0,0/pci8086,4025@5/pci1000,3140@0/sd@30,0 device. I think the .tran suffix is for transient.

ZFS might be reporting problems with the device as well, but if the zpool/zfs commands are hanging, then it might be difficult to get this confirmation. The zpool status command will report device problems.

When a device in a pool fails, I/O to the pool is blocked, though reads might still be successful. See the failmode property description in zpool.1m.

Is this pool redundant? If so, you can attempt to offline this device until it is replaced. If you have another device available, you might replace the suspect drive and see if that solves the pool hang problem.

Cindy

On 10/27/09 12:04, Jeremy Kitchen wrote:
> Cindy Swearingen wrote:
>> Jeremy,
>>
>> I generally suspect device failures in this case and if possible,
>> review the contents of /var/adm/messages and fmdump -eV to see
>> if the pool hang could be attributed to failed or failing devices.
>
> [snip: log excerpts, fmdump output, and controller questions quoted above]
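A short sketch of the checks Cindy refers to, assuming the zpool commands are responding at the time; nothing here is specific to this system beyond the pool name:

  # quick health summary: only pools with problems are listed
  zpool status -x

  # per-device error counters and any resilver/scrub state for the suspect pool
  zpool status -v raid3155

  # the failmode property governs how the pool behaves when a device fails
  zpool get failmode raid3155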
Jeremy,

I can't comment on your hardware because I'm not familiar with it.

If you have a storage pool with ZFS redundancy and one device fails or begins failing, then the pool keeps going in a degraded mode but is generally available. You can try setting the failmode property to continue, which would allow reads to continue in case of a device failure and might prevent the pool from hanging.

If offlining the disk or replacing the disk doesn't help, let us know.

Cindy

On 10/27/09 13:13, Jeremy Kitchen wrote:
> Jeremy Kitchen wrote:
>> Cindy Swearingen wrote:
>>> I generally suspect device failures in this case and if possible,
>>> review the contents of /var/adm/messages and fmdump -eV to see
>>> if the pool hang could be attributed to failed or failing devices.
>
> [snip: the ereport.fs.zfs.io excerpt for c6t5d0 and the follow-up
> questions quoted above]
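The property change itself is a one-liner; a sketch, noting that continue only changes how the pool reacts to a failed device rather than fixing anything:

  # default is wait: I/O to the pool blocks until the device recovers or is replaced
  zpool get failmode raid3155

  # allow reads from healthy devices (new writes return EIO) instead of blocking
  zpool set failmode=continue raid3155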
Hi Jeremy,

I had a loosely similar problem with my 2009.06 box. In my case (which may not be yours), working with support we found a bug that was causing my pool to hang. I also got erroneous errors when I did a scrub (3 x 5-disk raidz). I am using the same LSI controller.

A sure-fire way to kill the box was to set up a file system as an iSCSI target and write a lot of data to it, around 1-2MB/s. It would usually die inside of a few hours. NFS writing was not as bad, but within a day it would panic there too.

The solution for me was to upgrade to build 124. Since the upgrade three weeks ago, I have had no problems. Again, I don't know if this would fix your problem, but it may be worth a try. Just don't upgrade your ZFS version, and you will be able to roll back to 2009.06 at any time.

-Scott
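A rough sketch of the kind of upgrade Scott describes on a 2009.06 image; the publisher name and dev repository URL are assumptions about that setup, and the last two commands are listed only as the ones to avoid so the pool stays usable from the old boot environment:

  # point the image at the dev repository and pull in the newer build
  pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
  pkg image-update

  # after booting the new build, deliberately do NOT run these, so rollback
  # to the 2009.06 boot environment remains possible:
  #   zpool upgrade <pool>
  #   zfs upgrade <filesystem>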
On Oct 26, 2009, at 4:00 PM, Cindy Swearingen wrote:
> Hi Jeremy,
>
> Can you use the command below and send me the output, please?
>
> Thanks,
>
> Cindy
>
> # mdb -k
> > ::stacks -m zfs

Ok, it did it again. I replaced the drive and it's currently resilvering (13.66% done, 135h33m to go it says) and the output of that command is this:

> ::stacks -m zfs
THREAD STATE SOBJ COUNT
ffffff02f0078a80 SLEEP MUTEX 31 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zget+0x47 zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02efa33500 SLEEP CV 18 swtch+0x147 cv_wait+0x61 dbuf_read+0x237 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a91e0 SLEEP CV 9 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c46ac0 SLEEP CV 7 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a2720 SLEEP CV 6 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 dmu_tx_assign+0x4b zfs_inactive+0xa8 fop_inactive+0xaf vn_rele+0x5f closef+0x75 closeandsetf+0x44a close+0x18 _sys_sysenter_post_swapgs+0x14b
ffffff000f61bc60 SLEEP CV 5 swtch+0x147 cv_wait+0x61 txg_thread_wait+0x5f txg_quiesce_thread+0x94 thread_start+8
ffffff02d8514aa0 SLEEP CV 5 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02d89b4c20 SLEEP CV 3 swtch+0x147 cv_wait+0x61 txg_wait_synced+0x7f dmu_tx_wait+0xcd zfs_create+0x44d fop_create+0xfc vn_createat+0x5e1 vn_openat+0x1fb copen+0x418 open64+0x34 _sys_sysenter_post_swapgs+0x14b
ffffff02dc0a5400 SLEEP CV 3 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_get_leaf_byblk+0x56 zap_deref_leaf+0x78 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02efc771a0 SLEEP CV 3 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dmu_buf_hold_array_by_dnode+0x220 dmu_buf_hold_array+0x73 dmu_read_uio+0x4d zfs_read+0x19a fop_read+0x6b rfs3_read+0x393 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff001076dc60 SLEEP CV 3 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dsl_pool_sync+0xe1 spa_sync+0x32a txg_sync_thread+0x265 thread_start+8
ffffff02d8c4d900 SLEEP CV 2 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 zfs_write+0x3bd fop_write+0x6b write+0x2e2 write32+0x22 _sys_sysenter_post_swapgs+0x14b
ffffff02efa98e80 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zil_commit+0x62 zfs_putpage+0x2c9 fop_putpage+0x74 rfs3_commit+0x180 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02ef89a3c0 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02efb24b00 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_cursor_retrieve+0x74 zfs_readdir+0x29e fop_readdir+0xab getdents64+0xbc _sys_sysenter_post_swapgs+0x14b
ffffff02efce3c40 SLEEP CV 2 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_vget+0x25e fsop_vget+0x67 nfs3_fhtovp+0x47 rfs3_getattr+0x40 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efc763a0 SLEEP MUTEX 2 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zget+0x47 zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff000f531c60 SLEEP CV 1 swtch+0x147 cv_timedwait+0xba arc_reclaim_thread+0x17b thread_start+8
ffffff000f537c60 SLEEP CV 1 swtch+0x147 cv_timedwait+0xba l2arc_feed_thread+0xa5 thread_start+8
ffffff000f621c60 SLEEP CV 1 swtch+0x147 cv_timedwait+0xba txg_thread_wait+0x7b txg_sync_thread+0x114 thread_start+8
ffffff02d8c4ee60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 dmu_buf_hold_array_by_dnode+0x2b7 dmu_buf_hold_array+0x73 dmu_read_uio+0x4d zfs_read+0x19a fop_read+0x6b rfs3_read+0x393 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02dc0a6c80 SLEEP CV 1 swtch+0x147 cv_wait+0x61 spa_config_enter+0x7d spa_vdev_enter+0x2f spa_vdev_setpath+0x32 zfs_ioc_vdev_setpath+0x48 zfsdev_ioctl+0x10b cdev_ioctl+0x45 spec_ioctl+0x83 fop_ioctl+0x7b ioctl+0x18e _sys_sysenter_post_swapgs+0x14b
ffffff052a2551c0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 zfs_setattr+0xca8 fop_setattr+0xad vpsetattr+0x12a fdsetattr+0x30 fchmod+0x3a _sys_sysenter_post_swapgs+0x14b
ffffff02d89b7a80 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_open+0x7a dmu_tx_wait+0xb3 zfs_write+0x3bd fop_write+0x6b rfs3_write+0x507 common_dispatch+0x4a7 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02dc0a5780 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_synced+0x7f dmu_tx_wait+0xcd zfs_rename+0x553 fop_rename+0xc5 vn_renameat+0x2ff vn_rename+0x2f rename+0x17 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c34720 SLEEP CV 1 swtch+0x147 cv_wait+0x61 txg_wait_synced+0x7f spa_vdev_state_exit+0x4e vdev_fault+0xde zfs_ioc_vdev_set_state+0xa6 zfsdev_ioctl+0x10b cdev_ioctl+0x45 spec_ioctl+0x83 fop_ioctl+0x7b ioctl+0x18e _sys_sysenter_post_swapgs+0x14b
ffffff02efa3f540 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zil_commit+0x62 zfs_fsync+0xd3 fop_fsync+0x5a rfs3_create+0x767 common_dispatch+0x4a7 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efa3e3c0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zil_commit+0x62 zfs_fsync+0xd3 fop_fsync+0x5a rfs3_setattr+0x447 common_dispatch+0x4a7 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efc77520 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_findbp+0xcf dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 stat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02ef791c60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c40b00 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_hold_impl+0x81 dbuf_hold+0x2e dnode_hold_impl+0xb5 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_vget+0x25e fsop_vget+0x67 nfs3_fhtovp+0x47 rfs3_getattr+0x40 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02efb398e0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dbuf_findbp+0xe7 dbuf_prefetch+0x93 dmu_zfetch_fetch+0x62 dmu_zfetch_dofetch+0xb8 dmu_zfetch_find+0x394 dmu_zfetch+0xac dbuf_read+0x170 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_dirent_lock+0x3fc zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff0372bdac60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_get_leaf_byblk+0x56 zap_deref_leaf+0x78 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a4380 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0x91 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02efce3540 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_lockdir+0x67 zap_lookup_norm+0x55 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed rfs3_lookup+0x395 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c44e60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dmu_buf_hold+0x96 zap_table_load+0x76 zap_idx_to_blk+0x56 zap_deref_leaf+0x60 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c45c60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_dirent_lock+0x3fc zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed rfs3_lookup+0x395 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02da1a7c80 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_bonus_hold+0x36 zfs_zget+0x5a zfs_root+0x57 fsop_root+0x2e traverse+0x61 lookuppnvp+0x423 lookuppnat+0x12c lookupnameat+0x91 lookupname+0x28 chroot+0x30 _sys_sysenter_post_swapgs+0x14b
ffffff02d8c50ac0 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dbuf_read+0x1e8 dnode_hold_impl+0xd9 dnode_hold+0x2b dmu_object_info+0x30 bplist_open+0x36 dsl_dataset_get_ref+0x139 dsl_dataset_hold+0x100 dmu_objset_prefetch+0x24 findfunc+0x23 dmu_objset_find_spa+0x30d dmu_objset_find+0x40 zfs_ioc_snapshot_list_next+0x59 zfsdev_ioctl+0x10b cdev_ioctl+0x45 spec_ioctl+0x83 fop_ioctl+0x7b ioctl+0x18e _sys_sysenter_post_swapgs+0x14b
ffffff02efa3fc40 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dmu_tx_hold_free+0x19f zfs_remove+0x2b9 fop_remove+0xa5 vn_removeat+0x2f0 vn_remove+0x2c unlink+0x19 _sys_sysenter_post_swapgs+0x14b
ffffff001073dc60 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d dsl_pool_sync+0x1f6 spa_sync+0x3cd txg_sync_thread+0x265 thread_start+8
ffffff02d8c40780 SLEEP CV 1 swtch+0x147 cv_wait+0x61 zio_wait+0x5d zil_commit_writer+0x2ac zil_commit+0x8c zfs_putpage+0x2c9 fop_putpage+0x74 rfs3_commit+0x180 common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff02d89bac80 SLEEP MUTEX 1 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zget+0x47 zfs_vget+0x25e fsop_vget+0x67 nfs3_fhtovp+0x47 rfs3_access+0x4a common_dispatch+0x3a0 rfs_dispatch+0x2d svc_getreq+0x19c svc_run+0x16b svc_do_run+0x81 nfssys+0x765 _sys_sysenter_post_swapgs+0x14b
ffffff0010adfc60 SLEEP MUTEX 1 swtch+0x147 turnstile_block+0x764 mutex_vector_enter+0x261 zfs_zinactive+0x3c zfs_inactive+0xee fop_inactive+0xaf vn_rele_dnlc+0x6c do_dnlc_reduce_cache+0x16a taskq_d_thread+0xb1 thread_start+8
ffffff02da197560 SLEEP CV 1 zfs_lookup+0xb1 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b
ffffff000fc92c60 SLEEP CV 1 zio_destroy+0x5c zio_done+0xc4 zio_execute+0xa0 zio_notify_parent+0xa6 zio_done+0x2c9 zio_execute+0xa0 taskq_thread+0x193 thread_start+8
ffffff001006ac60 SLEEP CV 1 zio_vdev_io_done+0x62 zio_execute+0xa0 taskq_thread+0x193 thread_start+8
ffffff052a254e40 ONPROC <NONE> 1 apic_intr_exit+0x32 hilevel_intr_epilog+0x123 do_interrupt+0xe9 _sys_rtt_ints_disabled+8 dmu_zfetch_colinear+0x87 dmu_zfetch+0xd3 dbuf_read+0x272 dnode_hold_impl+0x22b dnode_hold+0x2b dmu_buf_hold+0x75 zap_get_leaf_byblk+0x56 zap_deref_leaf+0x78 fzap_lookup+0x78 zap_lookup_norm+0x116 zap_lookup+0x2d zfs_match_find+0xfd zfs_dirent_lock+0x3d1 zfs_dirlook+0xd9 zfs_lookup+0x104 fop_lookup+0xed lookuppnvp+0x3a3 lookuppnat+0x12c lookupnameat+0xd2 cstatat_getvp+0x164 cstatat64_32+0x82 lstat64_32+0x31 _sys_sysenter_post_swapgs+0x14b

-Jeremy
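Since the pool is mid-resilver at this point, progress can be watched with a plain status query, assuming the command responds:

  # shows resilver percent complete and estimated time remaining
  zpool status raid3155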