Hi everyone,

I have a backup server with a considerably large pool (over 40TB) running Solaris 10 5/09 s10x_u7wos_08 (x86_64). All disks are 2TB SATA hard drives. Here's how it is configured:

# zpool import
  pool: backup
    id: 9395034695502046623
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        backup          ONLINE
          raidz1        ONLINE
            c3t102d0    ONLINE
            c3t103d0    ONLINE
            c3t104d0    ONLINE
            c3t105d0    ONLINE
            c3t106d0    ONLINE
            c3t107d0    ONLINE
            c3t108d0    ONLINE
          raidz1        ONLINE
            c3t109d0    ONLINE
            c3t110d0    ONLINE
            c3t111d0    ONLINE
            c3t112d0    ONLINE
            c3t113d0    ONLINE
            c3t114d0    ONLINE
            c3t115d0    ONLINE
          raidz1        ONLINE
            c3t87d0     ONLINE
            c3t88d0     ONLINE
            c3t89d0     ONLINE
            c3t90d0     ONLINE
            c3t91d0     ONLINE
            c3t92d0     ONLINE
            c3t93d0     ONLINE
          raidz1        ONLINE
            c3t94d0     ONLINE
            c3t95d0     ONLINE
            c3t96d0     ONLINE
            c3t97d0     ONLINE
            c3t98d0     ONLINE
            c3t99d0     ONLINE
            c3t100d0    ONLINE
        spares
          c3t116d0
          c3t101d0

This server suffered an unexpected crash, and after a power cycle it refused to mount all ZFS filesystems during the boot procedure, which kept Solaris from starting network services (such as SSH). On every attempt to mount all filesystems it stops at the same point: the boot progress indicator shows "190/339" (I have 339 ZFS filesystems within the only pool on that system).

I rebooted the server in failsafe mode, but that did not load my JBOD controller's drivers, so I couldn't access the pool. To allow Solaris to boot normally, I moved /etc/zfs/zpool.cache away and restarted the OS. After that change, no ZFS filesystems were mounted at boot, and "zpool import" (with no arguments) showed me the information above.

So I went ahead and started importing the pool with "zpool import backup"; however, that hangs and never completes. While the process is hanging, the server is still responsive to all non-ZFS commands and to some ZFS commands as well. For example, I can run "zfs list" and "zpool status" and both show the entire pool. The process refuses to die with a traditional "kill -9", so I rebooted the server and repeated the command under "truss". It was clear that most ZFS filesystems were mounting cleanly and the import was hanging on one specific filesystem (even if I reboot everything and start again, the exact same behavior is seen). Attached you will find the "zpool-import-output.txt" file. The last filesystem it tries to mount is "backup/insightiq".

It's interesting to note that while the "zpool import backup" command is running, I can access all the ZFS filesystems it has already mounted, and I can manually mount others using "zfs mount". Only that one filesystem causes "zfs mount" / "zpool import" to hang.

Because the filesystem in question (backup/insightiq) is being accessed by the hanging "zpool import" command, I cannot simply try to destroy it. I also cannot use "zfs set canmount=noauto" on that filesystem (or any other one) while "zpool import" is running.

Additionally, I'm attaching "threads-list.txt", which is the output of: echo "::threadlist -v" | mdb -k -- and also "zdb-output.txt", which is the output of: zdb

Does anyone know what my next step should be here? I need to restore the pool as soon as possible (I didn't want to, but at this point I'm fine with destroying the problematic filesystem if that saves the entire pool).
Attachments:
- zpool-import-output.txt: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100507/0d99e3dd/attachment.txt>
- threads-list.txt: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100507/0d99e3dd/attachment-0001.txt>
- zdb-output.txt: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20100507/0d99e3dd/attachment-0002.txt>

Any help will be greatly appreciated.

Thanks in advance,
Eduardo Bragatto.
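For reference, this is roughly the sequence I used to get the box bootable again and to capture where the import hangs. I am reconstructing it from memory, so treat it as a sketch (the backup file name and truss output path are just what I happened to use) rather than an exact transcript:

    # keep the old cache file around instead of deleting it, then reboot
    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak
    reboot

    # after the reboot, retry the import under truss so the hang point is visible
    truss -f -o /var/tmp/zpool-import-output.txt zpool import backup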
Additionally, I would like to mention that the only ZFS filesystem not mounting -- the one causing the entire "zpool import backup" command to hang -- is also the only filesystem configured to be exported via NFS:

    backup/insightiq  sharenfs  root=*  local

Is there any chance the NFS share is the culprit here? If so, how can I avoid it?

Thanks,
Eduardo Bragatto
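In case it is useful, these are the checks I can run from the live system while the import is hung. They are listed here only as a sketch (I have not pasted their output anywhere yet); the property line above is simply what "zfs get sharenfs" prints for that filesystem:

    zfs get sharenfs backup/insightiq          # confirm how the filesystem is shared
    svcs -l svc:/network/nfs/server:default    # is the NFS server service online?
    share                                      # list what is currently being exported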
Hi again,

As for the NFS issue I mentioned before, I made sure the NFS server was working and able to export before I attempted to import anything, and then I started a new "zpool import backup". My hope was that the NFS share was causing the issue, since the only shared filesystem is the one causing the problem, but that doesn't seem to be the case.

I've done a lot of research and could not find a case similar to mine. The most similar one I've found was this one from 2008:

http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15

I simply cannot import the pool, although ZFS reports it as ONLINE.

In that old thread, the user was also having the "zpool import" hang issue; however, he was able to run these two commands (his pool was named data1):

    zdb -e -bb data1
    zdb -e -dddd data1

While my system returns:

    # zdb -e -bb backup
    zdb: can't open backup: File exists
    # zdb -e -ddd backup
    zdb: can't open backup: File exists

All the documentation assumes you will be able to run "zpool import" before troubleshooting, but my problem is exactly with that command. I don't even know where to find more detailed documentation.

I believe there are very knowledgeable people on this list. Could someone be kind enough to take a look and at least point me in the right direction?

Thanks,
Eduardo Bragatto.
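One thing I have not tried yet, and I mention it only as an idea in case the syntax or behavior is not quite right on this release: zdb should be able to dump just the pool configuration without traversing any datasets, which would at least tell me whether zdb can see the pool at all:

    # hypothetical: print only the configuration of the exported pool, no dataset walk
    zdb -e -C backup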
Howdy Eduardo,

Recently I had a similar issue where the pool wouldn't import and attempting to import it would essentially lock the server up. Finally I used:

    pfexec zpool import -F pool1

and simply let it do its thing. After almost 60 hours the import finished and all has been well since (except my backup procedures have improved!).

Good luck!

John
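One caveat: the -F recovery option depends on the ZFS version, and older releases don't have it. Where it is supported, my understanding is that you can also do a dry run first to see whether a recovery would succeed before committing to it (a sketch from memory; double-check the zpool man page on your release):

    # hypothetical dry run: check whether "import -F" could recover the pool,
    # without actually performing the recovery
    pfexec zpool import -Fn pool1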
Hey John,

thanks a lot for answering -- I already allowed the "zpool import" command to run from Friday to Monday and it did not complete. I also made sure to start it under "truss", and literally nothing happened during that time (the truss output file has nothing new in it).

While the "zpool import" command runs, I don't see any CPU or disk I/O usage. "zpool iostat" shows very little I/O too:

# zpool iostat -v
                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
backup        31.4T  19.1T     11      2  29.5K  11.8K
  raidz1      11.9T   741G      2      0  3.74K  3.35K
    c3t102d0      -      -      0      0  23.8K  1.99K
    c3t103d0      -      -      0      0  23.5K  1.99K
    c3t104d0      -      -      0      0  23.0K  1.99K
    c3t105d0      -      -      0      0  21.3K  1.99K
    c3t106d0      -      -      0      0  21.5K  1.98K
    c3t107d0      -      -      0      0  24.2K  1.98K
    c3t108d0      -      -      0      0  23.1K  1.98K
  raidz1      12.2T   454G      3      0  6.89K  3.94K
    c3t109d0      -      -      0      0  43.7K  2.09K
    c3t110d0      -      -      0      0  42.9K  2.11K
    c3t111d0      -      -      0      0  43.9K  2.11K
    c3t112d0      -      -      0      0  43.8K  2.09K
    c3t113d0      -      -      0      0  47.0K  2.08K
    c3t114d0      -      -      0      0  42.9K  2.08K
    c3t115d0      -      -      0      0  44.1K  2.08K
  raidz1      3.69T  8.93T      3      0  9.42K    610
    c3t87d0       -      -      0      0  43.6K  1.50K
    c3t88d0       -      -      0      0  43.9K  1.48K
    c3t89d0       -      -      0      0  44.2K  1.49K
    c3t90d0       -      -      0      0  43.4K  1.49K
    c3t91d0       -      -      0      0  42.5K  1.48K
    c3t92d0       -      -      0      0  44.5K  1.49K
    c3t93d0       -      -      0      0  44.8K  1.49K
  raidz1      3.64T  8.99T      3      0  9.40K  3.94K
    c3t94d0       -      -      0      0  31.9K  2.09K
    c3t95d0       -      -      0      0  31.6K  2.09K
    c3t96d0       -      -      0      0  30.8K  2.08K
    c3t97d0       -      -      0      0  34.2K  2.08K
    c3t98d0       -      -      0      0  34.4K  2.08K
    c3t99d0       -      -      0      0  35.2K  2.09K
    c3t100d0      -      -      0      0  34.9K  2.08K
------------  -----  -----  -----  -----  -----  -----

Also, the third raidz1 entry shows much less write bandwidth (610). This is actually the first time it has shown a non-zero value there.

My last attempt to import the pool was with this command:

    zpool import -o failmode=panic -f -R /altmount backup

However, it did not panic. As I mentioned in the first message, it mounts 189 filesystems and hangs on #190. While the command is hanging, I can use "zfs mount" to mount filesystems #191 and above (only one filesystem does not mount, and it causes the import procedure to hang).

Before trying the command above, I was using only "zpool import backup", and "zpool iostat" was showing ZERO write bandwidth for the third raidz1 in the list above (not sure if that means something, but it does look odd).

I'm really at a dead end here; any help is appreciated.

Thanks,
Eduardo Bragatto.
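Since the numbers above are averages accumulated since boot, I am also going to watch the pool with a sampling interval to see whether anything is really moving. This is just a sketch of what I plan to run:

    zpool iostat -v backup 5    # pool/vdev I/O sampled every 5 seconds
    iostat -xn 5                # per-device service times from the OS side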
Hi Eduardo,

Please use the following steps to collect more information:

1. Use the following command to get the PID of the zpool import process, like this:

    # ps -ef | grep zpool

2. Use the actual <PID of zpool import> found in step 1 in the following command, like this:

    # echo "0t<PID of zpool import>::pid2proc|::walk thread|::findstack" | mdb -k

Then, send the output.

Thanks,
Cindy
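If it is more convenient, the two steps can be combined into a one-liner. This is only a convenience sketch and assumes a single zpool process is running; the two separate steps above are equivalent:

    # find the PID of the running zpool command and dump its kernel thread stacks
    pid=`pgrep -x zpool`
    echo "0t${pid}::pid2proc|::walk thread|::findstack" | mdb -k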
Hi Cindy,

first of all, thank you for taking the time to answer my question. Here's the output of the command you requested:

# echo "0t733::pid2proc|::walk thread|::findstack" | mdb -k
stack pointer for thread ffffffff94e4db40: fffffe8000d3e5b0
[ fffffe8000d3e5b0 _resume_from_idle+0xf8() ]
  fffffe8000d3e5e0 swtch+0x12a()
  fffffe8000d3e600 cv_wait+0x68()
  fffffe8000d3e640 txg_wait_open+0x73()
  fffffe8000d3e670 dmu_tx_wait+0xc5()
  fffffe8000d3e6a0 dmu_tx_assign+0x38()
  fffffe8000d3e700 dmu_free_long_range_impl+0xe6()
  fffffe8000d3e740 dmu_free_long_range+0x65()
  fffffe8000d3e790 zfs_trunc+0x77()
  fffffe8000d3e7e0 zfs_freesp+0x66()
  fffffe8000d3e830 zfs_space+0xa9()
  fffffe8000d3e850 zfs_shim_space+0x15()
  fffffe8000d3e890 fop_space+0x2e()
  fffffe8000d3e910 zfs_replay_truncate+0xa8()
  fffffe8000d3e9b0 zil_replay_log_record+0x1ec()
  fffffe8000d3eab0 zil_parse+0x2ff()
  fffffe8000d3eb30 zil_replay+0xde()
  fffffe8000d3eb50 zfsvfs_setup+0x93()
  fffffe8000d3ebc0 zfs_domount+0x2e4()
  fffffe8000d3ecc0 zfs_mount+0x15d()
  fffffe8000d3ecd0 fsop_mount+0xa()
  fffffe8000d3ee00 domount+0x4d7()
  fffffe8000d3ee80 mount+0x105()
  fffffe8000d3eec0 syscall_ap+0x97()
  fffffe8000d3ef10 _sys_sysenter_post_swapgs+0x14b()

The first message in this thread has three files attached with information from truss (tracing "zpool import"), the zdb output, and the entire list of threads taken from 'echo "::threadlist -v" | mdb -k'.

Thanks,
Eduardo Bragatto
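Reading that stack, the mount seems to be stuck replaying a ZIL truncate record for backup/insightiq (zil_replay -> zfs_replay_truncate) and then waiting for a transaction group to open (txg_wait_open). If it helps, I can also dump the kernel's overall view of the pool; a possible command would be the following (hypothetical, I have not run it yet):

    # print the SPA/pool state and vdev tree as the kernel sees it
    echo "::spa -v" | mdb -k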
Hi,

I fixed this problem a couple of weeks ago but haven't found the time to report it until now.

Cindy Swearingen was very kind in contacting me to resolve this issue, and I would like to take this opportunity to express my gratitude to her.

We have not found the root cause of the error. Cindy suspected some known bugs in release 5/09 that have been fixed in 10/09, but we could not confirm that as the real cause of the problem.

Anyway, I went ahead and re-installed the operating system with the latest Solaris release (10/09), and "zpool import" worked as if there had been nothing wrong. I have scrubbed the pool and no errors were found. I've been using the system since the OS was re-installed (exactly 10 days now) without any problems.

If you find yourself in a situation where "zpool import" hangs and never finishes because it gets stuck mounting some of the ZFS filesystems, make sure you try to import the pool on the newest stable release before wasting too much time debugging the problem.

Thanks,
Eduardo Bragatto.
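For anyone who ends up in the same situation, the verification I did after the re-install boils down to the commands below (reconstructed from memory, with my pool name, so take it as a sketch):

    zpool import backup          # on the freshly installed OS this completed normally
    zpool scrub backup
    zpool status -v backup       # wait until the scrub completes with 0 errors
    zfs list -r backup | wc -l   # sanity-check that all the filesystems are back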