Hernan Freschi
2008-May-23 14:18 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
Hello, I'm having a big problem here, disastrous maybe. I have a zpool consisting of 4x500GB SATA drives. This pool was born on S10U4 and was recently upgraded to snv_85 because of iSCSI issues with some initiator.

Last night I was doing housekeeping, deleting old snapshots. One snapshot failed to delete because it had a dependent clone, so I tried to destroy that clone. Everything went wrong from there. The deletion was taking an excessively long time (over 40 minutes). zpool status hangs when I call it; zfs list too. zpool iostat showed disk activity. Other services not dependent on the pool were running, but the iSCSI this machine was serving was unbearably slow. At one point I lost iSCSI, SSH, web, and all other services. Ping still worked.

So I go to the server and notice that the fans are running at 100%. I try to get a console (local VGA + keyboard) but the monitor shows "no signal". No disk activity seemed to be happening at that moment. So I do the standard procedure (reboot). Solaris boots but stops at "hostname: blah". I see disk activity from the pool disks, so I let it boot. 30 minutes later it still hadn't finished. I thought (correctly) that the system was waiting to mount the ZFS filesystems before booting, but for some reason it never does. I call it a day and let the machine do its thing.

8 hours later I return. CPU is cold, disks are idle and... Solaris is still sitting at the same "hostname: blah". Time to reboot again, this time in failsafe mode. zpool import shows that the devices are detected and online. I delete /etc/zfs/zpool.cache and reboot. Solaris starts normally with all services running, but of course no ZFS. zpool import shows the available pool, no errors. I do zpool import -f on the pool... 20 minutes later I'm still waiting for the pool to mount. zpool iostat shows activity:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tera        1.51T   312G    274      0  1.61M  2.91K
tera        1.51T   312G    308      0  1.82M      0
tera        1.51T   312G    392      0  2.31M      0
tera        1.51T   312G    468      0  2.75M      0

but the mountpoint /tera is still not populated (and zpool import still doesn't exit). zpool status shows:

  pool: tera
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tera        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
            c3d0    ONLINE       0     0     0
            c4d0    ONLINE       0     0     0

errors: No known data errors

What's going on? Why is it taking so long to import?

Thanks in advance,
Hernan
Hernan Freschi
2008-May-23 17:08 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
I got more info. I can run zpool history and this is what I get:

2008-05-23.00:29:40 zfs destroy tera/sparsevol@20080522
2008-05-23.00:29:47 [internal destroy_begin_sync txg:3890809] dataset = 152
2008-05-23.01:28:38 [internal destroy_begin_sync txg:3891101] dataset = 152
2008-05-23.07:01:36 zpool import -f tera
2008-05-23.07:01:40 [internal destroy_begin_sync txg:3891106] dataset = 152
2008-05-23.10:52:56 zpool import -f tera
2008-05-23.10:52:58 [internal destroy_begin_sync txg:3891112] dataset = 152
2008-05-23.12:17:49 [internal destroy_begin_sync txg:3891114] dataset = 152
2008-05-23.12:27:48 zpool import -f tera
2008-05-23.12:27:50 [internal destroy_begin_sync txg:3891120] dataset = 152
2008-05-23.13:03:07 [internal destroy_begin_sync txg:3891122] dataset = 152
2008-05-23.13:56:52 zpool import -f tera
2008-05-23.13:56:54 [internal destroy_begin_sync txg:3891128] dataset = 152

Apparently it starts destroying dataset #152 on every import, which is the parent snapshot of the clone I issued the command to destroy. I'm not sure how it works, but I ordered the deletion of the CLONE, not the snapshot (which I was going to destroy anyway). The question is still: why does it hang the machine? Why can't I access the filesystems? Isn't it supposed to import the zpool, mount the ZFS filesystems and then do the destroy in the background?
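For reference, the bracketed [internal ...] entries above are the events that zpool history only prints when asked for internal/long-format output; a minimal invocation for this pool (pool name assumed from the thread) would be:

  zpool history -il tera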
Hernan Freschi
2008-May-23 23:03 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
I let it run for about 4 hours. When I returned, still the same: I can ping the machine but I can't SSH to it or use the console. Please, I need urgent help with this issue!
Hernan Freschi
2008-May-24 00:42 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
I let it run while watching top, and this is what I got just before it hung. Look at free mem. Is this memory allocated to the kernel? Can I allow the kernel to swap?

last pid:  7126;  load avg:  3.36, 1.78, 1.11;  up 0+01:01:11    21:16:49
88 processes: 78 sleeping, 9 running, 1 on cpu
CPU states: 22.4% idle, 0.4% user, 77.2% kernel, 0.0% iowait, 0.0% swap
Memory: 3072M phys mem, 31M free mem, 2055M swap, 1993M free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE   TIME   CPU COMMAND
  7126 root       9  58    0   45M 4188K run     0:00  0.71% java
  4821 root       1  59    0 5124K 1724K run     0:03  0.46% zfs
  5096 root       1  59    0 5124K 1724K run     0:03  0.45% zfs
  2470 root       1  59    0 4956K 1660K sleep   0:06  0.45% zfs
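Since the question is whether that missing memory is held by the kernel, one standard way to get a breakdown (a stock Solaris mdb dcmd, run as root; not something captured in this thread) is:

  echo ::memstat | mdb -k

which splits physical memory into kernel, anon, exec/libs, page cache and free pages.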
Rob at Logan.com
2008-May-24 01:10 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
> Memory: 3072M phys mem, 31M free mem, 2055M swap, 1993M free swap

perhaps this might help..

  mkfile -n 4g /usr/swap
  swap -a /usr/swap

http://blogs.sun.com/realneel/entry/zfs_arc_statistics

Rob
I forgot to post arcstat.pl's output:

    Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
22:32:37  556K  525K     94  515K   94    9K   98  515K   97     1G    1G
22:32:38    63    63    100    63  100     0    0    63  100     1G    1G
22:32:39    74    74    100    74  100     0    0    74  100     1G    1G
22:32:40    76    76    100    76  100     0    0    76  100     1G    1G
<State Changed>
22:32:41    75    75    100    75  100     0    0    75  100     1G    1G
22:32:42    77    77    100    77  100     0    0    77  100     1G    1G
22:32:43    72    72    100    72  100     0    0    72  100     1G    1G
22:32:44    80    80    100    80  100     0    0    80  100     1G    1G
<State Changed>
22:32:45    98    98    100    98  100     0    0    98  100     1G    1G

Sometimes "c" is 2G.

I tried the mkfile and swap, but I get:

[root@solaris:/]# mkfile -n 4g /export/swap
[root@solaris:/]# swap -a /export/swap
"/export/swap" may contain holes - can't swap on it.

/export is the only place where I have enough free space. I could add another drive if needed.
Hernan Freschi wrote:
> I tried the mkfile and swap, but I get:
> [root@solaris:/]# mkfile -n 4g /export/swap
> [root@solaris:/]# swap -a /export/swap
> "/export/swap" may contain holes - can't swap on it.

You should not use -n when creating files for additional swap. This is mentioned in the mkfile man page.

Wbr,
Victor
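So the same sequence without -n should be accepted; a minimal sketch, assuming /export really has 4GB of free space:

  mkfile 4g /export/swap       # writes out the full 4GB, so it takes a while
  swap -a /export/swap
  swap -l                      # verify the new swap file is listed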
Hi, Hernan,

You should not use '-n' with mkfile; that will make swap complain.

Hernan Freschi wrote:
> I tried the mkfile and swap, but I get:
> [root@solaris:/]# mkfile -n 4g /export/swap
> [root@solaris:/]# swap -a /export/swap
> "/export/swap" may contain holes - can't swap on it.
>
> /export is the only place where I have enough free space. I could add another drive if needed.
Oops, replied too fast. I ran it without -n, and the swap space was added successfully... but it didn't work. It died out of memory again.
Mattias Pantzare
2008-May-24 05:43 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
2008/5/24 Hernan Freschi <hjf at hjf.com.ar>:
> I let it run while watching top, and this is what I got just before it hung. Look at free mem. Is this memory allocated to the kernel? Can I allow the kernel to swap?

No, the kernel will not use swap for this. But most of the memory used by the kernel is probably in caches that should release memory when needed.

Is this a 32- or 64-bit system? ZFS will sometimes use all kernel address space on a 32-bit system. You can give the kernel more address space with this command (only on a 32-bit system):

  eeprom kernelbase=0x50000000
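(For what it's worth, a quick way to check which kernel is running — a standard Solaris command, not taken from this thread:)

  isainfo -kv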
No, this is a 64-bit system (athlon64) with 64-bit kernel of course.
So, I think I've narrowed it down to two things:

* ZFS tries to destroy the dataset every time the pool is imported, because the last time it didn't finish destroying it.
* In the process, ZFS makes the kernel run out of memory and die.

So I thought of two options, but I'm not sure if I'm right:

Option 1: "destroy" is an atomic operation. If destroy is atomic, then I guess what it's trying to do is look up all the blocks that need to be deleted/unlinked/released/freed (not sure which is the word). After it has that list, it will write it to the ZIL (remember, this is just what I suppose; correct me if I'm wrong!) and start to physically delete the blocks, until the operation is done and finally committed. If this is the case, then the process restarts from scratch every time the system is rebooted. But I read that apparently, in previous versions, rebooting while a clone destroy was taking too long made the clone reappear intact the next time. This, and the fact that zpool iostat shows only reads and no or very few writes, is what leads me to think this is how it works. If so, I'd like to abort this destroy. After importing the pool I would have everything as it was; maybe I could then delete the snapshots before the clone's parent snapshot, which might speed up the destroy, or I could just leave the clone alone.

Option 2: destroy is not atomic. By this I don't mean "not atomic" as in "if the operation is cancelled it will finish in an incomplete state", but as in "if the system is rebooted, the operation will RESUME at the point it was when it died". If this is the case, maybe I can write a script to reboot the computer after a fixed amount of time and run it on boot, roughly (a fuller shell sketch appears after this message):

  zpool import xx &
  sleep 20
  rm /etc/zfs/zpool.cache
  sleep 1800
  reboot

This would work under the assumption that the list of blocks to be deleted is flushed to the ZIL or somewhere before the reboot, allowing the operation to restart at the same point. This is a very nasty hack, but it may do the trick, only in a very slow fashion: zpool iostat shows 1MB/s read while it's doing the destroy. The dataset in question has 450GB, which means the operation will take 5 days to finish if it needs to read the whole dataset to destroy it, or 7 days if it also needs to go through the other snapshots (600GB total).

So my only viable option seems to be to "abort" this. How can I do that? Disable the ZIL, maybe? Delete the ZIL? Scrub after this?

Thanks,
Hernán
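A literal shell rendering of that reboot-loop idea, strictly as a sketch: it assumes the destroy really does resume across imports (exactly what is in doubt here), and the pool name, timings and cache-file handling are taken straight from the pseudocode above.

  #!/bin/sh
  # Hack: import the pool, let the stuck destroy run for ~30 minutes,
  # then reboot before the kernel runs out of memory.
  POOL=tera                       # assumption: the pool from this thread
  zpool import -f "$POOL" &       # this import blocks while the destroy runs
  sleep 20
  rm -f /etc/zfs/zpool.cache      # so the next boot doesn't hang trying to mount it
  sleep 1800                      # let the destroy make what progress it can
  reboot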
Robert Milkowski
2008-May-25 17:02 UTC
[zfs-discuss] help with a BIG problem, can't import my zpool anymore
Hello Hernan,

Friday, May 23, 2008, 6:08:34 PM, you wrote:

HF> The question is still, why does it hang the machine? Why can't I
HF> access the filesystems? Isn't it supposed to import the zpool,
HF> mount the ZFSs and then do the destroy, in background?

Try to limit the ARC size to 1/4 of your physical memory size via /etc/system and reboot the box. Then wait and see if it finishes and if it still freezes the box.

--
Best regards,
Robert Milkowski            mailto:milek at task.gda.pl
                            http://milek.blogspot.com
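A minimal /etc/system sketch of that suggestion for a 3GB box (the exact value is a judgment call; the setting only takes effect after a reboot):

  * cap the ZFS ARC at 768MB, roughly 1/4 of 3GB of RAM
  set zfs:zfs_arc_max = 0x30000000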
Hello, thanks for your suggestion. I tried setting zfs_arc_max to 0x30000000 (768MB, out of 3GB). The system ran for almost 45 minutes before it froze. Here's an interesting piece of arcstat.pl output, which I noticed just as it was passing by:

    Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
15:17:41   152   152    100   152  100     0    0   152  100     2G  805M
15:17:42   139   139    100   139  100     0    0   139  100     2G  805M
<State Changed>
15:17:43   188   188    100   188  100     0    0   188  100     2G  805M
15:17:44   150   150    100   150  100     0    0   150  100     2G  805M
15:17:45   151   151    100   151  100     0    0   151  100     2G  805M
15:17:46   149   149    100   149  100     0    0   149  100     2G  805M
15:17:47   161   161    100   161  100     0    0   161  100     2G  805M
15:17:48   153   153    100   153  100     0    0   153  100     2G  219M
15:17:49   140   140    100   140  100     0    0   140  100     2G  100M
15:17:50   143   143    100   143  100     0    0   143  100     2G  100M
15:17:51   145   145    100   145  100     0    0   145  100     2G  100M

Notice how "c" suddenly drops from 805M to 100M in 2 seconds. Also, "arcsz" is 2G, which is weird because it shouldn't grow beyond 0x30000000 (768M), right? And it's also weird to get a 100% miss ratio.

Here's top just before it froze:

last pid:  5253;  load avg:  0.47, 0.37, 0.33;  up 0+00:44:53    15:20:14
77 processes: 75 sleeping, 1 running, 1 on cpu
CPU states: 57.5% idle, 1.0% user, 41.6% kernel, 0.0% iowait, 0.0% swap
Memory: 3072M phys mem, 28M free mem, 2055M swap, 1994M free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE   TIME   CPU COMMAND
  1248 root       1  59    0 5940K 2736K sleep   0:14  0.82% arcstat.pl
  5206 root       9  59    0   47M 4892K sleep   0:01  0.35% java
   855 root       2  59    0 5076K 1588K sleep   0:09  0.33% apcupsd
  3134 root       1  59    0 5152K 1764K sleep   0:02  0.26% zpool
  1261 root       1  59    0 4104K  588K cpu     0:03  0.22% top
  3125 root       1  59    0 6352K 1536K sleep   0:00  0.06% sshd
  1151 root       1  59    0 6352K 1504K sleep   0:00  0.05% sshd
    62 root       1  59    0 1832K  540K sleep   0:01  0.05% powernowd
   849 root       1  59    0   11M 1100K sleep   0:00  0.05% snmpd
   465 proxy      1  59    0   15M 2196K run     0:00  0.04% squid
   271 daemon     1  59    0 6652K  264K sleep   0:00  0.03% rcapd
  1252 root       1  59    0 6352K 1292K sleep   0:00  0.02% sshd
     7 root      14  59    0   12M 5412K sleep   0:04  0.02% svc.startd
   880 root       1  59    0 6276K 2076K sleep   0:00  0.02% httpd
   847 root       1  59    0 2436K 1148K sleep   0:00  0.02% dhcpagent

And finally, zpool iostat 1:

tera        1.51T   312G    207      0  1.22M      0
tera        1.51T   312G    141      0   854K      0
tera        1.51T   312G     70      0   427K      0
tera        1.51T   312G    204      0  1.20M      0
tera        1.51T   312G    187      0  1.10M      0
tera        1.51T   312G    179      0  1.05M      0
tera        1.51T   312G    120      0   743K      0
tera        1.51T   312G     94      0   580K      0
tera        1.51T   312G     77      0   471K      0
tera        1.51T   312G    115      0   696K      0

which shows very poor read performance for a 4x SATA2 array (this array usually saturates my gigabit ethernet). And it's not that the kernel is processing that much data, because the CPU is 57% idle and I THINK powernowd is making it run at 900MHz.

Hernán
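One way to cross-check what the ARC itself reports for its size and cap, independent of arcstat.pl (standard arcstats kstat names; not output captured from this machine):

  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max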
So, does anyone have any ideas? I'm obviously hitting a bug here. I'm happy to help anyone solve this; I DESPERATELY need this data. I can post dtrace results if you send the scripts to me. I wish I could solve this myself, but I'm not a C programmer, and I don't know how to program filesystems, much less an advanced filesystem like ZFS.

Also, this is not a weird configuration, or alpha software, or some hack I'm making. This is a "production" server with proven software, on which I executed a regular maintenance operation, just a clone destroy, which should take just a few seconds, and I found so many wrong things:

* a clone destroy takes more than a few seconds or minutes
* to destroy the clone it apparently tries to read through the whole dataset
* while reading through the whole dataset, it does so at 1MB/s (dataset size is 450GB without snapshots)
* it loads these reads into the ARC, filling it up and freezing the machine
* the ARC doesn't help in this case, as all reads have a 100% miss ratio
* it doesn't respect the ARC maximum size
* after a reboot, it won't work in the background; it keeps the pool locked, preventing mounts
* disabling the ZIL doesn't prevent zpool from trying to complete the destroy

This is very, very wrong.

I don't know anything about dtrace. I wrote this; I don't really know if it's correct. I hope it's of use to someone.

#!/usr/sbin/dtrace -s

fbt:zfs::
{
        @count_table[probefunc] = count();
}

tick-300s
{
        exit(0);
}

The output for the 5-minute run:

  dnode_buf_pageout                  6
  zfs_sync                          12
  dmu_zfetch_rele                   18
  dnode_destroy                     18
  bplist_close                      78
  bplist_open                       78
  bplist_sync                       78
  dbuf_hold                         78
  dmu_buf_get_user                  78
  dmu_object_info                   78
  dmu_object_info_from_dnode        78
  dmu_zfetch                        78
  dmu_zfetch_find                   78
  dnode_hold                        78
  dnode_hold_impl                   78
  dsl_pool_sync                     78
  dsl_pool_zil_clean                78
  spa_async_dispatch                78
  spa_errlog_sync                   78
  spa_scrub_resume                  78
  spa_scrub_suspend                 78
  spa_sync                          78
  spa_sync_config_object            78
  txg_quiesce                       78
  uberblock_update                  78
  vdev_config_sync                  78
  dmu_tx_commit                    156
  dmu_tx_create_assigned           156
  dmu_tx_create_dd                 156
  spa_sync_aux_dev                 156
  txg_list_empty                   234
  arc_do_user_evicts               368
  txg_list_remove                  390
  txg_thread_wait                  390
  add_reference                    486
  arc_buf_add_ref                  486
  spa_config_enter                 566
  spa_config_exit                  566
  zio_null                         566
  zio_root                         566
  zio_wait                         566
  vdev_queue_agg_io_done          1954
  dbuf_hold_level                62208
  dmu_tx_check_ioerr             62208
  dnode_next_offset              62208
  dnode_next_offset_level        62208
  dbuf_whichblock                62286
  dbuf_do_evict                  62436
  dbuf_hash_remove               62436
  dbuf_clear                     62440
  dbuf_destroy                   62440
  dbuf_evict                     62440
  dbuf_evict_user                62440
  arc_space_return               62462
  arc_cksum_compute              62510
  arc_read_done                  62510
  arc_set_callback               62510
  dbuf_read_done                 62510
  dbuf_set_data                  62510
  fletcher_4_native              62510
  lzjb_decompress                62510
  vdev_mirror_child_done         62510
  vdev_mirror_io_done            62510
  vdev_mirror_map_free           62510
  vdev_raidz_io_done             62510
  vdev_raidz_map_free            62510
  zio_checksum_error             62510
  zio_checksum_verified          62510
  zio_decompress_data            62510
  zio_read_decompress            62510
  dnode_rele                     62514
  arc_buf_destroy                62694
  arc_cksum_verify               62694
  arc_evict                      62694
  arc_adapt                      62696
  arc_buf_alloc                  62696
  arc_evict_needed               62696
  arc_get_data_buf               62696
  arc_read                       62696
  buf_hash_insert                62696
  dbuf_create                    62696
  dbuf_findbp                    62696
  dbuf_hash_insert               62696
  dbuf_read_impl                 62696
  hdr_cons                       62696
  spa_get_failmode               62696
  vdev_mirror_child_select       62696
  vdev_mirror_io_start           62696
  vdev_mirror_map_alloc          62696
  vdev_raidz_io_start            62696
  vdev_raidz_map_alloc           62696
  zio_read                       62696
  zio_read_init                  62696
  arc_space_consume              62698
  spa_get_random                 62774
  arc_access                     62996
  arc_buf_freeze                 63002
  arc_buf_remove_ref             63002
  remove_reference               63002
  arc_reclaim_needed             63080
  zio_ready                      63262
  zio_wait_for_children_ready    63262
  arc_change_state              125204
  buf_hash_find                 125206
  vdev_lookup_top               125392
  dbuf_add_ref                  125470
  arc_released                  125512
  vdev_cache_fill               153525
  vdev_cache_evict              154090
  zio_vdev_io_reissue           155192
  vdev_disk_io_intr             163696
  zio_interrupt                 163696
  vdev_disk_io_done             163708
  vdev_queue_io_done            163708
  vdev_error_inject             163924
  vdev_queue_io_remove          166100
  vdev_cache_allocate           166298
  vdev_queue_io_add             166434
  vdev_cache_hit                175179
  zio_vdev_io_bypass            179874
  dbuf_find                     187190
  dbuf_hold_impl                187190
  dbuf_read                     187190
  vdev_raidz_child_done         187536
  dbuf_update_data              249700
  zio_buf_free                  281068
  zio_buf_alloc                 281438
  zio_vdev_io_done              288728
  dbuf_rele                     312214
  dbuf_hash                     312322
  vdev_dtl_contains             313480
  buf_hash                      314084
  vdev_queue_io                 323580
  vdev_stat_update              405526
  zio_vdev_child_io             406828
  vdev_readable                 414708
  vdev_queue_io_to_issue        437894
  zio_should_retry              468036
  zio_vdev_io_assess            468036
  zio_done                      468601
  zio_assess                    468602
  zio_clear_transform_stack     468602
  zio_nowait                    469524
  zio_create                    470090
  vdev_is_dead                  477404
  vdev_cache_read               499324
  vdev_disk_io_start            499324
  zio_notify_parent             530730
  zio_pop_transform             531112
  zio_push_transform            532786
  zio_wait_for_children_done    593994
  spa_state                     624716
  zio_vdev_io_start             624716
  zio_wait_for_children         657256
  vdev_queue_deadline_compare   906304
  zio_execute                  1247615
  vdev_queue_offset_compare    1694848
  vdev_cache_lastused_compare  2472113
  vdev_cache_offset_compare    3076216
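If the goal is just a per-function call count inside the zfs module, a slightly tighter one-liner (a sketch only: it counts entry probes so each call is counted once rather than once on entry and once on return, and it runs for 60 seconds) would be:

  dtrace -n 'fbt:zfs:::entry { @calls[probefunc] = count(); } tick-60s { exit(0); }'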
it would be helpful to know what's filling your ram..

  echo ::kmastat | mdb -k
> vdev_cache_offset_compare    3076216

vdev_cache changed to metadata-only in snv_70:
http://bugs.opensolaris.org/view_bug.do?bug_id=6437054

but without knowing what's using the kmem we are kinda guessing.

Rob
I'm using snv_85, upgraded from S10U4. Here is echo ::kmastat | mdb -k:

After 4 minutes of uptime (when it's OK): http://pastebin.com/f112d1cae
After 40 minutes (when free memory starts dropping): http://pastebin.com/f7076d96f
Is there a way to know the progress of the process? I obviously don't have any choice other than to let it do whatever it's doing and hope that it will finish soon (according to my calculations, within a week). So far the only thing I'm doing is letting it work and rebooting every 20 minutes, before memory fills up.