kristof
2007-Dec-13 20:44 UTC
[zfs-discuss] zpool version 3 & Uberblock version 9, zpool upgrade only half succeeded?
We are currently experiencing a huge performance drop on our ZFS storage server. We have two pools: "stor" is a raidz built from 7 iSCSI nodes, and "home" is a local mirror pool.

Recently we had some issues with one of the storage nodes, and because of that the pool was degraded. Since we did not succeed in bringing this storage node back online (at the ZFS level), we upgraded our NAS head from OpenSolaris b57 to b77. After the upgrade we successfully resilvered the pool (the resilver took a week for 14 TB!). Finally we upgraded the pool to version 9 (coming from version 3).

The pool is now healthy again, but performance is terrible. Accessing older data takes far too long. Running "dtruss -a find ." in a ZFS filesystem on this b77 server is extremely slow, while it is fast at our backup location, where we are still running OpenSolaris b57 and zpool version 3. Writing new data seems normal; we don't see big issues there. The real problem is running ls, rm or find in filesystems with lots of files (50000+, not in one directory but spread over multiple subfolders).

Today I found that not only zpool upgrade exists but also zfs upgrade: most of our filesystems are still at version 1, while some newly created ones are already at version 3. Running zdb we also saw a mismatch in the version information: our storage pool is listed as version 3 while its uberblock is at version 9, yet when we run zpool upgrade it tells us all pools are upgraded to the latest version.
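For reference, this is roughly how the mismatch shows up from the command line (just a sketch, assuming the pool name stor; we have not actually run the zfs upgrade step yet):

  zpool get version stor     # pool format version (9 after our zpool upgrade)
  zpool upgrade              # with no arguments, lists pools not at the latest version
  zfs get -r version stor    # per-filesystem ZPL version (most of ours still show 1)
  zfs upgrade                # with no arguments, lists filesystems at older versions
  zfs upgrade -r stor        # would bump the filesystems themselves to the current version

As far as I understand, zpool upgrade only changes the pool format; the filesystem (ZPL) versions have to be upgraded separately with zfs upgrade.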
Below is the zdb output:

zdb stor

    version=3
    name='stor'
    state=0
    txg=6559447
    pool_guid=14464037545511218493
    hostid=341941495
    hostname='fileserver011'
    vdev_tree
        type='root'
        id=0
        guid=14464037545511218493
        children[0]
                type='raidz'
                id=0
                guid=179558698360846845
                nparity=1
                metaslab_array=13
                metaslab_shift=37
                ashift=9
                asize=20914156863488
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=640233961847538260
                        path='/dev/dsk/c2t3d0s0'
                        devid='id1,sd@t49455400000000000000000001000000cf1900000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN10001,0:a'
                        whole_disk=1
                        DTL=36
                children[1]
                        type='disk'
                        id=1
                        guid=7833573669820754721
                        path='/dev/dsk/c2t4d0s0'
                        devid='id1,sd@t49455400000000000000000001000000591a00000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN20001,0:a'
                        whole_disk=1
                        DTL=22
                children[2]
                        type='disk'
                        id=2
                        guid=13685988517147825972
                        path='/dev/dsk/c2t5d0s0'
                        devid='id1,sd@t494554000000000000000000010000004c1b00000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN30001,0:a'
                        whole_disk=1
                        DTL=17
                children[3]
                        type='disk'
                        id=3
                        guid=13514021245008793227
                        path='/dev/dsk/c2t6d0s0'
                        devid='id1,sd@t49455400000000000000000001000000441d00000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN40001,0:a'
                        whole_disk=1
                        DTL=21
                children[4]
                        type='disk'
                        id=4
                        guid=15871506866153751690
                        path='/dev/dsk/c2t9d0s0'
                        devid='id1,sd@t494554000000000000000000010000003d1a00000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN050001,0:a'
                        whole_disk=1
                        DTL=20
                children[5]
                        type='disk'
                        id=5
                        guid=11392907262189654902
                        path='/dev/dsk/c2t7d0s0'
                        devid='id1,sd@t49455400000000000000000001000000ce1a00000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN60001,0:a'
                        whole_disk=1
                        DTL=19
                children[6]
                        type='disk'
                        id=6
                        guid=8472117762643335828
                        path='/dev/dsk/c2t8d0s0'
                        devid='id1,sd@t49455400000000000000000001000000f11900000e000000/a'
                        phys_path='/iscsi/disk@0000iqn.2006-03.com.domain-SAN70001,0:a'
                        whole_disk=1
                        DTL=18

Uberblock
        magic = 0000000000bab10c
        version = 9
        txg = 6692849
        guid_sum = 12266969233845513474
        timestamp = 1197546530 UTC = Thu Dec 13 12:48:50 2007

If we compare with the home pool (this pool was created after installing b77):

zdb home

    version=9
    name='home'
    state=0
    txg=4
    pool_guid=11064283759455309967
    hostid=341941495
    hostname='fileserver011'
    vdev_tree
        type='root'
        id=0
        guid=11064283759455309967
        children[0]
                type='mirror'
                id=0
                guid=12887358012104249684
                metaslab_array=14
                metaslab_shift=31
                ashift=9
                asize=243784220672
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=11054487171079770402
                        path='/dev/dsk/c1t0d0s7'
                        devid='id1,sd@AWDC_WD3200YS-01PGB0=_____WD-WCAPD1898616/h'
                        phys_path='/pci@0,0/pci10f1,2892@7/disk@0,0:h'
                        whole_disk=0
                children[1]
                        type='disk'
                        id=1
                        guid=5037155585995287391
                        path='/dev/dsk/c1t1d0s7'
                        devid='id1,sd@AWDC_WD3200YS-01PGB0=_____WD-WCAPD2111469/h'
                        phys_path='/pci@0,0/pci10f1,2892@7/disk@1,0:h'
                        whole_disk=0

Uberblock
        magic = 0000000000bab10c
        version = 9
        txg = 239823
        guid_sum = 3149796381215514212
        timestamp = 1197541912 UTC = Thu Dec 13 11:31:52 2007

Dataset mos [META], ID 0, cr_txg 4, 99.0K, 18 objects
Dataset home [ZPL], ID 5, cr_txg 4, 18.0K, 4 objects

Traversing all blocks to verify checksums and verify nothing leaked ...

        No leaks (block sum matches space maps exactly)

        bp count:              30
        bp logical:        358912      avg:  11963
        bp physical:        44544      avg:   1484     compression:   8.06
        bp allocated:      124416      avg:   4147     compression:   2.88
        SPA allocated:     124416     used:  0.00%

                            capacity   operations   bandwidth  ---- errors ----
description                used avail  read write  read write  read write cksum
home                       122K  226G    63     0 81.5K     0     0     0     0
  mirror                   122K  226G    63     0 81.5K     0     0     0     0
    /dev/dsk/c1t0d0s7                  576     0  706K     0     0     0     0
    /dev/dsk/c1t1d0s7                  561     0  679K     0     0     0     0

dtruss output:

 ID/LWP   RELATIVE  ELAPSD    CPU SYSCALL(args)                       = return
23762/1:     20215  613147    103 getdents64(0x4, 0xFEDC0000, 0x2000) = 80 0
./blabla/blabla

As you can see, getdents64 takes 613147 milliseconds here, while it takes only 10 ms at our failover location.
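If it helps anyone looking at this, the directory-read latency can also be cross-checked with plain dtrace instead of dtruss, along these lines (a rough sketch, run while the find is going on):

  dtrace -n 'syscall::getdents*:entry { self->ts = timestamp; }
             syscall::getdents*:return /self->ts/ { @["ns"] = quantize(timestamp - self->ts); self->ts = 0; }'

That aggregates the time spent in each getdents/getdents64 call (in nanoseconds) as a distribution, which makes it easier to compare this server against the failover box.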
Any idea what is happening here?

Thanks in advance for your reply!

Krdoor

This message posted from opensolaris.org