I have a build 62 system with a zone that NFS mounts a ZFS filesystem.

From the zone, I keep seeing issues with .nfsXXXX files remaining in
otherwise empty directories, preventing their deletion. The files appear
to be immediately replaced when they are deleted.

Is this an NFS or a ZFS issue?

Ian
Ian Collins wrote:
> I have a build 62 system with a zone that NFS mounts a ZFS filesystem.
>
> From the zone, I keep seeing issues with .nfsXXXX files remaining in
> otherwise empty directories preventing their deletion. The files appear
> to be immediately replaced when they are deleted.
>
> Is this an NFS or a ZFS issue?

It is NFS that is doing that. It happens when a process on the NFS
client still has the file open. fuser(1) is your friend here.

--
Darren J Moffat
max at bruningsystems.com
2007-Sep-17 11:42 UTC
[zfs-discuss] question about uberblock blkptr
Hi All,

I have modified mdb so that I can examine data structures on disk using
::print. This works fine for disks containing ufs file systems. It also
works for zfs file systems, but... I use the dva block number from the
uberblock_t to print what is at the block on disk. The problem I am
having is that I can not figure out what (if any) structure to use. All
of the xxx_phys_t types that I try do not look right. So, the question
is, just what is the structure that the uberblock_t dva's refer to on
the disk?

Here is an example:

First, I use zdb to get the dva for the rootbp (should match the value
in the uberblock_t(?)).

# zdb -dddd usbhard | grep -i dva
Dataset mos [META], ID 0, cr_txg 4, 1003K, 167 objects, rootbp [L0 DMU
objset] 400L/200P DVA[0]=<0:111f79000:200> DVA[1]=<0:506bde00:200>
DVA[2]=<0:36a286e00:200> fletcher4 lzjb LE contiguous birth=621838
fill=167 cksum=84daa9667:365cb5b02b0:b4e531085e90:197eb9d99a3beb
        bp = [L0 DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200>
DVA[1]=<0:502efe00:200> DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE
contiguous birth=621838 fill=34026
cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
Dataset usbhard [ZPL], ID 5, cr_txg 4, 15.7G, 34026 objects, rootbp [L0
DMU objset] 400L/200P DVA[0]=<0:111f6ae00:200> DVA[1]=<0:502efe00:200>
DVA[2]=<0:36a284e00:200> fletcher4 lzjb LE contiguous birth=621838
fill=34026 cksum=cd0d51959:4fef8f217c3:10036508a5cc4:2320f4b2cde529
        first block: [L0 ZIL intent log] 9000L/9000P
DVA[0]=<0:36aef6000:9000> zilog uncompressed LE contiguous birth=263950
fill=0 cksum=97a624646cebdadb:fd7b50f37b55153b:5:1
^C
#

Then I run my modified mdb on the vdev containing the "usbhard" pool:

# ./mdb /dev/rdsk/c4t0d0s0

I am using the DVA[0] for the META dataset above. Note that I have
tried all of the xxx_phys_t structures that I can find in the zfs
source, but none of them look right. Here is example output dumping
the data as an objset_phys_t.
(The shift by 9 and adding 400000 is from the zfs on-disk format paper;
I have tried without the addition, without the shift, and in all
combinations, but the output still does not make sense.)

> (111f79000<<9)+400000::print zfs`objset_phys_t
{
    os_meta_dnode = {
        dn_type = 0x4f
        dn_indblkshift = 0x75
        dn_nlevels = 0x82
        dn_nblkptr = 0x25
        dn_bonustype = 0x47
        dn_checksum = 0x52
        dn_compress = 0x1f
        dn_flags = 0x82
        dn_datablkszsec = 0x5e13
        dn_bonuslen = 0x63c1
        dn_pad2 = [ 0x2e, 0xb9, 0xaa, 0x22 ]
        dn_maxblkid = 0x20a34fa97f3ff2a6
        dn_used = 0xac2ea261cef045ff
        dn_pad3 = [ 0x9c2b4541ab9f78c0, 0xdb27e70dce903053,
                    0x315efac9cb693387, 0x2d56c54db5da75bf ]
        dn_blkptr = [
            {
                blk_dva = [
                    { dva_word = [ 0x87c9ed7672454887, 0x760f569622246efe ] }
                    { dva_word = [ 0xce26ac20a6a5315c, 0x38802e5d7cce495f ] }
                    { dva_word = [ 0x9241150676798b95, 0x9c6985f95335742c ] }
                ]

None of this looks believable. So, just what is the rootbp in the
uberblock_t referring to?

thanks,
max
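[For anyone following along: the address arithmetic Max is describing can
be sketched in a few lines of Python. This is an editorial illustration of
the rule in the on-disk format paper, not code from the thread; the
constant names are borrowed from the ZFS source.]

```python
# Sketch of the DVA -> disk address rule from the ZFS on-disk format
# paper: the raw DVA offset counts 512-byte sectors from the end of the
# 4 MB label/boot area at the front of the vdev.
SPA_MINBLOCKSHIFT = 9             # 512-byte sectors
VDEV_LABEL_START_SIZE = 0x400000  # two 256K front labels + 3.5M boot block

def dva_to_disk_offset(dva_offset):
    """Map a raw DVA offset (in sectors) to a byte offset on the vdev."""
    return (dva_offset << SPA_MINBLOCKSHIFT) + VDEV_LABEL_START_SIZE
```

[As Max notes later in the thread, zdb prints DVA offsets already shifted
into bytes, so for values taken from zdb output only the 0x400000
addition applies.]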
Ian Collins wrote:
> I have a build 62 system with a zone that NFS mounts a ZFS filesystem.
>
> From the zone, I keep seeing issues with .nfsXXXX files remaining in
> otherwise empty directories preventing their deletion. The files appear
> to be immediately replaced when they are deleted.
>
> Is this an NFS or a ZFS issue?

This is the NFS client keeping unlinked but open files around. You need
to find out what process has the files open (perhaps with "fuser -c")
and persuade it to close the files before you can unmount gracefully.
You may also use "umount -f" if you don't care what happens to the
processes.

Rob T
On 9/17/07, Darren J Moffat <darrenm at opensolaris.org> wrote:
> It is NFS that is doing that. It happens when a process on the NFS
> client still has the file open. fuser(1) is your friend here.

... and if fuser doesn't tell you what you need to know, you can use
lsof (http://freshmeat.net/projects/lsof/; I usually just get it
precompiled from http://www.sunfreeware.com/). I have found lsof to be
more reliable than fuser in listing what has a file open.

--
Paul Kraus
Ian Collins wrote:
> I have a build 62 system with a zone that NFS mounts a ZFS filesystem.
>
> From the zone, I keep seeing issues with .nfsXXXX files remaining in
> otherwise empty directories preventing their deletion. The files appear
> to be immediately replaced when they are deleted.
>
> Is this an NFS or a ZFS issue?

That is how NFS deals with files that are unlinked while open. In a
local file system, unlinked-while-open files will simply not be deleted
until the close. For remote file systems, like NFS, you have to remove
the file from the namespace, but not remove the file's content. The
client will do this by creating .nfs???? files. A more detailed
explanation is at:
http://nfs.sourceforge.net/

-- richard
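[The local-filesystem behavior Richard describes, where an unlinked but
still-open file lives on until the last close, is easy to demonstrate.
An editorial Python sketch, not from the thread:]

```python
import os
import tempfile

# On a local POSIX filesystem an unlinked-but-open file keeps its data
# until the last descriptor closes. NFS cannot track opens server-side,
# so the client fakes this by renaming the file to .nfsXXXX instead of
# removing it; the .nfsXXXX file disappears only on the last close.
fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                # the name is gone from the directory
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 32)         # but the data is still readable
os.close(fd)
```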
Hey Max -

Check out the on-disk specification document at
http://opensolaris.org/os/community/zfs/docs/.

The page 32 illustration shows the rootbp pointing to a dnode_phys_t
object (the first member of an objset_phys_t data structure).

The source code indicates ub_rootbp is a blkptr_t, which contains a
3-member array of dva_t's called blk_dva (blk_dva[3]). Each dva_t is a
2-member array of 64-bit unsigned ints (dva_word[2]). So it looks like
each blk_dva contains 3 128-bit DVA's....

You probably figured all this out already.... did you try using an
objset_phys_t to format the data?

Thanks,
/jim

max at bruningsystems.com wrote:
> Hi All,
> I have modified mdb so that I can examine data structures on disk using
> ::print. This works fine for disks containing ufs file systems. It also
> works for zfs file systems, but... I use the dva block number from the
> uberblock_t to print what is at the block on disk. The problem I am
> having is that I can not figure out what (if any) structure to use.
> All of the xxx_phys_t types that I try do not look right. So, the
> question is, just what is the structure that the uberblock_t dva's
> refer to on the disk?
>
> Here is an example:
>
> First, I use zdb to get the dva for the rootbp (should match the value
> in the uberblock_t(?)).
> [...]
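[Jim's description of blk_dva maps directly onto the bit diagram in the
on-disk specification. An editorial Python sketch of unpacking one dva_t;
the field positions are taken from the spec's DVA diagram and are my
reading of it, not code from the thread:]

```python
def decode_dva(word0, word1):
    """Unpack the two 64-bit words of a dva_t per the on-disk spec:
    word 0 holds ASIZE (bits 0-23, in 512-byte sectors), GRID (bits
    24-31) and the vdev id (bits 32-63); word 1 holds the sector
    offset (bits 0-62) and the gang-block flag (bit 63)."""
    asize = (word0 & 0xFFFFFF) << 9           # allocated size, in bytes
    grid = (word0 >> 24) & 0xFF
    vdev = word0 >> 32
    offset = (word1 & ((1 << 63) - 1)) << 9   # byte offset into the data area
    gang = word1 >> 63
    return vdev, grid, asize, offset, gang
```

[A DVA that zdb prints as <0:111f79000:200> should come back as vdev 0,
offset 0x111f79000 and asize 0x200 after these shifts.]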
max at bruningsystems.com
2007-Sep-18 06:18 UTC
[zfs-discuss] question about uberblock blkptr
Jim Mauro wrote:
> Hey Max - Check out the on-disk specification document at
> http://opensolaris.org/os/community/zfs/docs/.
>
> Page 32 illustration shows the rootbp pointing to a dnode_phys_t
> object (the first member of a objset_phys_t data structure).
>
> The source code indicates ub_rootbp is a blkptr_t, which contains
> a 3 member array of dva_t's called blk_dva (blk_dva[3]).
> Each dva_t is a 2 member array of 64-bit unsigned ints (dva_word[2]).
>
> So it looks like each blk_dva contains 3 128-bit DVA's....
>
> You probably figured all this out already....did you try using
> a objset_phys_t to format the data?
>
> Thanks,
> /jim

Hi Jim,

Yes, I have tried an objset_phys_t. This is what I am using below in the
example. Either there's some extra stuff that the on-disk format
specification is not saying, or I'm not picking up the correct blkptr
(though I have tried other blkptrs from the uberblock array following
the nvpair/label section at the beginning of the disk), or the
uberblock_t blkptr is pointing to something completely different. I am
going to have another look at the zdb code, as I suspect that it must
also do something like what I am trying to do. Also, I think someone on
this list should know what the uberblock_t blkptr refers to if it is not
an objset_t. I don't have compression or any encryption turned on, but I
am also wondering if the metadata is somehow compressed or encrypted.

Thanks for the response. I was beginning to think the only people that
read this mailing list are admins... (Sorry guys, getting zfs configured
properly is much more important than what I'm doing here, but this is
more interesting to me.)

max

> max at bruningsystems.com wrote:
>> Hi All,
>> I have modified mdb so that I can examine data structures on disk
>> using ::print.
>> [...]
max at bruningsystems.com
2007-Sep-18 12:46 UTC
[zfs-discuss] question about uberblock blkptr
Jim Mauro wrote:
> Hey Max - Check out the on-disk specification document at
> http://opensolaris.org/os/community/zfs/docs/.
> [...]
> You probably figured all this out already....did you try using
> a objset_phys_t to format the data?
>
> Thanks,
> /jim

Ok, I think I know what's wrong. I think the information (most likely an
objset_phys_t) is compressed with lzjb compression. Is there a way to
turn this entirely off (not just for file data, but for all metadata as
well) when a pool is created? Or do I need to figure out how to hack the
lzjb_decompress() function into my modified mdb? (Also, I figured out
that zdb is already doing the left shift by 9 before dumping DVA values,
for anyone following this...)

thanks,
max

> max at bruningsystems.com wrote:
>> Hi All,
>> [...]
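[For anyone who wants to follow Max down this path: lzjb is a small
algorithm, and a decompressor fits in a few lines of Python. This is an
editorial transcription of the scheme in the OpenSolaris lzjb.c (the
constants are from that file); treat it as an illustration, not a
drop-in replacement for the C code:]

```python
# lzjb decompression, transcribed from the scheme in OpenSolaris lzjb.c.
# A control byte precedes each group of 8 items; a clear bit means a
# literal byte, a set bit means a 2-byte (length, offset) back-reference
# into the output produced so far.
NBBY = 8
MATCH_BITS = 6
MATCH_MIN = 3
OFFSET_MASK = (1 << (16 - MATCH_BITS)) - 1   # 10-bit back-reference offset

def lzjb_decompress(src, dstlen):
    dst = bytearray()
    i = 0
    copymask = 1 << (NBBY - 1)   # forces a copymap read on the first item
    copymap = 0
    while len(dst) < dstlen:
        copymask <<= 1
        if copymask == (1 << NBBY):
            copymask = 1
            copymap = src[i]
            i += 1
        if copymap & copymask:   # back-reference
            mlen = (src[i] >> (NBBY - MATCH_BITS)) + MATCH_MIN
            offset = ((src[i] << NBBY) | src[i + 1]) & OFFSET_MASK
            i += 2
            pos = len(dst) - offset
            for _ in range(mlen):
                if len(dst) >= dstlen:
                    break
                dst.append(dst[pos])
                pos += 1
        else:                    # literal byte
            dst.append(src[i])
            i += 1
    return bytes(dst)
```

[Here dstlen is the logical size of the block, e.g. the 0x400 in zdb's
"400L/200P" annotation, while the DVA's asize covers the 0x200 physical
(compressed) bytes.]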
max at bruningsystems.com writes:
> Jim Mauro wrote:
> > Hey Max - Check out the on-disk specification document at
> > http://opensolaris.org/os/community/zfs/docs/.
> > [...]
>
> Ok. I think I know what's wrong. I think the information (most likely,
> a objset_phys_t) is compressed with lzjb compression. Is there a way
> to turn this entirely off (not just for file data, but for all meta
> data as well) when a pool is created? Or do I need to figure out how
> to hack in the lzjb_decompress() function in my modified mdb? (Also,
> I figured out that zdb is already doing the left shift by 9 before
> dumping DVA values, for anyone following this...).

Max, this might help (zfs_mdcomp_disable):
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP

-r

> thanks,
> max
>
> [...]
max at bruningsystems.com
2007-Sep-19 17:48 UTC
[zfs-discuss] question about uberblock blkptr
Roch - PAE wrote:
> max at bruningsystems.com writes:
> > Jim Mauro wrote:
> > > [...]
> > Ok. I think I know what's wrong. I think the information (most
> > likely, a objset_phys_t) is compressed with lzjb compression. Is
> > there a way to turn this entirely off (not just for file data, but
> > for all meta data as well) when a pool is created? [...]
>
> Max, this might help (zfs_mdcomp_disable):
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP

Hi Roch,

That would help, except it does not seem to work. I set
zfs_mdcomp_disable to 1 with mdb, deleted the pool, recreated the pool,
and zdb -uuuu still shows the rootbp in the uberblock_t to have the lzjb
flag turned on. So I then added the variable to /etc/system, destroyed
the pool, rebooted, recreated the pool, and still the same result. Also,
my mdb shows the same thing for the uberblock_t rootbp blkptr data. I am
running Nevada build 55b.
I shall update the build I am running soon, but in the meantime I'll
probably write a modified cmd_print() function for my (modified) mdb to
handle (at least) lzjb compressed metadata. Also, I think the ZFS Evil
Tuning Guide should be modified: it says this can be tuned for Solaris
10 11/06 and snv_52, which I guess means only those two releases.
snv_55b has the variable, but it doesn't have an effect (at least on the
uberblock_t rootbp metadata).

thanks for your help.

max
max at bruningsystems.com writes:
> Roch - PAE wrote:
> > [...]
> > Max, this might help (zfs_mdcomp_disable):
> > http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#METACOMP
>
> Hi Roch,
> That would help, except it does not seem to work. I set
> zfs_mdcomp_disable to 1 with mdb, deleted the pool, recreated the
> pool, and zdb -uuuu still shows the rootbp in the uberblock_t to have
> the lzjb flag turned on. So I then added the variable to /etc/system,
> destroyed the pool, rebooted, recreated the pool, and still the same
> result. Also, my mdb shows the same thing for the uberblock_t rootbp
> blkptr data. I am running Nevada build 55b.
> I shall update the build I am running soon, but in the meantime I'll
> probably write a modified cmd_print() function for my (modified) mdb
> to handle (at least) lzjb compressed metadata. Also, I think the ZFS
> Evil Tuning Guide should be modified. [...]
>
> thanks for your help.
>
> max

My bad. The tunable only affects indirect dbufs (so I guess only for
large files). As you noted, other metadata is compressed unconditionally
(I guess from the use of ZIO_COMPRESS_LZJB in dmu_objset_open_impl).

-r
max at bruningsystems.com
2007-Sep-20 08:44 UTC
[zfs-discuss] question about uberblock blkptr
Hi Roch,

Roch - PAE wrote:
> max at bruningsystems.com writes:
> > [...]
> > That would help, except it does not seem to work. I set
> > zfs_mdcomp_disable to 1 with mdb, deleted the pool, recreated the
> > pool, and zdb -uuuu still shows the rootbp in the uberblock_t to
> > have the lzjb flag turned on. [...] I am running Nevada build 55b.
> >
> > thanks for your help.
> > max
>
> My bad. The tunable only affects indirect dbufs (so I guess only for
> large files). As you noted, other metadata is compressed
> unconditionally (I guess from the use of ZIO_COMPRESS_LZJB in
> dmu_objset_open_impl).
>
> -r

This makes printing the data with ::print much more problematic... The
code in mdb that prints data structures recursively iterates through the
structure members, reading each member separately. I can either write a
new print function that does the decompression, or add a new dcmd that
does the decompression and dumps the data to the screen, but then I lose
the structure member names in the output. I guess I'll do the
decompression dcmd first, and then figure out how to get the member
names back in the output...

thanks,
max
Max,

Glad you figured out where your problem was. Compression does complicate
things. Also, make sure you have the most recent (highest txg) uberblock.

Just for the record, using MDB to print out ZFS data structures is
totally sweet! We have actually been wanting to do that for about 5
years now, but other things keep coming up :-) So we'd love for you to
contribute your code to OpenSolaris once you get it more fully working.

--matt
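[Matt's point about picking the most recent uberblock can be sketched:
each vdev label carries a ring of uberblocks, and the active one is the
valid entry with the highest txg. An editorial Python sketch; the magic
value is UBERBLOCK_MAGIC from the source, while the 1K slot size and
little-endian layout are assumptions for this illustration:]

```python
import struct

UBERBLOCK_MAGIC = 0x00bab10c   # "oo-ba-bloc!"
UBERBLOCK_SIZE = 1024          # assumed slot size within the 128K ring

def newest_uberblock(ring):
    """Scan a label's uberblock ring (raw bytes) and return
    (txg, slot_index) of the newest valid uberblock, or None.
    The first three u64 fields of an uberblock_t are ub_magic,
    ub_version and ub_txg; a byte-swapped magic (big-endian pool)
    is ignored here for simplicity."""
    best = None
    for slot in range(len(ring) // UBERBLOCK_SIZE):
        off = slot * UBERBLOCK_SIZE
        magic, version, txg = struct.unpack_from('<QQQ', ring, off)
        if magic != UBERBLOCK_MAGIC:
            continue
        if best is None or txg > best[0]:
            best = (txg, slot)
    return best
```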