Hi Guys,

Someone asked me how to count the number of inodes/objects in a ZFS filesystem and I wasn't exactly sure. "zdb -dv <filesystem>" seems like a likely candidate, but I wanted to find out for sure. As to why you'd want to know this, I don't know their reasoning, but I assume it has to do with the maximum number of files a ZFS filesystem can support (2^48, no?). Thank you in advance for your help.

Best Regards,
Jason
You can just do something like this:

    # zfs list tank/home/billm
    NAME               USED  AVAIL  REFER  MOUNTPOINT
    tank/home/billm   83.9G  5.56T  74.1G  /export/home/billm
    # zdb tank/home/billm
    Dataset tank/home/billm [ZPL], ID 83, cr_txg 541, 74.1G, 111066 objects

Let me know if that causes any trouble.

--Bill

On Fri, Nov 09, 2007 at 12:14:07PM -0700, Jason J. W. Williams wrote:
> Someone asked me how to count the number of inodes/objects in a ZFS
> filesystem and I wasn't exactly sure. "zdb -dv <filesystem>" seems
> like a likely candidate, but I wanted to find out for sure.
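A rough cross-check that doesn't need zdb, assuming the filesystem is mounted at the mountpoint shown above: count the visible entries with find. The totals won't match zdb exactly, since ZFS keeps some internal objects per dataset beyond the user-visible files and directories.

    # count entries under the mountpoint without crossing into child
    # filesystems; compare against zdb's object count as a sanity check
    find /export/home/billm -xdev | wc -l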
Hey Bill:

what's an object here? or do we have a mapping between "objects" and block pointers?

for example a zdb -bb might show:

    th37 # zdb -bb rz-7

    Traversing all blocks to verify nothing leaked ...

    No leaks (block sum matches space maps exactly)

            bp count:              47
            bp logical:        518656    avg: 11035
            bp physical:        64512    avg:  1372    compression: 8.04
            bp allocated:      249856    avg:  5316    compression: 2.08
            SPA allocated:     249856    used: 0.00%

but do we maintain any sort of mapping between the object instantiation and how many block pointers an "object" or file might consume on disk?

---
.je

On Nov 9, 2007, at 15:18, Bill Moore wrote:
> # zdb tank/home/billm
> Dataset tank/home/billm [ZPL], ID 83, cr_txg 541, 74.1G, 111066 objects
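To drill down from the dataset level, something like the following should work; the object number here is purely hypothetical, and zdb's flags and output format vary between builds, so treat this as a sketch rather than gospel:

    # list every object in the dataset with its type and sizes
    zdb -dv tank/home/billm
    # add more -d's to dump a single object in detail, including the
    # indirect/data block pointers it consumes on disk
    zdb -ddddd tank/home/billm 111042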
Hello all. I am working on an NFS failover scenario between two servers. I am getting stale file handle errors on my (Linux) client, which point to a mismatch in the fsids of my two filesystems when the failover occurs.

I understand that the fsid_guid attribute, which is then used as the fsid in an NFS share, is created at zfs create time, but I would like to see and modify that value on any particular ZFS filesystem after creation.

More details were discussed at http://www.mail-archive.com/zfs-discuss at opensolaris.org/msg03662.html but that thread was talking about the same filesystem sitting on a SAN failing over between two nodes.

On a Linux NFS server one can specify "-o fsid=num" in the NFS exports (example below), which can be an arbitrary number and would seem to fix this issue for me, but it appears to be unsupported on Solaris.

Any thoughts on workarounds to this issue?

Thank you kind sirs and ladies.

Asa
-hack

On Nov 10, 2007, at 10:18 AM, Jonathan Edwards wrote:
> what's an object here? or do we have a mapping between "objects" and
> block pointers?
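For reference, the Linux-side knob mentioned above looks roughly like this in /etc/exports; the path and client network are made up for illustration:

    # pin the fsid used in the NFS file handle to a fixed value so that
    # both servers hand out the same handle for "the same" export
    /export/data  192.168.10.0/24(rw,sync,fsid=100)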
Mattias Pantzare
2007-Nov-10 23:49 UTC
[zfs-discuss] Modify fsid/guid of dataset for NFS failover
2007/11/10, asa <asa_mailinglists at assembleco.com>:
> I understand that the fsid_guid attribute, which is then used as the
> fsid in an NFS share, is created at zfs create time, but I would like
> to see and modify that value on any particular ZFS filesystem after
> creation.

As the fsid is created when the file system is created, it will be the same when you mount it on a different NFS server. Why change it?

Or are you trying to match two different file systems? Then you would also have to match all the inode numbers on your files. That is not possible at all.
On Nov 10, 2007, at 3:49 PM, Mattias Pantzare wrote:
> As the fsid is created when the file system is created, it will be the
> same when you mount it on a different NFS server. Why change it?
>
> Or are you trying to match two different file systems? Then you would
> also have to match all the inode numbers on your files. That is not
> possible at all.

I am trying to match two different file systems. I have the two file systems being replicated via zfs send|recv for a near-realtime mirror, so they are the same filesystem in my head. There may well be ZFS goodness going on under the hood that makes this fsid different, even though they seem like they could be the same because they originated from the same filesystem via zfs send/recv. Perhaps what happens when zfs recv receives a stream is that it creates a totally new filesystem at the new location.

I found an ID parameter on the datasets with:

    > zdb -d tank/test -lv
    Dataset tank/test [ZPL], ID 37406, cr_txg 2410348, 593M, 21 objects

It is different on each machine. Is this the GUID? Or something else? Is there some hack way to set it?

I don't know enough about inodes and ZFS to know whether what I am asking is silly, and once I get past this fsid issue I will hit the next stumbling block of inode and file ID differences, which will trip up the NFS failover.

I would like for all my NFS clients to hang during the failover, then pick up trucking on this new filesystem, perhaps obviously failing their writes back to the apps which are doing the writing. Naive?

Asa
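For context, the send|recv mirror described above boils down to something like this; dataset, snapshot, and host names are illustrative, not the actual cluster agent:

    # on the primary: take a new snapshot and ship the delta to the standby
    zfs snapshot tank/test@rep-002
    zfs send -i tank/test@rep-001 tank/test@rep-002 |
        ssh standby zfs recv -F tank/test
    # the received dataset is created as its own dataset in the standby's
    # pool, which is presumably why its ID differs from the source's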
Carson Gaspar
2007-Nov-11 06:16 UTC
[zfs-discuss] Modify fsid/guid of dataset for NFS failover
Mattias Pantzare wrote:
> As the fsid is created when the file system is created, it will be the
> same when you mount it on a different NFS server. Why change it?
>
> Or are you trying to match two different file systems? Then you would
> also have to match all the inode numbers on your files. That is not
> possible at all.

It is, if you do block replication between the servers (drbd on Linux, or the Sun product whose name I'm blanking on at the moment).

What isn't clear is if zfs send/recv retains inode numbers... if it doesn't, that's a really sad thing, as we won't be able to use ZFS to replace NetApp snapmirrors.

--
Carson
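One way to check, assuming the replica is mounted somewhere for inspection (paths here are hypothetical): compare inode numbers for the same relative path on both copies.

    # print the inode number of the same file on the source and the replica;
    # if send/recv does not preserve object numbers, these will differ
    ls -i /export/home/billm/somefile
    ls -i /replica/home/billm/somefile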
Jonathan Edwards
2007-Nov-12 16:19 UTC
[zfs-discuss] Modify fsid/guid of dataset for NFS failover
On Nov 10, 2007, at 23:16, Carson Gaspar wrote:
> It is, if you do block replication between the servers (drbd on Linux,
> or the Sun product whose name I'm blanking on at the moment).

AVS (or Availability Suite): http://www.opensolaris.org/os/project/avs/
Jim Dunham does a nice demo there for block replication on ZFS (see the sidebar).

> What isn't clear is if zfs send/recv retains inode numbers... if it
> doesn't, that's a really sad thing, as we won't be able to use ZFS to
> replace NetApp snapmirrors.

zfs send/recv comes out of the DSL, which I believe will generate a unique fsid_guid .. for mirroring you'd really want to use AVS.

btw - you can also look at the Cluster SUNWnfs agent in the ohac community:
http://opensolaris.org/os/community/ha-clusters/ohac/downloads/

hth
---
.je
Darren J Moffat
2007-Nov-12 21:21 UTC
[zfs-discuss] Modify fsid/guid of dataset for NFS failover
asa wrote:
> I would like for all my NFS clients to hang during the failover, then
> pick up trucking on this new filesystem, perhaps obviously failing
> their writes back to the apps which are doing the writing. Naive?

The OpenSolaris NFS client does this already - has done since IIRC around Solaris 2.6. The knowledge is in the NFS client code.

For NFSv4 this functionality is part of the standard.

--
Darren J Moffat
Well then, this is probably the wrong list to be hounding.

I am looking for something like http://blog.wpkg.org/2007/10/26/stale-nfs-file-handle/
where, when fileserver A dies, fileserver B can come up, grab the same IP address via some mechanism (in this case I am using Sun Cluster) and keep on trucking without the lovely stale file handle errors I am encountering. My clients are Linux; the servers are Solaris 10u4.

It seems that it is impossible to change the fsid on Solaris. Can you point me towards the appropriate NFS client behavior option lingo if you have a minute? (Just the terminology would be great; there are a ton of confusing options in the land of NFS: client recovery, failover, replicas, etc.)

I am unable to use block-based replication (AVS) underneath the ZFS layer because I would like to run with different zpool schemes on each server (a fast primary server, and a slower, larger failover server only to be used during downtime on the main server).

Worst case scenario here seems to be that I would have to forcibly unmount and remount all my client mounts (roughly sketched below).

I'll start bugging the nfs-discuss people. Thank you.

Asa

On Nov 12, 2007, at 1:21 PM, Darren J Moffat wrote:
> The OpenSolaris NFS client does this already - has done since IIRC
> around Solaris 2.6. The knowledge is in the NFS client code.
>
> For NFSv4 this functionality is part of the standard.
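The forced unmount/remount worst case would look roughly like this on each Linux client; the mountpoint, export path, and logical hostname are placeholders:

    # lazy/forced unmount of the stale mount, then remount from the
    # cluster's logical hostname after it has moved to the surviving node
    umount -f -l /mnt/shared
    mount -t nfs -o vers=3,hard,intr logical-host:/export/shared /mnt/shared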
Richard Elling
2007-Nov-21 03:27 UTC
[zfs-discuss] Modify fsid/guid of dataset for NFS failover
asa wrote:
> I am looking for something like http://blog.wpkg.org/2007/10/26/stale-nfs-file-handle/
> where, when fileserver A dies, fileserver B can come up, grab the same
> IP address via some mechanism (in this case I am using Sun Cluster) and
> keep on trucking without the lovely stale file handle errors I am
> encountering.

If you are getting stale file handles, then the Solaris cluster is misconfigured. Please double check the NFS installation guide for Solaris Cluster and verify that the paths are correct.
-- richard
I am "rolling my own" replication using zfs send|recv through the cluster agent framework and a custom set of HA shared local storage scripts (similar to http://www.posix.brte.com.br/blog/?p=75 but without AVS). I am not using ZFS off of shared storage in the supported way, so this is a bit of a lonely area. =)

As these are two different ZFS filesystems on different zpools with differing underlying vdev topology, it appears they do not share the same fsid, and so presumably present different file handles from each other.

I have the cluster parts out of the way (mostly =)); now I need to solve the NFS side of things at the point of failover. I have isolated ZFS out of the equation: I receive the same stale file handle errors if I try to share an arbitrary UFS folder to the client through the cluster interface (sketched below).

Yeah, I am a hack.

Asa

On Nov 20, 2007, at 7:27 PM, Richard Elling wrote:
> If you are getting stale file handles, then the Solaris cluster is
> misconfigured. Please double check the NFS installation guide for
> Solaris Cluster and verify that the paths are correct.
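The UFS test was essentially just this, with a placeholder path: share a plain directory from whichever node currently holds the logical hostname, fail the resource group over, and retry from the client.

    # share a plain UFS directory through the cluster's logical hostname;
    # the stale file handle shows up on failover even with ZFS out of the picture
    share -F nfs -o rw /export/ufs-test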