Has anyone done benchmarking on the scalability and performance of zpool import in terms of the number of devices in the pool on recent OpenSolaris builds?

In other words, what would the relative performance of "zpool import" be for the following three pool configurations on multipathed 4G FC-connected JBODs:

1) 1 x 12-disk raidz2 vdev in the pool
2) 10 x 12-disk raidz2 vdevs in the pool
3) 100 x 12-disk raidz2 vdevs in the pool

Any feedback on your experiences would be greatly appreciated.

--
paul
Hi All,

I have a zpool (named testpool) on /dev/dsk/c0t0d0. The command "zpool import testpool" imports (i.e. mounts) the testpool.

How does the import command know that testpool was created on /dev/dsk/c0t0d0? And the plain "zpool import" command lists all the zpools that can be imported. How does it find them?

Please take a look at the following sequence of commands:

//Create testpool on /dev/dsk/c0t0d0 and destroy it
#zpool create testpool /dev/dsk/c0t0d0
#zpool destroy testpool

//Create testpool on /dev/dsk/c0t0d1 and destroy it
#zpool create testpool /dev/dsk/c0t0d1
#zpool destroy testpool

//Now list all the zpools which have been destroyed
#zpool import -D

The above command lists two testpools, one on c0t0d0 and another on c0t0d1. At any given time we can create (or import) only one pool with a given name, yet the command lists two different pools with the same name. What is wrong with the import command?

How does a ZFS system know which device belongs to which pool? Does the zpool import command read information on the disk to learn which pool the disk belongs to?

Your help is appreciated.

Thanks & Regards
-Masthan
On Mon, 2007-02-26 at 07:00 -0800, dudekula mastan wrote:
> I have a zpool (named testpool) on /dev/dsk/c0t0d0. The command
> "zpool import testpool" imports (i.e. mounts) the testpool.
>
> How does the import command know that testpool was created on
> /dev/dsk/c0t0d0? And the plain "zpool import" command lists all the
> zpools that can be imported. How does it find them?

http://cvs.opensolaris.org/source/xref/clearview/usr/src/uts/common/fs/zfs/vdev_label.c

 * The vdev label serves several distinct purposes:
 *
 * 1. Uniquely identify this device as part of a ZFS pool and confirm its
 *    identity within the pool.
 *
 * 2. Verify that all the devices given in a configuration are present
 *    within the pool.
 *
 * 3. Determine the uberblock for the pool.
 *
 * 4. In case of an import operation, determine the configuration of the
 *    toplevel vdev of which it is a part.
 *
 * 5. If an import operation cannot find all the devices in the pool,
 *    provide enough information to the administrator to determine which
 *    devices are missing.
 *
 * It is important to note that while the kernel is responsible for writing the
 * label, it only consumes the information in the first three cases. The
 * latter information is only consumed in userland when determining the
 * configuration to import a pool [...]

 * On-disk Format
 * --------------
 *
 * The vdev label consists of two distinct parts, and is wrapped within the
 * vdev_label_t structure. The label includes 8k of padding to permit legacy
 * VTOC disk labels, but is otherwise ignored.
 *
 * The first half of the label is a packed nvlist which contains pool wide
 * properties, per-vdev properties, and configuration information. It is
 * described in more detail below.
 *
 * The latter half of the label consists of a redundant array of uberblocks.
 * These uberblocks are updated whenever a transaction group is committed,
 * or when the configuration is updated. When a pool is loaded, we scan each
 * vdev for the 'best' uberblock.
 *
 * Configuration Information
 * -------------------------
 *
 * The nvlist describing the pool and vdev contains the following elements:
 *
 *      version         ZFS on-disk version
 *      name            Pool name
 *      state           Pool state
 *      txg             Transaction group in which this label was written
 *      pool_guid       Unique identifier for this pool
 *      vdev_tree       An nvlist describing vdev tree.
 *
 * Each leaf device label also contains the following:
 *
 *      top_guid        Unique ID for top-level vdev in which this is contained
 *      guid            Unique ID for the leaf vdev

Every disk that has ever been part of a zpool carries such a vdev label. "zpool import" scans all the devices that format or rmformat can see and checks each one for a label.

Francois
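For illustration, you can look at exactly what import reads by dumping a device's label with zdb (the device path below is only an example; adjust it to one of your own disks or slices):

  # zdb -l /dev/dsk/c0t0d0s0

This should print the packed nvlist from each of the redundant label copies on the device, including the pool name, pool_guid and vdev_tree described in the comment above.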
It's perfectly reasonable to have multiple exported/destroyed pools with the same name. Pool names are unique only while a pool is active on the system. This is why 'zpool import' also prints the pool GUID and allows import by ID instead of just by name. In your output below, you'd see that each pool has a different ID; you can choose your pool based on that information, and optionally rename it to a non-conflicting name when you actually do the import.

- Eric

On Mon, Feb 26, 2007 at 07:00:30AM -0800, dudekula mastan wrote:
> The above command lists two testpools, one on c0t0d0 and another on
> c0t0d1. At any given time we can create (or import) only one pool with
> a given name, yet the command lists two different pools with the same
> name. What is wrong with the import command?
>
> How does a ZFS system know which device belongs to which pool? Does the
> zpool import command read information on the disk to learn which pool
> the disk belongs to?

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
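As a rough example of what that looks like in practice (the GUIDs are made up and the output is abbreviated and illustrative):

  # zpool import -D
    pool: testpool
      id: 8699902917123456789
    ...
    pool: testpool
      id: 1234567890987654321
    ...

  # import the second one under a new, non-conflicting name
  # zpool import -D 1234567890987654321 testpool2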
The slow part of zpool import is actually discovering the pool configuration. This involves examining every device on the system (or every device within an 'import -d' directory) and seeing if it has any labels. Internally, the import action itself should be quite fast, essentially the same speed as opening a pool normally. So the scalability really depends on the number of devices in the system, not the number of devices within a pool.

- Eric

On Mon, Feb 26, 2007 at 08:14:14AM -0600, Paul Fisher wrote:
> Has anyone done benchmarking on the scalability and performance of zpool
> import in terms of the number of devices in the pool on recent OpenSolaris
> builds?

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
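A practical aside on that point: a common pattern is to link the relevant devices into a dedicated directory and restrict the discovery scan to it with -d, so only those devices get tasted. The directory and device names here are made up, and whether this helps in your environment depends on the build:

  # mkdir /mypool-devs
  # ln -s /dev/dsk/c2t*d0s0 /mypool-devs/
  # zpool import -d /mypool-devs testpool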
On Mon, Feb 26, 2007 at 10:05:08AM -0800, Eric Schrock wrote:
> The slow part of zpool import is actually discovering the pool
> configuration. This involves examining every device on the system (or
> every device within an 'import -d' directory) and seeing if it has any
> labels. Internally, the import action itself should be quite fast,
> essentially the same speed as opening a pool normally. So the
> scalability really depends on the number of devices in the system, not
> the number of devices within a pool.

Couldn't all that tasting be done in parallel?

Nico
--
On Mon, Feb 26, 2007 at 12:06:14PM -0600, Nicolas Williams wrote:
> Couldn't all that tasting be done in parallel?

Yep, that's certainly possible. Sounds like a perfect feature for someone in the community to work on :-) Simply take zpool_find_import(), add some worker thread/pool model, and there you go.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
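For the curious, here is a rough sketch of the kind of worker thread/pool model being suggested. It is not the real zpool_find_import() code path: taste_device() is a stand-in for "open the device and look for a vdev label", the device paths are simply taken from the command line, and NWORKERS is an arbitrary choice.

/*
 * Sketch of a worker-thread model for tasting devices in parallel.
 * NOT the real libzfs code; taste_device() is a stand-in.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define	NWORKERS	16

static char **g_paths;		/* candidate device paths */
static int g_npaths;
static int g_next;		/* next path to taste */
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the real label check: read where a label would live. */
static void
taste_device(const char *path)
{
	char buf[8192];
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return;
	if (pread(fd, buf, sizeof (buf), 0) == (ssize_t)sizeof (buf)) {
		/* real code would unpack the label nvlist here */
		(void) printf("tasted %s\n", path);
	}
	(void) close(fd);
}

static void *
worker(void *arg)
{
	for (;;) {
		int idx;

		(void) pthread_mutex_lock(&g_lock);
		idx = g_next++;
		(void) pthread_mutex_unlock(&g_lock);

		if (idx >= g_npaths)
			return (arg);
		taste_device(g_paths[idx]);
	}
}

int
main(int argc, char **argv)
{
	pthread_t tids[NWORKERS];
	int i;

	g_paths = argv + 1;
	g_npaths = argc - 1;

	for (i = 0; i < NWORKERS; i++)
		(void) pthread_create(&tids[i], NULL, worker, NULL);
	for (i = 0; i < NWORKERS; i++)
		(void) pthread_join(&tids[i], NULL);
	return (0);
}

Build with something like "cc taste.c -o taste -lpthread" and run it as root against a list of /dev/dsk paths; the real work would be to bolt a model like this onto zpool_find_import() in libzfs.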
On Mon, Feb 26, 2007 at 10:10:15AM -0800, Eric Schrock wrote:
> On Mon, Feb 26, 2007 at 12:06:14PM -0600, Nicolas Williams wrote:
> > Couldn't all that tasting be done in parallel?
>
> Yep, that's certainly possible. Sounds like a perfect feature for
> someone in the community to work on :-) Simply take
> zpool_find_import(), add some worker thread/pool model, and there you
> go.

What is slow, BTW? The open(2)s of the devices? Or the label reading? And is there a way to do async open(2)s w/o a thread per open? The open(2) man page isn't very detailed about O_NONBLOCK/O_NDELAY behaviour on devices ("[s]ubsequent behaviour of the device is device-specific")...

Also, I see this happens in user-land. Is there any benefit to trying this in kernel-land?

OT: I've been trying to get ZFS boot on a USB flash device going, and currently it's failing to find the pool named in /etc/system. Next time I try it I will check whether this is because the USB modules are not loading (or are not in the miniroot; I don't have that USB stick with me at the moment), or whether zpool.cache refers to the wrong device or doesn't match /devices. If ZFS boot could live without a zpool.cache to find the volume with the boot root FS, that would rock. Incidentally, it'd be nice if I could more easily observe what is going wrong here with kmdb; there must be a way to get more info from the ZFS module that I'm just missing.

Nico
--
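One thing that may help while debugging the zpool.cache suspicion (behaviour varies by build, so treat this as a hint rather than a recipe): compare the cached configuration against the labels on the actual device, for example with

  # zdb -C                      (or plain "zdb" on some builds: dump cached pool configs)
  # zdb -l /dev/dsk/c1t0d0s0    (dump the labels on the suspect device; example path)

and check whether the device paths and GUIDs agree.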
On Mon, Feb 26, 2007 at 12:27:48PM -0600, Nicolas Williams wrote:
> What is slow, BTW? The open(2)s of the devices? Or the label reading?
> And is there a way to do async open(2)s w/o a thread per open? The
> open(2) man page isn't very detailed about O_NONBLOCK/O_NDELAY behaviour
> on devices ("[s]ubsequent behaviour of the device is device-specific")...

Simply opening and reading 512K from the beginning and end of every device under /dev/dsk. Async I/O would probably help. If you use libaio, then it will do the right thing depending on the device (spawn a thread to do synchronous I/O, or use the driver entry points if provided).

> Also, I see this happens in user-land. Is there any benefit to trying
> this in kernel-land?

No. It's simpler and less brittle to keep it in userland.

- Eric

--
Eric Schrock, Solaris Kernel Development       http://blogs.sun.com/eschrock
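To make the async I/O idea concrete, here is an illustrative sketch (not the libzfs implementation) that issues the front-of-device read through POSIX AIO; a real tasting loop would queue reads against many devices before waiting on any of them, and the 256K read size is an assumption for illustration only. Note that the open(2) itself is still synchronous, which is the limitation raised below.

/*
 * Illustrative only: read the front of one device asynchronously.
 */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	READ_SIZE	(256 * 1024)	/* assumed label-sized read */

int
main(int argc, char **argv)
{
	struct aiocb cb;
	const struct aiocb *list[1];
	char *buf;
	int fd;

	if (argc != 2) {
		(void) fprintf(stderr, "usage: %s <device>\n", argv[0]);
		return (1);
	}

	/* The open(2) itself is still synchronous; only the read is async. */
	fd = open(argv[1], O_RDONLY);
	if (fd == -1) {
		perror("open");
		return (1);
	}

	buf = malloc(READ_SIZE);
	if (buf == NULL)
		return (1);

	(void) memset(&cb, 0, sizeof (cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = READ_SIZE;
	cb.aio_offset = 0;

	if (aio_read(&cb) != 0) {
		perror("aio_read");
		return (1);
	}

	/* Wait for this one read; a scanner would aio_suspend on a batch. */
	list[0] = &cb;
	(void) aio_suspend(list, 1, NULL);

	(void) printf("read %ld bytes from %s\n",
	    (long)aio_return(&cb), argv[1]);
	(void) close(fd);
	return (0);
}

Link against the platform's AIO library (e.g. -lrt where required).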
On Mon, Feb 26, 2007 at 10:32:22AM -0800, Eric Schrock wrote:
> Simply opening and reading 512K from the beginning and end of every
> device under /dev/dsk. Async I/O would probably help. If you use
> libaio, then it will do the right thing depending on the device (spawn a
> thread to do synchronous I/O, or use the driver entry points if
> provided).

So, are you saying that O_NONBLOCK/O_NDELAY opens of devices in /dev/dsk are supported (that's what I was asking, albeit obliquely)? And if so, can the application issue an aioread(3AIO) immediately, and if not, how can the application poll for the open to complete (not with poll(2)!)?

My guess (and if I've time I'll test it) is that there's no way to do async opens of disk devices, and that one would have to create multiple threads, one per device up to some maximum, for tasting devices in parallel.

Nico
--
> From: Eric Schrock [mailto:eric.schrock at sun.com]
> Sent: Monday, February 26, 2007 12:05 PM
>
> The slow part of zpool import is actually discovering the
> pool configuration. This involves examining every device on
> the system (or every device within an 'import -d' directory)
> and seeing if it has any labels. Internally, the import
> action itself should be quite fast...

Thanks for the answer. Let me ask a follow-up question related to zpool import and the Sun Cluster + ZFS integration: is the slow part done "early" on the backup node, so that at the time of the failover the actual import is "fast" as you describe above?

--
paul
Hello Paul,

Monday, February 26, 2007, 8:28:43 PM, you wrote:

PF> Thanks for the answer. Let me ask a follow-up question related
PF> to zpool import and the Sun Cluster + ZFS integration: is the slow
PF> part done "early" on the backup node, so that at the time of the
PF> failover the actual import is "fast" as you describe above?

Right now Sun Cluster does the import in almost the same way 'zpool import' does. Perhaps SC could save the config for each pool and then, during import, use it in a similar way to how zpool.cache is used during ZFS initialization, but only for the given pool. In most cases this should greatly reduce import time in an SC environment with lots of LUNs.

Perhaps something like:

  zpool config export pool >pool.cache

and then later:

  zpool import -c pool.cache pool

Of course such an imported pool wouldn't be added to zpool.cache (or perhaps that should be optional; SC doesn't use the zpool command anyway).

--
Best regards,
Robert                          mailto:rmilkowski at task.gda.pl
                                http://milek.blogspot.com