Alec Muffett
2009-Mar-27 14:39 UTC
[zfs-discuss] RFE: creating multiple clones in one zfs(1) call and one txg
Hi,

The inability to create more than 1 clone at a time (ie: in separate TXGs) is something which has hampered me (and several projects on which I have worked) for some years, now.

Specifically I am looking at various forms of diskless grid/cloud environments where you create a "golden image", snapshot it, and then clone that snapshot perhaps 1000 times for 1000 machines... poking the image slightly each time, and letting DHCP pick up the administrative slack of systems management.

To create 1000 clones in this fashion (for i in `range 1 1000` ; do zfs clone [...] ; done) may take well over 1 hour, because 1000 TXGs need to be set up and committed. I would like to create 1000 clones in rather less than a couple of minutes.

I've kicked this idea around with Darren Moffat and he informs me that it is painful to achieve the obvious solution:

    zfs clone tank/src@version tank/foo tank/bar tank/baz ...

...because explicitly specifying each (of multiple) clone names on the command line causes multiple calls to ioctl() and therefore multiple TXGs, and thus slowness.

Thus we hit on this proposal, for your consideration:

    zfs multiclone tank/fs@1 tank/<PATTERN> <BEGIN> <END> [STRIDE]

- implement limited but comprehensive snprintf semantics for PATTERN
- support: %d, %3d, %03d, FOO%d, FOO%dBAR, FOO%#08xBAR
- includes decimal, hex, octal
- BEGIN, END, STRIDE all decimal non-negative integers (STRIDE at least 1)
- STRIDE optional, defaults to 1
- creation begins at BEGIN, increments by STRIDE, continues until END is exceeded

Examples:

- zfs multiclone tank/shark@1 tank/fish%02d 0 7 2
  creates: /tank/fish00 /tank/fish02 /tank/fish04 /tank/fish06
- zfs multiclone tank/gold-image tank/diskless/node%d.root 1 100
  ...is pretty obvious. (.../node1.root/ etc)

What do you think?

- alec
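Because the proposed PATTERN semantics are deliberately a subset of snprintf(), the userland side of the expansion is small. The C sketch below is illustrative only: `expand_pattern` and `NAME_LEN` are hypothetical names, not part of zfs(1) or libzfs, and it assumes the pattern has already been checked to contain exactly one integer conversion, since it is passed straight to snprintf() as the format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 256 /* hypothetical per-name buffer size for this sketch */

/*
 * Expand a multiclone-style PATTERN over BEGIN..END in steps of STRIDE.
 * Assumes PATTERN was already validated to hold exactly one integer
 * conversion (%d, %x or %o with optional flags/width), because it is
 * used directly as the snprintf() format string.
 * Returns the number of names written to names[], or -1 if there is
 * not enough room or a name would be truncated.
 */
static int expand_pattern(const char *pattern, int begin, int end, int stride,
                          char names[][NAME_LEN], int max_names)
{
    int count = 0;

    assert(stride > 0);
    /* long long counter so "i += stride" cannot overflow near INT_MAX */
    for (long long i = begin; i <= end; i += stride) {
        if (count == max_names)
            return -1;
        int n = snprintf(names[count], NAME_LEN, pattern, (int)i);
        if (n < 0 || n >= NAME_LEN)
            return -1; /* encoding error or truncated name */
        count++;
    }
    return count;
}
```

With `"tank/fish%02d"`, BEGIN 0, END 7 and STRIDE 2 this yields tank/fish00, tank/fish02, tank/fish04 and tank/fish06, matching the first example above.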
Mark J Musante
2009-Mar-27 14:49 UTC
On Fri, 27 Mar 2009, Alec Muffett wrote:
> The inability to create more than 1 clone at a time (ie: in separate
> TXGs) is something which has hampered me (and several projects on which
> I have worked) for some years, now.

Hi Alec,

Does CR 6475257 cover what you're looking for?

Regards,
markm
Alec Muffett
2009-Mar-27 14:53 UTC
I would like to apologise to those reading via the forums, because I used BNF angle brackets and, even though I sent a plaintext message, the forum treated my text as "HTML" and lost it...

    zfs multiclone tank/fs@1 tank/PATTERN BEGIN END [STRIDE]

-a
Darren J Moffat
2009-Mar-27 14:54 UTC
Mark J Musante wrote:
> On Fri, 27 Mar 2009, Alec Muffett wrote:
>
>> The inability to create more than 1 clone at a time (ie: in separate
>> TXGs) is something which has hampered me (and several projects on
>> which I have worked) for some years, now.
>
> Hi Alec,
>
> Does CR 6475257 cover what you're looking for?

It was the same Alec that logged 6475257 back in 2006.

--
Darren J Moffat
Richard Elling
2009-Mar-27 15:04 UTC
Alec Muffett wrote:
> I would like to apologise to those reading via the forums, because I
> used BNF angle brackets and even though I sent a plaintext message, it
> lost my text as "HTML"...
>
> zfs multiclone tank/fs@1 tank/PATTERN BEGIN END [STRIDE]

So much for an easy-to-use CLI :-O
How about feeding in a file containing names instead (qv fmthard -s)?
 -- richard
Alec Muffett
2009-Mar-27 15:25 UTC
> So much for an easy-to-use CLI :-O
> How about feeding in a file containing names instead (qv fmthard -s)?

Not terribly script-friendly; I suffered that sort of thing with zonecfg and zoneadm (create a control file and squirt it into another command) and deemed it a horrible hack. They are still too broken for me to face using in their raw state except on special occasions...

Same reason I never use fgrep; it's just not the true Unix way. If it *was* the true Unix way, you would never have written something like:

    rm `cat files-to-delete.txt`

...or anything else in backticks in your life; nor would you ever have used xargs... :-)

-a
Darren J Moffat
2009-Mar-27 15:33 UTC
Alec Muffett wrote:
>> So much for an easy-to-use CLI :-O
>> How about feeding in a file containing names instead (qv fmthard -s)?
>
> Not terribly script-friendly; I suffered that sort of thing with zonecfg
> and zoneadm (create a controlfile and squirt it into another command)
> and deemed it a horrible hack. They are still too broken for me to face
> using in their raw state except on special occasions...

Also it is unfriendly to the way the zfs ioctl code works today, because that would mean passing every single filename down to the kernel over a single ioctl call.

The reason for the "pattern" based filenames is because:

a) that is probably what is wanted most of the time anyway
b) it is easy to pass from userland to kernel - you pass the rules (after some userland sanity checking first) as is.

--
Darren J Moffat
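Point (b) can be made concrete with a fixed-size request that carries the rule rather than the names, plus the kind of userland sanity check Darren mentions. This is a hypothetical layout sketched in C: `multiclone_req_t`, `MC_NAME_LEN` and `MC_MAX_CLONES` are invented for illustration and are not the real /dev/zfs ioctl structures. The check accepts only %d, %x or %o with optional `#`/`0` flags and a width, and refuses `%#d`, because C leaves the `#` flag undefined for `d`.

```c
#include <assert.h>
#include <string.h>

#define MC_NAME_LEN   256     /* hypothetical fixed buffer size */
#define MC_MAX_CLONES 100000  /* hypothetical per-request limit */

/* Hypothetical fixed-size request: the rule crosses the boundary, not the names. */
typedef struct {
    char snapshot[MC_NAME_LEN];   /* e.g. "tank/shark@1" */
    char pattern[MC_NAME_LEN];    /* e.g. "tank/fish%02d" */
    int begin, end, stride;
} multiclone_req_t;

/*
 * Accept exactly one conversion of the form %[#0...][width](d|x|o) and
 * no other '%'.  "%#d" is refused because '#' is undefined for 'd'.
 */
static int pattern_ok(const char *p)
{
    int conversions = 0;

    for (; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        int alt = 0;                        /* saw the '#' flag */
        p++;
        while (*p == '#' || *p == '0') {    /* flags */
            if (*p == '#')
                alt = 1;
            p++;
        }
        while (*p >= '0' && *p <= '9')      /* field width */
            p++;
        if (*p == 'x' || *p == 'o' || (*p == 'd' && !alt))
            conversions++;
        else
            return 0;   /* unsupported conversion, or '%' at end of string */
    }
    return conversions == 1;
}

/* Userland sanity check before the request is handed to the kernel. */
static int multiclone_req_ok(const multiclone_req_t *req)
{
    assert(req != NULL);
    /* both strings must be terminated inside their fixed-size buffers */
    if (memchr(req->snapshot, '\0', sizeof(req->snapshot)) == NULL ||
        memchr(req->pattern, '\0', sizeof(req->pattern)) == NULL)
        return 0;
    if (strchr(req->snapshot, '@') == NULL)    /* origin must be a snapshot */
        return 0;
    if (req->begin < 0 || req->end < req->begin || req->stride < 1)
        return 0;
    if ((req->end - req->begin) / req->stride + 1 > MC_MAX_CLONES)
        return 0;
    return pattern_ok(req->pattern);
}
```

The point of the fixed layout is that the request is the same size whether it describes 8 clones or 100,000; the cost is that only names expressible by the rule can be requested.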
Alec Muffett
2009-Mar-27 15:36 UTC
> The reason for the "pattern" based filenames is because:
> a) that is probably what is wanted most of the time anyway
> b) it is easy to pass from userland to kernel - you pass the
> rules (after some userland sanity checking first) as is.

Just to quote what I wrote back in 2006 (ahem), which would *also* fit the "single ioctl()" model:

>> 2) zfs clone -C N <snapshot> <filesystemXXXXXX>
>>
>> - ie: following the inspiration of mktemp(3c), when invoked with "-C"
>> (count) then N clones are created, numbered 0 thru (N-1), where the
>> counter number for each one will replace XXXXXX in the filesystem
>> pattern, zero-padded left if needed.
>>
>> For example:
>>
>> zfs clone -C 1000 pool/xxxxx@xxxxx pool/fishXXXXXX
>>
>> ...yields clones named pool/fish000000 through pool/fish000999.

...but I like the control of having a snprintf() pattern with START/STOP/STEP, more. It brings out the BASIC programmer in me...

- alec
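The 2006 `-C N` variant only needs the trailing run of X characters to be swapped for a zero-padded counter. A minimal C sketch, assuming the counter replaces exactly that run and must fit in it; `fill_template` is a hypothetical helper, and unlike mktemp(3C), which fills the X's with unique characters of its own choosing, it uses a plain counter.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Replace the trailing run of 'X' characters in TMPL with COUNTER,
 * zero-padded to the width of that run, writing the result to OUT.
 * Returns 0 on success, or -1 if TMPL has no trailing X's, the counter
 * needs more digits than there are X's, or OUT is too small.
 */
static int fill_template(const char *tmpl, unsigned counter, char *out, size_t outlen)
{
    assert(tmpl != NULL && out != NULL);
    size_t len = strlen(tmpl);
    size_t width = 0;

    /* count the X's at the end of the template */
    while (width < len && tmpl[len - 1 - width] == 'X')
        width++;
    if (width == 0)
        return -1;

    /* copy the prefix, then print the counter zero-padded to WIDTH digits */
    int n = snprintf(out, outlen, "%.*s%0*u", (int)(len - width), tmpl, (int)width, counter);
    if (n < 0 || (size_t)n >= outlen || (size_t)n != len)
        return -1;  /* truncated, or counter wider than the run of X's */
    return 0;
}
```

Counter 999 with `pool/fishXXXXXX` gives `pool/fish000999`, the last name in the example; a counter needing more digits than there are X's is rejected rather than silently widening the name.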
Chris Kirby
2009-Mar-27 15:46 UTC
On Mar 27, 2009, at 10:33 AM, Darren J Moffat wrote:
> a) that is probably what is wanted most of the time anyway
> b) it is easy to pass from userland to kernel - you pass the
> rules (after some userland sanity checking first) as is.

But doesn't that also exclude the possibility of creating non-pattern based clones in a single txg?

While I think that allowing multiple clones to be created in a single txg is perfectly reasonable, we shouldn't need to artificially restrict the clone namespace in order to achieve that.

-Chris
Darren J Moffat
2009-Mar-27 15:58 UTC
Chris Kirby wrote:
> On Mar 27, 2009, at 10:33 AM, Darren J Moffat wrote:
>> a) that is probably what is wanted most of the time anyway
>> b) it is easy to pass from userland to kernel - you pass the
>> rules (after some userland sanity checking first) as is.
>
> But doesn't that also exclude the possibility of creating non-pattern based
> clones in a single txg?

Yes it does.

> While I think that allowing multiple clones to be created in a single txg
> is perfectly reasonable, we shouldn't need to artificially restrict the
> clone namespace in order to achieve that.

Agreed, but other than pattern based I can't at the moment think of a nice way to pass all the names over the /dev/zfs ioctl call while maintaining the fact it is pretty much all fixed size.

I'm not saying passing a list of names over the ioctl is impossible, more it just doesn't feel right to me at the moment - but I'm happy to be convinced otherwise. That way the patterning part can be left to the shell.

--
Darren J Moffat
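The alternative Darren is wary of, passing an arbitrary list of names, needs a variable-length argument instead of a fixed-size structure. One simple hypothetical encoding is the names stored back to back, each NUL-terminated, handed over as a (pointer, length) pair. The C sketch below builds such a buffer; `pack_names` and the format itself are assumptions for illustration, not how /dev/zfs actually marshals its arguments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wire format: names stored back to back, each
 * NUL-terminated, so any mix of names can be expressed.
 * Returns a malloc()ed buffer and stores its length in *lenp,
 * or NULL if allocation fails.  The caller frees the buffer.
 */
static char *pack_names(const char *const names[], size_t n, size_t *lenp)
{
    assert(lenp != NULL);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(names[i]) + 1;       /* name plus its NUL */

    char *buf = malloc(total > 0 ? total : 1);
    if (buf == NULL)
        return NULL;

    char *p = buf;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(names[i]) + 1;   /* include the NUL */
        memcpy(p, names[i], len);
        p += len;
    }
    *lenp = total;
    return buf;
}
```

Nothing in this format is tied to a pattern, which is exactly what Chris asked for; the price is that the request size now grows with the number of clones.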
Miles Nordin
2009-Mar-27 18:52 UTC
>>>>> "djm" == Darren J Moffat <darrenm at opensolaris.org> writes:

    djm> I'm not saying passing a list of names over the ioctl is
    djm> impossible, more it just doesn't feel right to me at the
    djm> moment - but I'm happy to be convinced otherwise.

I'm not sure I want to convince you otherwise, but here are two attempts for consideration:

 1. When we were talking about 'zfs list' scalability and 'zfs destroy <snapshot>' scalability with thousands of filesystems, there was also some discussion of time burned up making thousands of ioctls to accomplish one administrative action. Maybe the ioctl packing/unpacking/copyin overhead is part of the problem. Or maybe there is actual work to be done in each of those thousand ioctls, so that combining them will be of no benefit; but making the in-kernel work more efficient would probably be easier if it were coalesced into fewer ioctls rather than many. Something that can delete, insert, and query multiple rows per operation like SQL, albeit JUST multirow, without a parsed text grammar and without stuff like 'UPDATE' for supporting multiple writers, might *not* be overkill.

 2. How is stuff like snapshot -r implemented atomically? Could a more complicated ioctl interface make -r more elegant rather than less?

Maybe the ultimate question isn't ``should we pass an asston of stuff in one ioctl''. The questions are more like:

 * given this will probably not be the last atomic change needed (snapshot -r needed for consistency, this needed for speed, and who-knows-what next?), which do you find less maintainable:

   (a) transactional ioctl interface, where you call ioctl BEGINTRANSACTION, ioctl DOSTUFF, ioctl DOSTUFF, ioctl DOSTUFF, ioctl COMMIT

   (b) big ioctl interface where you express everything you want done at once in one possibly-complicated large structured ioctl blob and return success/fail on the whole blob

   (b) seems to be more in line with the hairy nvlist stuff infesting Solaris everywhere, so maybe that's better?
 * what would you find simpler / better-respecting of the kernel-userland boundary?

   (a) passing instructions in some rather ugly interpreted bytecode language, ``0xC0 means RANGE opcode, arguments to follow in 8-bit registers,'' full of .h macros and lots of structs in unions. The kernel executes the bytecode to expand the full argument list, then does stuff. (your current favored proposal)

   (b) expand in userland, in C or in bash, rather than in a crappy switch()-based bytecode interpreter in the kernel; simply pass the full argument list in the ioctl, copyin, do stuff.

I'm reminded of printers that kept advertising increasingly complicated page-description languages because the parallel port and LocalTalk were so slow that publishers wanted to express their pages in as few bits as possible.

Now, this is not the only reason said thing happened with printers. The printer became a hardware dongle enforcing your ``authorized'' use of the fonts, which were encrypted in an attempt to bind them to $complicatedlanguage. They never got this far, but if it still existed it would probably be enforcing rules like ``you have to pay for the Professional version of the font if you want to print more than two consecutive capital letters. The Home font is only for normal home correspondence, so consecutive capitals will be downcased automatically.''

But the former slow-interface-port reasons are how it was pitched, how the architecture was justified. People bought the argument. Printers grew hard drives to cache fonts and reusable ``preamble'' libraries written in $complicatedlanguage, printer CPUs and RAM grew bigger than those of the computers driving them, and the hidden cost of $complicatedlanguage almost exceeded that of the publishing package driving it. Sometimes you could see a page on the screen, but it was too ``complicated'' to print. Solution: buy a bigger printer! WYSIWYG broke, since the publishing package had to reimplement $complicatedlanguage on screen, badly.
The merchants of $complicatedlanguage, who are now a hegemonic monopoly, sneakily trickle-sold heavily-DRM'd brokeass versions of $complicatedlanguage for linking with the publishing packages to make WYSIWYG work again like it used to, and these blobs even made it into Solaris. All because of a fucking parallel port.

Eventually it was deemed mistaken and the whole damn tower collapsed. $complicatedlanguage in Solaris bitrotted. We invented faster interfaces and stopped overcharging for them (USB, Ethernet), and extremely simple page description languages: modern laser printers get a single JBIG image of the page, pre-dithered, and do not even understand what a glyph is. I am not even sure how they decompress the JBIG: whether they have enough video RAM to buffer an entire page, or uncompress it while the mirror is spinning. Awesome, awesome. We keep $complicatedlanguage around as an open-source emulator for those who still need it. OS designers stood up for themselves and invented their own font formats and font storage pools, and now they hegemonically enforce font formats on the font vendors rather than the other way around. WYSIWYG works, no more font DRM, install fonts 1x, every page is printable, and at rated speed, regardless of ``complexity''; no more hard drives inside printers. Hallelujah.

So my analogous argument is for (b): don't make the kernel/userland boundary into a new virtual parallel port just because it feels right to squeeze it into a straw. (Though I just contradicted reason (1).) At the very least, make any tiny languages invented at interfaces extremely tiny and expressive, so that inventing the language results in less code in the overall system even after implementing its interpreter; don't invent boorish EE-inspired ``packing'' languages based on opcodes that make lots of code for less data.
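Option (b) raises the question of how big the fully expanded list would actually be. A small C sketch that measures it, relying on snprintf() with a NULL buffer and size 0 returning the length the output would have had; `expanded_list_bytes` is a hypothetical helper and assumes the pattern has already been validated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Bytes needed to send every expanded name, NUL-terminated and back to
 * back, instead of the fixed-size rule.  snprintf(NULL, 0, ...) only
 * reports the length a name would have; nothing is written.
 * PATTERN is assumed to be validated already.  Returns 0 on error.
 */
static size_t expanded_list_bytes(const char *pattern, int begin, int end, int stride)
{
    size_t total = 0;

    assert(stride > 0);
    for (long long i = begin; i <= end; i += stride) {
        int n = snprintf(NULL, 0, pattern, (int)i);
        if (n < 0)
            return 0;
        total += (size_t)n + 1;   /* +1 for the terminating NUL */
    }
    return total;
}
```

For Alec's `tank/diskless/node%d.root` names over 1 to 1000 this comes to 26,893 bytes, i.e. roughly 26 KiB. Whether that is an acceptable amount to copy in per ioctl is a separate question that this measurement does not settle.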
Carson Gaspar
2009-Mar-27 22:26 UTC
Darren J Moffat wrote:
...
> Agreed, but other than pattern based I can't at the moment think of a
> nice way to pass all the names over the /dev/zfs ioctl call while
> maintaining the fact it is pretty much all fixed size.
>
> I'm not saying passing a list of names over the ioctl is impossible,
> more it just doesn't feel right to me at the moment - but I'm happy to
> be convinced otherwise. That way the patterning part can be left to the
> shell.

OK, while I have played a developer upon occasion, I've never touched kernel code. So feel free to tell me I'm on crack.

What is so difficult about passing a pointer to memory as an argument in the ioctl? The kernel certainly has easy access to user-space pages. And parsing a list of text strings is neither complicated, nor dangerous. And as long as you never touch the memory after returning from ioctl(), there are no memory allocation ownership issues.

In short, what am I missing here? This ioctl() limit seems much ado about nothing to me...

--
Carson
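In practice a handler would copy the user buffer in (on Solaris, for example with copyin(9F)) and then treat its contents as untrusted. A C sketch of that parsing step for a back-to-back, NUL-terminated list of names; the format is an assumption, and `count_names` and its `maxlen` limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk a buffer of back-to-back NUL-terminated names, as a kernel-side
 * handler might after copying it in.  Every name must be non-empty and
 * shorter than MAXLEN, and the buffer must end exactly on a NUL, so
 * nothing is ever read past BUF + LEN.  Returns the number of names,
 * or -1 if the buffer is malformed.
 */
static int count_names(const char *buf, size_t len, size_t maxlen)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL || len == 0);
    while (off < len) {
        const char *nul = memchr(buf + off, '\0', len - off);
        if (nul == NULL)
            return -1;               /* last name not terminated */
        size_t namelen = (size_t)(nul - (buf + off));
        if (namelen == 0 || namelen >= maxlen)
            return -1;               /* empty or over-long name */
        off += namelen + 1;
        count++;
    }
    return count;
}
```

Rejecting a buffer that does not end on a NUL is what keeps the walk inside `buf + len`; that bounds check is most of what "not dangerous" requires here.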
Jeff Bonwick
2009-Mar-29 22:19 UTC
I agree with Chris -- I'd much rather do something like:

    zfs clone snap1 clone1 snap2 clone2 snap3 clone3 ...

than introduce a pattern grammar. Supporting multiple snap/clone pairs on the command line allows you to do just about anything atomically.

Jeff

On Fri, Mar 27, 2009 at 10:46:33AM -0500, Chris Kirby wrote:
> On Mar 27, 2009, at 10:33 AM, Darren J Moffat wrote:
> > a) that is probably what is wanted most of the time anyway
> > b) it is easy to pass from userland to kernel - you pass the
> > rules (after some userland sanity checking first) as is.
>
> But doesn't that also exclude the possibility of creating non-pattern
> based clones in a single txg?
>
> While I think that allowing multiple clones to be created in a single
> txg is perfectly reasonable, we shouldn't need to artificially restrict
> the clone namespace in order to achieve that.
>
> -Chris
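Jeff's form pairs each clone with its own origin snapshot, and the thread's "success/fail on the whole blob" idea suggests applying such a request all-or-nothing. A toy C model of that behaviour: every pair is validated before anything is created, so a bad pair leaves no partial result. `toy_pool_t`, `clone_op_t` and `apply_batch` are invented for illustration; inside ZFS the work would have to happen within a single transaction group, which this sketch does not model.

```c
#include <assert.h>
#include <string.h>

#define MAX_DATASETS 16   /* toy capacity, illustration only */
#define NAME_LEN     64

/* Toy in-memory namespace standing in for a pool; not a ZFS structure. */
typedef struct {
    char names[MAX_DATASETS][NAME_LEN];
    int count;
} toy_pool_t;

/* One snapshot/clone pair, as in "zfs clone snap1 clone1 snap2 clone2". */
typedef struct {
    const char *snapshot;
    const char *clone;
} clone_op_t;

static int exists(const toy_pool_t *pool, const char *name)
{
    for (int i = 0; i < pool->count; i++)
        if (strcmp(pool->names[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Apply all N pairs or none of them.  Phase 1 only reads the pool;
 * phase 2 cannot fail, so a rejected batch leaves the pool untouched.
 * Returns 0 if every clone was created, -1 if the batch was rejected.
 */
static int apply_batch(toy_pool_t *pool, const clone_op_t *ops, int n)
{
    assert(n >= 0);
    if (n > MAX_DATASETS - pool->count)
        return -1;                               /* no room for the batch */
    for (int i = 0; i < n; i++) {
        if (!exists(pool, ops[i].snapshot))      /* origin must exist */
            return -1;
        if (strlen(ops[i].clone) >= NAME_LEN || exists(pool, ops[i].clone))
            return -1;                           /* too long or taken */
        for (int j = 0; j < i; j++)              /* duplicate inside batch */
            if (strcmp(ops[i].clone, ops[j].clone) == 0)
                return -1;
    }
    for (int i = 0; i < n; i++)
        strcpy(pool->names[pool->count++], ops[i].clone);
    return 0;
}
```

Because each pair names its own origin, the same request shape covers both Alec's many-clones-of-one-snapshot case and Chris's arbitrary names.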
Alec Muffett
2009-Mar-29 22:41 UTC
On 29 Mar 2009, at 23:19, Jeff Bonwick wrote:
> I agree with Chris -- I'd much rather do something like:
>
> zfs clone snap1 clone1 snap2 clone2 snap3 clone3 ...
>
> than introduce a pattern grammar. Supporting multiple snap/clone pairs
> on the command line allows you to do just about anything atomically.

Can you elucidate how this will help me take a single snap and clone it 1000 times, quickly and with minimum fuss?

-a
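Under Jeff's syntax, one answer to this question is that a wrapper repeats the same snapshot for every generated name, so the pattern lives in userland while the kernel still sees one request. A C sketch of building that argument list, assuming the pairs form accepts the same snapshot more than once; `build_pair_args` and `free_pair_args` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Free a list built by build_pair_args(); only the odd slots
 * (the generated clone names) were allocated. */
static void free_pair_args(const char **args, int nargs)
{
    for (int i = 1; i < nargs; i += 2)
        free((void *)args[i]);
    free(args);
}

/*
 * Build "SNAP NAME(begin) SNAP NAME(begin+1) ... SNAP NAME(end)" for the
 * proposed pairs form of "zfs clone", repeating one snapshot for every
 * clone.  PATTERN is assumed to be validated (one integer conversion).
 * Returns the list and stores its length in *nargsp, or NULL on failure.
 */
static const char **build_pair_args(const char *snapshot, const char *pattern,
                                    int begin, int end, int *nargsp)
{
    assert(begin <= end && nargsp != NULL);
    int n = end - begin + 1;
    const char **args = calloc((size_t)n * 2, sizeof(*args));
    if (args == NULL)
        return NULL;

    for (int i = 0; i < n; i++) {
        int len = snprintf(NULL, 0, pattern, begin + i);
        char *name = (len < 0) ? NULL : malloc((size_t)len + 1);
        if (name == NULL) {
            free_pair_args(args, 2 * i);   /* release what was built so far */
            return NULL;
        }
        snprintf(name, (size_t)len + 1, pattern, begin + i);
        args[2 * i] = snapshot;            /* same origin every time */
        args[2 * i + 1] = name;
    }
    *nargsp = 2 * n;
    return args;
}
```

For 1000 clones the list holds 2000 arguments, so a wrapper that execs zfs(1) with it would also need to stay within the system's ARG_MAX limit.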
Darren J Moffat
2009-Mar-30 08:57 UTC
Carson Gaspar wrote:
> Darren J Moffat wrote:
> ...
>> Agreed, but other than pattern based I can't at the moment think of a
>> nice way to pass all the names over the /dev/zfs ioctl call while
>> maintaining the fact it is pretty much all fixed size.
>>
>> I'm not saying passing a list of names over the ioctl is impossible,
>> more it just doesn't feel right to me at the moment - but I'm happy to
>> be convinced otherwise. That way the patterning part can be left to
>> the shell.
>
> OK, while I have played a developer upon occasion, I've never touched
> kernel code. So feel free to tell me I'm on crack.
>
> What is so difficult about passing a pointer to memory as an argument in
> the ioctl? The kernel certainly has easy access to user-space pages. And
> parsing a list of text strings is neither complicated, nor dangerous.
> And as long as you never touch the memory after returning from ioctl(),
> no memory allocation ownership issues.
>
> In short, what am I missing here? This ioctl() limit seems much ado
> about nothing to me...

You aren't missing anything; it could certainly be done. I was just trying to see what was possible without too much change from how the ioctl calls on /dev/zfs work today. I was just being very conservative with respect to change.

If Jeff (as he indicated in another email) is happy with a non-pattern method and what that means for how this is passed over the ioctl, then so am I.

--
Darren J Moffat