Tobias Oberstein
2005-Nov-28 18:05 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
Is there, or will there be, a way for userland to do fast (O(1)) file concatenation in ZFS that works by manipulating FS metadata under the hood (concatenating files by rechaining pointers into blocks) instead of making deep data copies?

This message posted from opensolaris.org
Joerg Schilling
2005-Nov-28 18:39 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
Tobias Oberstein <tobias.oberstein at gmx.de> wrote:

> Is there / will there be a userland possibility of fast (O(1)) file concatenation that works by FS metadata manipulation under the hood (concatenating files by rechaining pointers into blocks) instead of deep data copies in ZFS?

How would you like to do this? There is no interface in the OS to do this...

Jörg

-- 
EMail: joerg at schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
       js at cs.tu-berlin.de (uni)
       schilling at fokus.fraunhofer.de (work)
Blog:  http://schily.blogspot.com/
URL:   http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
Tobias Oberstein
2005-Nov-28 19:34 UTC
[zfs-discuss] Re: Fast file concatenation by FS metadata bending?
I know .. there is none in POSIX, for example. Adding new entry points to the kernel is in _general_ bad, I agree, but does that mean the interface has to stay fixed until the end of planet Earth? ReiserFS4, for example, does it (define REISER4_FS_SYSCALL and you'll get reiser4()).
Al Hopper
2005-Nov-28 20:36 UTC
[zfs-discuss] Re: Fast file concatenation by FS metadata bending?
On Mon, 28 Nov 2005, Tobias Oberstein wrote:

> I know .. not e.g. in POSIX. Adding new entry points into the kernel is
> in _general_ bad - I agree -, but does that mean it has to be fixed until
> the end of planet earth? E.g. ReiserFS4 does it. (define
> REISER4_FS_SYSCALL and you'll get reiser4())

zfscat
zfs cat
??

Al Hopper
Logical Approach Inc, Plano, TX. al at logical-approach.com
Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT
OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
Tobias Oberstein
2005-Nov-28 20:46 UTC
[zfs-discuss] Re: Re: Fast file concatenation by FS metadata bending?
Uh .. sorry, what's that? I found nothing for either "zfs cat" or "zfscat" on Google, docs.sun.com, BigAdmin, or in the OpenSolaris sources. ???
Al Hopper
2005-Nov-28 20:58 UTC
[zfs-discuss] Re: Re: Fast file concatenation by FS metadata bending?
On Mon, 28 Nov 2005, Tobias Oberstein wrote:

> Uh .. sorry, what's that? Nothing found for either "zfs cat" or
> "zfscat" on Google, docs.sun.com, BigAdmin, or in the OpenSolaris
> sources. ???

That was a suggestion for a new interface. Sorry - I should not have been so terse.

Al Hopper
Frank Hofmann - Solaris Sustaining
2005-Nov-29 09:43 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
> Tobias Oberstein <tobias.oberstein at gmx.de> wrote:
>
> > Is there / will there be a userland possibility of fast (O(1)) file concatenation that works by FS metadata manipulation under the hood (concatenating files by rechaining pointers into blocks) instead of deep data copies in ZFS?
>
> How would you like to do this?
> There is no interface in the OS to do this...
>
> Jörg

I don't see why the egg-laying, milk-giving woolly UNIX syscall pig that is ioctl() couldn't be genetically engineered to concatenate files as well ...

Anyone willing to work on a ufs prototype? I actually believe this could be done in ~100 lines of code in ufs, provided you put some restrictions on the sizes of files that can be concatenated (if you don't, you'll have to copy a possibly huge number of partial block lists).

cc:'ing ufs-discuss for good measure. This isn't ZFS-specific.

Best regards,
FrankH.
Jeff Bonwick
2005-Nov-29 11:02 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
> Anyone willing to work on a ufs prototype? I actually believe this
> could be done in ~100 lines of code in ufs ... if you put in some
> restrictions on the sizes of files that can be concatenated (if
> you don't, you'll have to copy a possibly huge number of partial
> block lists).
>
> cc:'ing ufs-discuss for good measure. This isn't ZFS-specific.

Before we go too far down the road with this, what real-world problem would it solve?

There's no question that the idea is cool, and that for certain workloads (concatenating lots of files whose sizes are nicely aligned) it would rock. But does that particular use case really come up often enough to justify the added complexity?

I don't mean to imply that the answer is no; I'm just asking the question because this is not something I've had customers ask for, nor have I ever had the need myself.

Jeff
Casper.Dik at Sun.COM
2005-Nov-29 11:34 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
> I don't see why the egg-laying, milk-giving woolly UNIX syscall pig
> that is ioctl() couldn't be genetically engineered to concatenate
> files as well ...

No, not ioctl(), fcntl()! You guys just don't understand the clear distinction between the two interfaces :-)

> Anyone willing to work on a ufs prototype? I actually believe this
> could be done in ~100 lines of code in ufs ... if you put in some
> restrictions on the sizes of files that can be concatenated (if
> you don't, you'll have to copy a possibly huge number of partial
> block lists).

If the file size is a multiple of the filesystem blocksize, then it should be possible; if not, it's going to be tough. (But updating all the pointers might not be all that easy.)

Casper
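Casper's precondition can be checked from userland before attempting any splice. The following is only a minimal sketch: the function names are hypothetical, and it assumes that `st_blksize` from `fstat()` is a usable stand-in for the filesystem block size, which is not guaranteed (it is the preferred I/O size, and on ZFS the record size can vary per file).

```c
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 1 if size is a whole number of blocks (an empty file counts), else 0. */
int size_is_block_multiple(off_t size, long blksize)
{
    assert(size >= 0 && blksize > 0);   /* caller must pass sane values */
    return size % blksize == 0;
}

/*
 * Returns 1 if the file behind fd ends on a block boundary, 0 if it has a
 * partial tail block, and -1 if fstat() fails.
 * Assumption: st_blksize approximates the filesystem block size.
 */
int fd_ends_on_block_boundary(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0 || st.st_blksize <= 0)
        return -1;
    return size_is_block_multiple(st.st_size, (long)st.st_blksize);
}
```

Only the file being appended to needs to pass this check; the file being appended may end in a partial block, since that block simply becomes the new tail.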
Darren J Moffat
2005-Nov-29 12:06 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
On Tue, 2005-11-29 at 11:34, Casper.Dik at sun.com wrote:

> > I don't see why the egg-laying, milk-giving woolly UNIX syscall pig
> > that is ioctl() couldn't be genetically engineered to concatenate
> > files as well ...
>
> No, not ioctl(), fcntl()! You guys just don't understand the clear distinction
> between the two interfaces :-)

Hmm.

    fcntl(fd1, F_CAT, fd2);

Looks good to me, and just for good measure let's have:

    typedef struct fsplit_s {
        int   fs_nfd;
        off_t fs_off;
    } fsplit_t;

    fcntl(fd1, F_SPLIT, &split_info);

:-) Okay, I'm being silly, but Casper's point is well taken: fcntl(2) seems to be the perfect place to extend for this type of work.

-- 
Darren J Moffat
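F_CAT is a proposal in this thread, not an existing fcntl() command, so the only portable way to get its effect today is a data copy. As a rough sketch of the semantics such a call might have, the hypothetical helper below appends the contents of one descriptor to another by copying. The name `cat_fd_by_copy`, the choice to leave the source open with its offset at end-of-file, and the demo function are all assumptions made for illustration. This is the O(n) behaviour the proposal wants to avoid, not the O(1) metadata splice.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy-based stand-in for the proposed fcntl(dst, F_CAT, src).
 * Assumed semantics: src stays open and its offset ends up at end-of-file.
 * Returns the number of bytes appended, or -1 on error.
 */
ssize_t cat_fd_by_copy(int dst, int src)
{
    char buf[65536];
    ssize_t total = 0;

    assert(dst != src);     /* appending a descriptor to itself never ends */
    if (lseek(dst, 0, SEEK_END) == (off_t)-1 ||
        lseek(src, 0, SEEK_SET) == (off_t)-1)
        return -1;

    for (;;) {
        ssize_t n = read(src, buf, sizeof buf);
        if (n == 0)
            break;                          /* reached end of src */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {  /* handle short writes */
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}

/* Demo on two temporary files; returns 0 if dst ends up holding "abcdef". */
int cat_demo(void)
{
    char pa[] = "/tmp/fcat_a_XXXXXX", pb[] = "/tmp/fcat_b_XXXXXX";
    char out[7] = {0};
    int a = mkstemp(pa), b = mkstemp(pb);
    int ok = a >= 0 && b >= 0 &&
             write(a, "abc", 3) == 3 && write(b, "def", 3) == 3 &&
             cat_fd_by_copy(a, b) == 3 &&
             lseek(b, 0, SEEK_CUR) == 3 &&  /* src offset left at EOF */
             lseek(a, 0, SEEK_SET) == 0 &&
             read(a, out, 6) == 6 && strcmp(out, "abcdef") == 0;

    if (a >= 0) { close(a); unlink(pa); }
    if (b >= 0) { close(b); unlink(pb); }
    return ok ? 0 : -1;
}
```

A real F_CAT would also have to decide what happens to the source file's directory entry, which a copy like this never touches.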
Frank Hofmann - Solaris Sustaining
2005-Nov-29 12:41 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
> On Tue, 2005-11-29 at 11:34, Casper.Dik at sun.com wrote:
> > > I don't see why the egg-laying, milk-giving woolly UNIX syscall pig
> > > that is ioctl() couldn't be genetically engineered to concatenate
> > > files as well ...
> >
> > No, not ioctl(), fcntl()! You guys just don't understand the clear distinction
> > between the two interfaces :-)

Yes yes yes, you're right of course. I got carried away because ufs does too many things in ioctl() ... :)

There's actually one point in favour of ioctl() over fcntl(): with ioctl(), you don't need to have the file open. You could open the fs rootdir and ioctl() on that. Completely under the hood.

> Hmm.
>
>     fcntl(fd1, F_CAT, fd2);

Problem: does this close fd2? If not, what is fd2 after the op?

> Looks good to me and just for good measure let's have:
>
>     typedef struct fsplit_s {
>         int   fs_nfd;
>         off_t fs_off;
>     } fsplit_t;
>
>     fcntl(fd1, F_SPLIT, &split_info);

How to name the "split files"? How to handle "file exists"?

> :-) Okay I'm being silly but Casper's point is well taken fcntl(2)
> seems to be the perfect place to extend for this type of work.

Well, the devil in these things isn't the interface, it's the implementation. Performance will vary between "faster than light" and "could've used good old cat", depending on file sizes.

Thinking a bit about the market for such things: archiving utilities. Both tar and cpio (and possibly others) just "concatenate" the files to be archived into the archive. GNU tar has an option to move files into archives (i.e. delete them as they are archived). If you add on-the-fly filesystem compression (eliminating the need for '... | bzip2 >...bz2'), then fast file concatenation/split would make an interesting archiving mechanism. The problem, of course, is that none of these utilities take (or can take) the blocksize into account, and the archive formats are to a degree set in stone.

So yes, I can see how archiving could benefit from concat/split/compression support at the filesystem level, but I think no existing archiving utility could really use it, due to the blocksize and alignment constraints. Still, having the capability could make a good case for exploring this possibility - "hyperfast archiving". Archive the whole contents of a multi-petabyte-sized filesystem snapshot in-place, one night a week...

I'm thinking blue-sky now. But I do see why it could be interesting.

FrankH.
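To make the alignment constraint concrete: in the plain ustar layout, each archive member is a 512-byte header followed by the file data padded to a multiple of 512 bytes. The sketch below computes where a member's data would begin in such a stream. The helper names are hypothetical, and the 4096-byte filesystem block used in the usage note is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TAR_BLOCK 512u  /* ustar record size */

/* Round n up to the next multiple of align. */
uint64_t round_up(uint64_t n, uint64_t align)
{
    assert(align > 0);
    return (n + align - 1) / align * align;
}

/* Bytes one member occupies in the stream: header plus padded data. */
uint64_t tar_member_span(uint64_t file_size)
{
    return TAR_BLOCK + round_up(file_size, TAR_BLOCK);
}

/*
 * Stream offset at which the data of the next member begins, given the
 * sizes of the `count` members that precede it.
 */
uint64_t tar_data_offset(const uint64_t *prev_sizes, int count)
{
    uint64_t off = 0;

    for (int i = 0; i < count; i++)
        off += tar_member_span(prev_sizes[i]);
    return off + TAR_BLOCK;     /* skip the next member's own header */
}
```

With one 1000-byte file ahead of it, the next member's data starts at offset 2048, and even the very first member's data starts at offset 512. Neither is a multiple of an assumed 4096-byte filesystem block, so the file's existing blocks could not be chained into the archive as they are.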
Darren J Moffat
2005-Nov-29 13:02 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
On Tue, 2005-11-29 at 12:41, Frank Hofmann - Solaris Sustaining wrote:

> > On Tue, 2005-11-29 at 11:34, Casper.Dik at sun.com wrote:
> > > > I don't see why the egg-laying, milk-giving woolly UNIX syscall pig
> > > > that is ioctl() couldn't be genetically engineered to concatenate
> > > > files as well ...
> > >
> > > No, not ioctl(), fcntl()! You guys just don't understand the clear distinction
> > > between the two interfaces :-)
>
> Yes yes yes, you're right of course. Me getting carried away
> because ufs does too many things in ioctl() ... :)
>
> There's actually one point for using ioctl() vs. fcntl():
> With ioctl(), you don't need to have the file open. You could
> open the fs rootdir, and ioctl(). Completely under the hood.
>
> > Hmm.
> >
> >     fcntl(fd1, F_CAT, fd2);
>
> Problem: Does this close fd2? If not, what's fd2 after the op?

I assumed it didn't close fd2 but that the seek pointer was at the end. I also assumed that an fd2 = open(path) had been done already.

> > Looks good to me and just for good measure let's have:
> >
> >     typedef struct fsplit_s {
> >         int   fs_nfd;
> >         off_t fs_off;
> >     } fsplit_t;
> >
> >     fcntl(fd1, F_SPLIT, &split_info);
>
> How to name the "split files"? How to handle "file exists"?

As above, the open of fs_nfd has to have been done already, so "file exists" is dealt with in the open(2) call.

> Still, having the capability could make a good case for exploring this
> possibility - "hyperfast archiving". Archive the whole contents of a
> multi-petabyte-sized filesystem snapshot in-place, one night a week...

But with snapshots and zfs backup, do we really need that if it isn't going to get into a standard (note the small "s") archive format? Maybe we do, I don't know.

-- 
Darren J Moffat
Bill Sommerfeld
2005-Nov-29 13:11 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
(deleted ufs-discuss CC: since I'm not subscribed and this message would just bounce from there.)

On Tue, 2005-11-29 at 12:41 +0000, Frank Hofmann - Solaris Sustaining wrote:

> > > No, not ioctl(), fcntl()! You guys just don't understand the clear distinction
> > > between the two interfaces :-)
>
> Yes yes yes, you're right of course. Me getting carried away
> because ufs does too many things in ioctl() ... :)

To toss another option on the table, we already have sendfile()/sendfilev() -- this already more-or-less has the high-level semantics you want of "move data from here to there efficiently".

But I admittedly haven't looked under the covers to see how the implementation could be adapted to the full O(N^2) combinatoric horror of different underlying src and dst descriptor types; there's also the matter of needing to give archivers, etc., some way to figure out the optimally efficient alignments.

- Bill
Frank Hofmann - Solaris Sustaining
2005-Nov-29 13:42 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
> From: Bill Sommerfeld <sommerfeld at sun.com>
[ ... ]
> To toss another option on the table, we already have
> sendfile()/sendfilev() -- this already more-or-less has the high-level
> semantics you want of "move data from here to there efficiently".

Well, that's "copy data". The point here was in-place concatenation (with the deletion/removal of the file-to-be-tacked-on implied). sendfile() doesn't do that. Concatenation only manipulates metadata, and should be able to do that "instantly" where possible. Just move - with no (intermediate) data duplication along the way.

Although one could now start speculating about using copy-on-write techniques to implement "instant sendfile" within a filesystem via concatenation (i.e. make sendfile == "snapshot file" + "concat file").

Ok, I'll stop here. Too much pure speculation.

FrankH.
Joerg Schilling
2005-Nov-29 16:22 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
Frank Hofmann - Solaris Sustaining <Frank.Hofmann at Sun.COM> wrote:

> I don't see why the egg-laying, milk-giving woolly UNIX syscall pig
> that is ioctl() couldn't be genetically engineered to concatenate
> files as well ...
>
> Anyone willing to work on a ufs prototype? I actually believe this
> could be done in ~100 lines of code in ufs ... if you put in some
> restrictions on the sizes of files that can be concatenated (if
> you don't, you'll have to copy a possibly huge number of partial
> block lists).

How do you intend to handle files that are not at the end of a list and that do not end on a block boundary?

Jörg
Boyd Adamson
2005-Nov-29 21:48 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
On 29/11/2005, at 6:34 AM, Tobias Oberstein wrote:

> I know .. not e.g. in POSIX. Adding new entry points into the
> kernel is in _general_ bad - I agree -, but does that mean it has to
> be fixed until the end of planet earth?

On 29/11/2005, at 10:34 PM, Casper.Dik at Sun.COM wrote:

>> I don't see why the egg-laying, milk-giving woolly UNIX syscall pig
>> that is ioctl() couldn't be genetically engineered to concatenate
>> files as well ...
>
> No, not ioctl(), fcntl()! You guys just don't understand the clear distinction
> between the two interfaces :-)

Hey, settle down, guys. Pretty soon you'll be discussing heretical ideas like allowing insertion of data into the middle of a file!

I must say that in general, I'm sympathetic to Tobias's sentiment that it wouldn't be a bad thing to consider some additional/alternative file access semantics sometime between now and the heat death of the universe.

Boyd
Tobias Oberstein
2005-Nov-29 22:26 UTC
[zfs-discuss] Re: Fast file concatenation by FS metadata bending?
The need / idea actually arose in a real-world situation: while working in the data warehouse unit of a large German telco, I was given the task of reprocessing our incoming flat-file feed 200 days back in time .. 80,000 files .. each a few MB compressed .. filtering out call detail records with certain numbers from a list of 100,000. We were under time pressure. It had to be finished in 2 days at most.

I ended up writing a small single-threaded C++ program with an in-memory hash of the 100,000 numbers. The customer is running a big Sun machine on TBs of HDS storage. It turned out that the naive approach of opening and reading the files sequentially was too slow. Buffered reads .. too many calls into the kernel .. too many I/O requests.

The fast way was this: concatenate the uncompressed files into 2 GB pieces (at that time I didn't understand 64-bit compilation and file handling with gcc and had no time to learn), then read each whole 2 GB file into memory with one read request, and then do the processing fully in-memory.

As it turned out, the concatenation stage took nearly as long as the actual processing. The latter was still I/O bound .. we could process at 60-90 MB per second .. I speculate that this is the sustained read bandwidth for a single process on our storage.
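The "one read request" stage described above might look roughly like the sketch below. It is written in C rather than the C++ of the original program, the names `slurp_file` and `slurp_demo` are hypothetical, and the demo data is made up. Note that a single read() call is allowed to return fewer bytes than requested, so the sketch loops until the whole file has arrived.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read an entire file into one malloc'd buffer, asking the kernel for the
 * whole remainder on every read(). Returns the buffer (caller frees) and
 * stores its length in *len_out, or returns NULL on error.
 */
char *slurp_file(const char *path, size_t *len_out)
{
    struct stat st;
    char *buf = NULL;
    size_t got = 0;
    int fd;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size < 0)
        goto fail;
    buf = malloc((size_t)st.st_size + 1);   /* +1 so an empty file still gets a buffer */
    if (buf == NULL)
        goto fail;
    while (got < (size_t)st.st_size) {
        ssize_t n = read(fd, buf + got, (size_t)st.st_size - got);
        if (n == 0)
            break;                          /* file shrank while reading */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        got += (size_t)n;
    }
    close(fd);
    *len_out = got;
    return buf;

fail:
    free(buf);
    close(fd);
    return NULL;
}

/* Demo: write a small temporary file, slurp it back; returns 0 on a match. */
int slurp_demo(void)
{
    char path[] = "/tmp/slurp_XXXXXX";
    const char msg[] = "cdr-1\ncdr-2\n";    /* made-up sample records */
    size_t len = 0;
    char *data;
    int ok, fd = mkstemp(path);

    if (fd < 0)
        return -1;
    ok = write(fd, msg, sizeof msg - 1) == (ssize_t)(sizeof msg - 1);
    close(fd);
    data = ok ? slurp_file(path, &len) : NULL;
    ok = data != NULL && len == sizeof msg - 1 && memcmp(data, msg, len) == 0;
    free(data);
    unlink(path);
    return ok ? 0 : -1;
}
```

With a metadata-level concatenation primitive, the copy that built the 2 GB pieces would disappear and only this read stage would remain.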
Nathan Kroenert
2005-Nov-30 00:49 UTC
[zfs-discuss] Fast file concatenation by FS metadata bending?
Is that the heat death generated by the storage array required to hold the maximum possible size zfs filesystem? :)

On Wed, 2005-11-30 at 08:48, Boyd Adamson wrote:

> I must say that in general, I'm sympathetic to Tobias's sentiment that
> it wouldn't be a bad thing to consider some additional/alternative
> file access semantics sometime between now and the heat death of the
> universe.
>
> Boyd

-- 
Nathan Kroenert            nathan.kroenert at sun.com
PTS Engineer               Phone: +61 2 9844-5235
Sun Services               Direct Ext: x57235
Level 2, 828 Pacific Hwy   Fax: +61 2 9844-5311
Gordon 2072 New South Wales Australia