Hi Everyone,

Is it possible to use send/recv to change the recordsize, or does each file need to be individually recreated/copied within a given dataset?

Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?

Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?

Thanks,
Tristan
On Mon, 18 Jan 2010, Tristan Ball wrote:

> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?

This would be problematic since a file may consist of different-size records (at least I think so). If the recordsize was changed after the file was already created, then new/updated parts would use the new recordsize.

> Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?

This is exactly right. There is a very large performance hit if the block to be updated is no longer in the ARC and the update does not perfectly align to the origin and size of the underlying block. Applications which are aware of this (and which expect the total working set to be much larger than available cache) could choose to read and write more data than absolutely required so that zfs does not need to read an existing block in order to update it.

This also explains why the L2ARC can be so valuable, if the data then fits in the ARC.

Bob

--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
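A quick way to see the effect Bob describes, as a sketch - the dataset name is hypothetical and is assumed to have recordsize=8K, and the extra read only happens once the record has been evicted from the ARC (e.g. after an export/import):

   # create a file of full 8K records on a recordsize=8k dataset
   dd if=/dev/urandom of=/tank/fs/testfile bs=8k count=1024

   # full-record, aligned overwrite: the old block need not be read
   dd if=/dev/zero of=/tank/fs/testfile bs=8k count=1 conv=notrunc

   # partial-record overwrite (4K into an 8K record): zfs must first
   # read the old record (if not cached) before writing the modified copy
   dd if=/dev/zero of=/tank/fs/testfile bs=4k count=1 seek=1 conv=notrunc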
On Jan 17, 2010, at 11:59 AM, Tristan Ball wrote:

> Hi Everyone,
>
> Is it possible to use send/recv to change the recordsize, or does each file need to be individually recreated/copied within a given dataset?

Yes. The former does the latter.

> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?

I don't know of an easy way to do this. But it is also rarely needed. For most file system use it is best to let the recordsize scale to large values. It is only for fixed-record-length workloads (e.g. databases) that recordsize matching can significantly improve efficiency.

> Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?

No. Think of recordsize as a limit. As long as the recordsize >= 4 KB, a 4KB file will only use one 4KB record.
 -- richard
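For the send/recv route Richard confirms, a minimal sketch (pool and dataset names are hypothetical; the new recordsize is set on a parent dataset so the received dataset inherits it, since zfs recv creates the destination itself):

   # snapshot the source, then receive it under a parent with the new recordsize
   zfs snapshot tank/data@migrate
   zfs create -o recordsize=8k tank/migrated
   zfs send tank/data@migrate | zfs recv tank/migrated/data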
Richard Elling wrote:

> Tristan Ball wrote:
>> Also - am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC) before the new block is written elsewhere (the "copy", from copy on write)? This would be one of the reasons that aligning application IO sizes and filesystem record sizes is a good thing: where such IO is aligned, you remove the need for that original read?
>
> No. Think of recordsize as a limit. As long as the recordsize >= 4 KB, a 4KB file will only use one 4KB record.
> -- richard

I didn't read Tristan's question as referring to a 4KB file. If a file with an 8KB recordsize already has one or more 8KB records, then a single 4KB non-synchronous write to a record not already in the ARC will require a read as part of the copy-on-write operation.

However, I'm assuming that multiple synchronous sequential writes to, say, an Oracle redo log, which are first committed to the ZIL, will generally coalesce before the file's records are COWed, thus avoiding reads for all but the last record (assuming it's not aligned or cached). But I'm always open to having my assumptions verified :)

Phil
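One way to test an assumption like this empirically (pool name and path are hypothetical): watch the pool's read column in zpool iostat while running a write-only workload; reads showing up during pure overwrites indicate the read-modify-write behaviour Phil describes.

   # terminal 1: per-second I/O statistics for the pool
   zpool iostat tank 1

   # terminal 2: overwrite an existing file with 4K writes into 8K records;
   # if the records are no longer cached, reads appear alongside the writes
   dd if=/dev/zero of=/tank/fs/existingfile bs=4k count=10000 conv=notrunc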
On Mon, Jan 18, 2010 at 10:22 AM, Richard Elling <richard.elling at gmail.com> wrote:

> On Jan 17, 2010, at 11:59 AM, Tristan Ball wrote:
>> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?
>
> I don't know of an easy way to do this.

Can't you use zdb? Something like

   zdb -dddd pool_name/fs_name

then read the output of dblk. You'd have to manually look for that file in the output though.

--
Fajar
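Building on Fajar's suggestion, you can avoid scanning the whole dump by pointing zdb at a single object; a file's object number is the same as its inode number (the file name and dataset are hypothetical):

   # find the file's object (inode) number
   ls -i /tank/fs/somefile

   # dump just that object; the dblk column shows its data block size
   zdb -dddd tank/fs <object_number>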
On 17/01/2010 20:34, Bob Friesenhahn wrote:

> On Mon, 18 Jan 2010, Tristan Ball wrote:
>
>> Is there a way to check the recordsize of a given file, assuming that the filesystem's recordsize was changed at some point?
>
> This would be problematic since a file may consist of different-size records (at least I think so). If the recordsize was changed after the file was already created, then new/updated parts would use the new recordsize.

A single file can only have one recordsize (except for a tail block, which might be shorter). So if you created a large file with the default recordsize of 128K and later changed the filesystem's recordsize to, let's say, 8K, it would affect only newly created files - the existing file would stay at 128K records. However, if you copied the file, the copy would use the new recordsize of 8K.

--
Robert Milkowski
http://milek.blogspot.com
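So a per-file migration boils down to rewriting each file after changing the property; a minimal sketch (paths are hypothetical; note the copy gets a new inode and breaks any space sharing with existing snapshots):

   # existing files keep their old recordsize; rewrite them to pick up the new one
   zfs set recordsize=8k tank/fs
   cp /tank/fs/bigfile /tank/fs/bigfile.new
   mv /tank/fs/bigfile.new /tank/fs/bigfile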