I''ve been reading btrfs''s on-disk format, and two things caught my eye - attribute((packed)) structures everywhere, often with misaligned fields. This conserves space, but can be harmful to in-memory performance on some archs. - le64''s everywhere. This scales nicely, but wastes space. My home directory is unlikely to have more than 4G objects or 4GB extents (let alone >2 devices). I think the two issues can be improved by separating the on-disk format and the in-memory structure, and by using uleb128 as the on-disk format for numbers. uleb128 is a variable-length format that encodes 7 bits of a number in each byte, using the eighth bit as a stop bit. So, for example struct btrfs_disk_key { __le64 objectid; u8 type; __le64 offset; } __attribute__ ((__packed__)); With 1M objectids, and 1T offsets, this reduces in size from 17 bytes to 10 bytes. Most other structures show similar gains. We can also have more than 256 types if the need arises. There are, off course, disadvantages to switching to uleb128: - need to write encode and decode functions, which is tedious. This can be automated a la xdr. - increased cpu utilization for decoding and encoding - can no longer know the size of the in-memory structures in advance - it''s just wonderful to rewrite the entire disk format so close to freezing it The advantages, IMO, outweigh the disadvantages: - better packing reduces tree depth and therefore seekage, the most important cost on rotating media - the disk format is infinitely growable - in-memory format is more efficient for archs which prefer aligned accesses I''m not volunteering to do this, but please consider this proposal. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote:> I''ve been reading btrfs''s on-disk format, and two things caught my eye > > - attribute((packed)) structures everywhere, often with misaligned > fields. This conserves space, but can be harmful to in-memory > performance on some archs.packed is important to make sure that a given field takes exactly the same amount of space everywhere, regardless of compiler optimization or arch.> - le64''s everywhere. This scales nicely, but wastes space. My home > directory is unlikely to have more than 4G objects or 4GB extents (let > alone >2 devices). > > I think the two issues can be improved by separating the on-disk format > and the in-memory structure, and by using uleb128 as the on-disk format > for numbers. uleb128 is a variable-length format that encodes 7 bits of > a number in each byte, using the eighth bit as a stop bit. >This couldn''t be used everywhere, as the array of items headers and keys need to be a fixed sized the current bin_search code. The items can be variable sized but in general they don''t have as many le64s. The place we''re really suffering from poor packing is in the extent allocation tree. But, most of the space being wasted there is in the item headers and keys ;) I''d prefer to address that by consolidating the items stored, especially for the btree block extents. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Mason wrote:> On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote: > >> I''ve been reading btrfs''s on-disk format, and two things caught my eye >> >> - attribute((packed)) structures everywhere, often with misaligned >> fields. This conserves space, but can be harmful to in-memory >> performance on some archs. >> > > packed is important to make sure that a given field takes exactly the > same amount of space everywhere, regardless of compiler optimization or > arch. >Yes, of course.>> - le64''s everywhere. This scales nicely, but wastes space. My home >> directory is unlikely to have more than 4G objects or 4GB extents (let >> alone >2 devices). >> >> I think the two issues can be improved by separating the on-disk format >> and the in-memory structure, and by using uleb128 as the on-disk format >> for numbers. uleb128 is a variable-length format that encodes 7 bits of >> a number in each byte, using the eighth bit as a stop bit. >> >> > > This couldn''t be used everywhere, as the array of items headers and keys > need to be a fixed sized the current bin_search code. The items can be > variable sized but in general they don''t have as many le64s. > >You''d decode the keys and headers before searching. This of couse negates the idea behind a binary search, unless you cache the decoded nodes. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Avi Kivity wrote:> I''ve been reading btrfs''s on-disk format, and two things caught my eye > > - attribute((packed)) structures everywhere, often with misaligned > fields. This conserves space, but can be harmful to in-memory > performance on some archs.How harmful? Do you have any profiles that can even pick this out of the noise? - z -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 03 October 2008 05:22, Chris Mason wrote:> On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote: > > I''ve been reading btrfs''s on-disk format, and two things caught my eye > > > > - attribute((packed)) structures everywhere, often with misaligned > > fields. This conserves space, but can be harmful to in-memory > > performance on some archs. > > packed is important to make sure that a given field takes exactly the > same amount of space everywhere, regardless of compiler optimization or > arch.Amen. Are we sure that attribute ((packed)) works the same on all arches? And what about bitfields, does GCC pack them to the same size on all arches in this case or take the easy way out as permitted by the standard? Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote:> On Friday 03 October 2008 05:22, Chris Mason wrote: > > On Fri, 2008-10-03 at 14:42 +0300, Avi Kivity wrote: > > > I''ve been reading btrfs''s on-disk format, and two things caught my eye > > > > > > - attribute((packed)) structures everywhere, often with misaligned > > > fields. This conserves space, but can be harmful to in-memory > > > performance on some archs. > > > > packed is important to make sure that a given field takes exactly the > > same amount of space everywhere, regardless of compiler optimization or > > arch. > > Amen. Are we sure that attribute ((packed)) works the same on all > arches?As long as you use types with strictly defined size, yes.> > And what about bitfields, does GCC pack them to the same size on all > arches in this case or take the easy way out as permitted by the > standard?I''m not sure, I believe a bit field on a strictly defined size will be packed the same everywhere. But bitfields + endian conversion are messy, so I avoid them. -chris -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 03 October 2008 17:28, Chris Mason wrote:> On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote: > > And what about bitfields, does GCC pack them to the same size on all > > arches in this case or take the easy way out as permitted by the > > standard? > > I''m not sure, I believe a bit field on a strictly defined size will be > packed the same everywhere. But bitfields + endian conversion are messy, > so I avoid them.Likewise, sigh. On the other hand, packed + endian can''t really be avoided. Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday 03 October 2008 17:28, Chris Mason wrote:> On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote: > > ...Are we sure that attribute ((packed)) works the same on all > > arches? > > As long as you use types with strictly defined size, yes.Just to be a language lawyer about that... int does not have a strictly defined size, yet we define uint32_t as a typedef of int on 32 bit arches do we not? Regards, Daniel -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Daniel Phillips wrote:> On Friday 03 October 2008 17:28, Chris Mason wrote: > >> On Fri, Oct 03, 2008 at 05:22:51PM -0700, Daniel Phillips wrote: >> >>> ...Are we sure that attribute ((packed)) works the same on all >>> arches? >>> >> As long as you use types with strictly defined size, yes. >> > > Just to be a language lawyer about that... int does not have a > strictly defined size, yet we define uint32_t as a typedef of int > on 32 bit arches do we not? >What else would you define it to? -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Zach Brown wrote:> Avi Kivity wrote: > >> I''ve been reading btrfs''s on-disk format, and two things caught my eye >> >> - attribute((packed)) structures everywhere, often with misaligned >> fields. This conserves space, but can be harmful to in-memory >> performance on some archs. >> > > How harmful? Do you have any profiles that can even pick this out of > the noise? >On those archs that take faults on unaligned accesses it''s unlikely to be in the noise. But we could (and should) stick a get_unaligned() in the accessor functions. uleb128 encoding is orthogonal to this issue, however. It''s only about getting better density. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Avi Kivity <avi@redhat.com> writes:> Zach Brown wrote: >> Avi Kivity wrote: >> >>> I''ve been reading btrfs''s on-disk format, and two things caught my eye >>> >>> - attribute((packed)) structures everywhere, often with misaligned >>> fields. This conserves space, but can be harmful to in-memory >>> performance on some archs. >>> >> >> How harmful? Do you have any profiles that can even pick this out of >> the noise? >> > > On those archs that take faults on unaligned accesses it''s unlikely to > be in the noise. But we could (and should) stick a get_unaligned() in > the accessor functions.Normally the compiler on such architectures generates special load/store code for known unaligned types that does not actually fault, but just uses multiple instructions. That is slower than a normal memory access, but still much faster than a exception. You only really get the full exception fault penalty when the compiler cannot figure out at compile time that a given variable is unaligned. But with packed it normally assumes that (I think). -Andi -- ak@linux.intel.com -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Andi Kleen wrote:>> On those archs that take faults on unaligned accesses it''s unlikely to >> be in the noise. But we could (and should) stick a get_unaligned() in >> the accessor functions. >> > > Normally the compiler on such architectures generates special load/store code > for known unaligned types that does not actually fault, but just uses multiple > instructions. That is slower than a normal memory access, but still much > faster than a exception. > > You only really get the full exception fault penalty when the compiler > cannot figure out at compile time that a given variable is unaligned. > But with packed it normally assumes that (I think)Sounds reasonable. In which case the unaligned access issue I raised is a red herring. So using uleb128 or not is down to whether the improved packing efficiency is worth the increased complexity; it seems unlikely that it is. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html