adilger@clusterfs.com
2007-Jan-26 01:14 UTC
[Lustre-devel] [Bug 10651] Implement nanosecond timestamps (for version-based recovery)
Please don''t reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10651 What |Removed |Added ---------------------------------------------------------------------------- Attachment #9401|review?(adilger@clusterfs.co|review- Flag|m) | (From update of attachment 9401)>Index: linux-2.6.12/fs/ext3/ialloc.c >==================================================================>--- linux-2.6.12.orig/fs/ext3/ialloc.c >+++ linux-2.6.12/fs/ext3/ialloc.c >@@ -43,7 +43,6 @@ > * the free blocks count in the block. > */ > >- > /* > * Read the inode allocation bitmap for a given block_group, reading > * into the specified slot in the superblock''s bitmap cache.Please don''t change whitespace in the bug unless needed.>@@ -419,6 +418,8 @@ static int find_group_other(struct super > * For other inodes, search forward from the parent directory''s block > * group to find a free inode. > */ >+# define MAX(a, b) (((a) > (b))? (a) : (b))The kernel already has a "max()" macro.>@@ -591,9 +593,17 @@ got: >+ if (EXT3_HAS_RO_COMPAT_FEATURE(sb, EXT3_FEATURE_RO_COMPAT_EXTRA_ISIZE)) { >+ ei->i_extra_isize = MAX(sb->s_min_extra_isize, >+ MAX(sb->s_want_extra_isize, >+ sizeof(struct ext3_inode) - >+ EXT3_GOOD_OLD_INODE_SIZE)); >+ } >+ else >+ ei->i_extra_isize >+ (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) ? >+ sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE : 0;Hmm, it seems concievable (though it would be incorrect) to have RO_COMPAT_EXTRA_ISIZE set on a filesystem with inodes of EXT3_GOOD_OLD_INODE_SIZE, and in this case the above would corrupt the filesystem. How about in ext3_fill_super(): /* determine the minimum size of new large inodes, if present */ if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) { EXT3_SB(sb)->s_want_extra_isize = sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE; if (EXT3_HAS_RO_COMPAT_FEATURE(sb, EXT3_FEATURE_RO_COMPAT_EXTRA_ISIZE)) if (EXT3_SB(sb)->s_want_extra_isize < le32_to_cpu(EXT3_SB(sb)->s_es->s_want_extra_isize)) le32_to_cpu(EXT3_SB(sb)->s_es->s_want_extra_isize); if (EXT3_SB(sb)->s_want_extra_isize < le32_to_cpu(EXT3_SB(sb)->s_es->s_min_extra_isize)) le32_to_cpu(EXT3_SB(sb)->s_es->s_min_extra_isize); } and here in ext3_new_inode() we only need: ei->i_extra_isize = EXT3_SB(sb)->s_want_extra_isize; We _might_ want to change this so that it will take s_want_extra_isize from the on-disk superblock each time, so that if tune2fs is used while the filesystem is in use then it will pick up this new value immediately. We might also want to verify that s_{want,min}_extra_isize is small enough to fit in EXT3_INODE_SIZE() or it may cause data corruption.>@@ -2450,10 +2450,11 @@ void ext3_read_inode(struct inode * inod >+ EXT3_INODE_GET_XTIME(i_ctime, i_ctime_extra, inode, raw_inode); >+ EXT3_INODE_GET_XTIME(i_mtime, i_mtime_extra, inode, raw_inode); >+ EXT3_INODE_GET_XTIME(i_atime, i_atime_extra, inode, raw_inode); >+ EXT3_INODE_GET_XTIME(i_crtime, i_crtime_extra, inode, raw_inode);Tabs instead of spaces in kernel patches, but more on this line below...>@@ -246,7 +246,7 @@ struct ext3_inode { > __le16 i_uid; /* Low 16 bits of Owner Uid */ > __le32 i_size; /* Size in bytes */ > __le32 i_atime; /* Access time */ >- __le32 i_ctime; /* Creation time */ >+ __le32 i_ctime; /* Change time */Actually, looking in the current e2fsprogs-1.39-cfs2, the comment reads "Inode Change time".>@@ -295,10 +295,52 @@ struct ext3_inode { >+ __le32 i_crtime; /* Creation time */ >+ __le32 i_crtime_extra; /* extra Creation time (nsec << 2 | epoch) */The comments here read "File Creation time" and "extra File Creation time".>+#define EXT3_FEATURE_RO_COMPAT_NS_TIMESTAMP 0x0008 >+#define EXT3_FEATURE_RO_COMPAT_EXTRA_ISIZE 0x0010The RO_COMPAT_NS_TIMESTAMP feature was replaced with RO_COMPAT_EXTRA_ISIZE, and the latter has the value 0x0040.>--- linux-2.6.12.orig/include/linux/fs.h >+++ linux-2.6.12/include/linux/fs.h >@@ -440,6 +440,7 @@ struct inode { >+ struct timespec i_crtime;This definitely shouldn''t be part of the generic VFS inode. Instead, it should be initialized in ext3_new_inode() and saved in the ext3_inode_info like i_dtime is, read from disk in ext3_read_inode(), and then written to disk in ext3_do_update_inode().>@@ -804,6 +805,8 @@ struct super_block { > /* Granuality of c/m/atime in ns. > Cannot be worse than a second */ > u32 s_time_gran; >+ u16 s_min_extra_isize; /* All inodes have at least # bytes */ >+ u16 s_want_extra_isize; /* New inodes should reserve # bytes */ > };Similarly, these fields are ext3-specific and should not go into the generic superblock. Instead they are in the ext3_super_block (which is read from disk) as can be seen in e2fsprogs-1.39-cfs2/lib/ext2fs/ext2_fs.h. That also means they need to be byte-swapped on access, but it likely makes sense to store an s_want_extra_isize in ext3_sb_info (the in-memory filesystem data) so we don''t have to do the max() and byte-swap operations for each new inode.
kalpak@clusterfs.com
2007-Jan-29 05:51 UTC
[Lustre-devel] [Bug 10651] Implement nanosecond timestamps (for version-based recovery)
Please don''t reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10651 (In reply to comment #18)> >@@ -591,9 +593,17 @@ got: > >+ if (EXT3_HAS_RO_COMPAT_FEATURE(sb, EXT3_FEATURE_RO_COMPAT_EXTRA_ISIZE)) { > >+ ei->i_extra_isize = MAX(sb->s_min_extra_isize, > >+ MAX(sb->s_want_extra_isize, > >+ sizeof(struct ext3_inode) - > >+ EXT3_GOOD_OLD_INODE_SIZE)); > >+ } > >+ else > >+ ei->i_extra_isize > >+ (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) ? > >+ sizeof(struct ext3_inode) - EXT3_GOOD_OLD_INODE_SIZE : 0; > > Hmm, it seems concievable (though it would be incorrect) to have > RO_COMPAT_EXTRA_ISIZE set on a filesystem with inodes of > EXT3_GOOD_OLD_INODE_SIZE, and in this case the above would corrupt the > filesystem. How about in ext3_fill_super(): > > /* determine the minimum size of new large inodes, if present */ > if (EXT3_INODE_SIZE(inode->i_sb) > EXT3_GOOD_OLD_INODE_SIZE) { > EXT3_SB(sb)->s_want_extra_isize = sizeof(struct ext3_inode) - > EXT3_GOOD_OLD_INODE_SIZE; > if (EXT3_HAS_RO_COMPAT_FEATURE(sb, > EXT3_FEATURE_RO_COMPAT_EXTRA_ISIZE)) > if (EXT3_SB(sb)->s_want_extra_isize < > > le32_to_cpu(EXT3_SB(sb)->s_es->s_want_extra_isize)) > > le32_to_cpu(EXT3_SB(sb)->s_es->s_want_extra_isize); > if (EXT3_SB(sb)->s_want_extra_isize < > > le32_to_cpu(EXT3_SB(sb)->s_es->s_min_extra_isize)) > > le32_to_cpu(EXT3_SB(sb)->s_es->s_min_extra_isize); > } > > and here in ext3_new_inode() we only need: > > ei->i_extra_isize = EXT3_SB(sb)->s_want_extra_isize; > > We _might_ want to change this so that it will take s_want_extra_isize from the > on-disk superblock each time, so that if tune2fs is used while the filesystem > is in use then it will pick up this new value immediately. We might also want > to verify that s_{want,min}_extra_isize is small enough to fit in > EXT3_INODE_SIZE() or it may cause data corruption. >Well, we are using s_want_extra_isize from ext3_sb_info so that we don''t have to do the max() and byte-swapping operations as you have mentioned below. And what if the new value that is entered is smaller than earlier value(maybe by mistake)? During verification, I think checking this condition should be enough, right? if (EXT3_GOOD_OLD_INODE_SIZE + EXT3_SB(sb)->s_want_extra_isize < EXT3_INODE_SIZE(inode->i_sb)).
kalpak@clusterfs.com
2007-Jan-30 00:46 UTC
[Lustre-devel] [Bug 10651] Implement nanosecond timestamps (for version-based recovery)
Please don''t reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=10651 What |Removed |Added ---------------------------------------------------------------------------- Attachment #9401 is|0 |1 obsolete| | Created an attachment (id=9449) Please don''t reply to lustre-devel. Instead, comment in Bugzilla by using the following link: --> (https://bugzilla.lustre.org/attachment.cgi?id=9449&action=view) Updated nanosecond timestamp patch Most of the suggested changes have been made. s_{want, min}_extra_isize is validated during mount time and saved in ext3_sb_info to avoid further checks and complexity for each new inode. In ext3_new_inode(), we can use on-disk value after changes have been made to tunefs.