Tao Ma
2008-Nov-27 22:58 UTC
[Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
In ocfs2, the inode block search looks for the "emptiest" inode
group to allocate from. So if an inode alloc file has many equally
(or almost equally) empty groups, new inodes will tend to get
spread out amongst them, which in turn can put them all over the
disk. This is undesirable because directory operations on conceptually
"nearby" inodes force a large number of seeks.

The good thing is that in ocfs2_alloc_context there is a field named
ac_last_group which records the last group we allocated from, so we
need only pass the right group in and the subsequent allocation will
behave as we expect.

So we add ip_last_used_group in core directory inodes which records
the last used allocation group. Another field named ip_last_used_slot
is also added in case inode stealing happens. When claiming a new inode,
we pass in the directory's inode so that the allocation can use this
information.

For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma <tao.ma at oracle.com>
---
 fs/ocfs2/inode.c    |    2 ++
 fs/ocfs2/inode.h    |    4 ++++
 fs/ocfs2/namei.c    |    4 ++--
 fs/ocfs2/suballoc.c |   21 +++++++++++++++++++++
 fs/ocfs2/suballoc.h |    2 ++
 5 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 288512c..c3463c1 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -350,6 +350,8 @@ void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
 
 	ocfs2_set_inode_flags(inode);
 
+	OCFS2_I(inode)->ip_last_used_slot = 0;
+	OCFS2_I(inode)->ip_last_used_group = 0;
 	mlog_exit_void();
 }
 
diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
index eb3c302..e1978ac 100644
--- a/fs/ocfs2/inode.h
+++ b/fs/ocfs2/inode.h
@@ -72,6 +72,10 @@ struct ocfs2_inode_info
 
 	struct inode			vfs_inode;
 	struct jbd2_inode		ip_jinode;
+
+	/* Only valid if the inode is the dir. */
+	u32				ip_last_used_slot;
+	u64				ip_last_used_group;
 };
 
 /*
diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
index 02c8026..a601dd5 100644
--- a/fs/ocfs2/namei.c
+++ b/fs/ocfs2/namei.c
@@ -469,8 +469,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
 
 	*new_fe_bh = NULL;
 
-	status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
-				       &fe_blkno);
+	status = ocfs2_claim_new_inode(osb, handle, dir, parent_fe_bh,
+				       inode_ac, &suballoc_bit, &fe_blkno);
 	if (status < 0) {
 		mlog_errno(status);
 		goto leave;
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 226fe21..f75782f 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -1587,6 +1587,8 @@ bail:
 
 int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 			  handle_t *handle,
+			  struct inode *dir,
+			  struct buffer_head *parent_fe_bh,
 			  struct ocfs2_alloc_context *ac,
 			  u16 *suballoc_bit,
 			  u64 *fe_blkno)
@@ -1594,6 +1596,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 	int status;
 	unsigned int num_bits;
 	u64 bg_blkno;
+	struct ocfs2_dinode *parent_fe =
+		(struct ocfs2_dinode *)parent_fe_bh->b_data;
 
 	mlog_entry_void();
 
@@ -1602,6 +1606,21 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 	BUG_ON(ac->ac_bits_wanted != 1);
 	BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
 
+	/*
+	 * Try to allocate inodes from some specific group.
+	 *
+	 * If the parent dir has recorded the last group used in allocation,
+	 * cool, use it. Otherwise if we try to allocate a new inode from the
+	 * same slot the parent dir belongs to, use the same chunk.
+	 */
+	if (OCFS2_I(dir)->ip_last_used_group &&
+	    OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
+		ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
+	else if (le16_to_cpu(parent_fe->i_suballoc_slot) ==
+		 ac->ac_alloc_slot)
+		ac->ac_last_group = le64_to_cpu(parent_fe->i_blkno) -
+				    le16_to_cpu(parent_fe->i_suballoc_bit);
+
 	status = ocfs2_claim_suballoc_bits(osb,
 					   ac,
 					   handle,
@@ -1620,6 +1639,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 	*fe_blkno = bg_blkno + (u64) (*suballoc_bit);
 	ac->ac_bits_given++;
 
+	OCFS2_I(dir)->ip_last_used_group = ac->ac_last_group;
+	OCFS2_I(dir)->ip_last_used_slot = ac->ac_alloc_slot;
 	status = 0;
bail:
 	mlog_exit(status);
diff --git a/fs/ocfs2/suballoc.h b/fs/ocfs2/suballoc.h
index e3c13c7..ea85a4c 100644
--- a/fs/ocfs2/suballoc.h
+++ b/fs/ocfs2/suballoc.h
@@ -88,6 +88,8 @@ int ocfs2_claim_metadata(struct ocfs2_super *osb,
 			 u64 *blkno_start);
 int ocfs2_claim_new_inode(struct ocfs2_super *osb,
 			  handle_t *handle,
+			  struct inode *dir,
+			  struct buffer_head *parent_fe_bh,
 			  struct ocfs2_alloc_context *ac,
 			  u16 *suballoc_bit,
 			  u64 *fe_blkno);
--
1.5.4.GIT
Tao Ma
2008-Nov-27 22:58 UTC
[Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap.
Inode groups used to be allocated from the local alloc file, but since
we want all inodes to be contiguous enough, we will try to allocate
them directly from the global_bitmap.

Add a new flag named "ALLOC_NEW_GROUP_FROM_GLOBAL". If we pass this
flag into the allocation, don't try local alloc.

Signed-off-by: Tao Ma <tao.ma at oracle.com>
---
 fs/ocfs2/suballoc.c |   33 +++++++++++++++++++++------------
 1 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index f75782f..98e32b2 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -47,7 +47,8 @@
 #include "buffer_head_io.h"
 
 #define NOT_ALLOC_NEW_GROUP		0
-#define ALLOC_NEW_GROUP			1
+#define ALLOC_NEW_GROUP			0x1
+#define ALLOC_NEW_GROUP_FROM_GLOBAL	0x2
 
 #define OCFS2_MAX_INODES_TO_STEAL	1024
 
@@ -63,7 +64,8 @@ static int ocfs2_block_group_fill(handle_t *handle,
 static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 				   struct inode *alloc_inode,
 				   struct buffer_head *bh,
-				   u64 max_block);
+				   u64 max_block,
+				   int flags);
 
 static int ocfs2_cluster_group_search(struct inode *inode,
 				      struct buffer_head *group_bh,
@@ -115,7 +117,8 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
 						u16 *bg_bit_off);
 static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 					     u32 bits_wanted, u64 max_block,
-					     struct ocfs2_alloc_context **ac);
+					     struct ocfs2_alloc_context **ac,
+					     int flags);
 
 void ocfs2_free_ac_resource(struct ocfs2_alloc_context *ac)
 {
@@ -370,7 +373,8 @@ static inline u16 ocfs2_find_smallest_chain(struct ocfs2_chain_list *cl)
 static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 				   struct inode *alloc_inode,
 				   struct buffer_head *bh,
-				   u64 max_block)
+				   u64 max_block,
+				   int flags)
 {
 	int status, credits;
 	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) bh->b_data;
@@ -390,7 +394,7 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 
 	cl = &fe->id2.i_chain;
 	status = ocfs2_reserve_clusters_with_limit(osb, le16_to_cpu(cl->cl_cpg),
-						   max_block, &ac);
+						   max_block, &ac, flags);
 	if (status < 0) {
 		if (status != -ENOSPC)
 			mlog_errno(status);
 		goto bail;
 	}
@@ -498,7 +502,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 				      struct ocfs2_alloc_context *ac,
 				      int type,
 				      u32 slot,
-				      int alloc_new_group)
+				      int flags)
 {
 	int status;
 	u32 bits_wanted = ac->ac_bits_wanted;
@@ -554,7 +558,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 			goto bail;
 		}
 
-		if (alloc_new_group != ALLOC_NEW_GROUP) {
+		if (!(flags & ALLOC_NEW_GROUP)) {
 			mlog(0, "Alloc File %u Full: wanted=%u, free_bits=%u, "
 			     "and we don't alloc a new group for it.\n",
 			     slot, bits_wanted, free_bits);
@@ -563,7 +567,7 @@ static int ocfs2_reserve_suballoc_bits(struct ocfs2_super *osb,
 		}
 
 		status = ocfs2_block_group_alloc(osb, alloc_inode, bh,
-						 ac->ac_max_block);
+						 ac->ac_max_block, flags);
 		if (status < 0) {
 			if (status != -ENOSPC)
 				mlog_errno(status);
@@ -707,7 +711,9 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 	atomic_set(&osb->s_num_inodes_stolen, 0);
 	status = ocfs2_reserve_suballoc_bits(osb, *ac,
 					     INODE_ALLOC_SYSTEM_INODE,
-					     osb->slot_num, ALLOC_NEW_GROUP);
+					     osb->slot_num,
+					     ALLOC_NEW_GROUP |
+					     ALLOC_NEW_GROUP_FROM_GLOBAL);
 	if (status >= 0) {
 		status = 0;
@@ -773,7 +779,8 @@ bail:
  * things a bit.
  */
 static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 					     u32 bits_wanted, u64 max_block,
-					     struct ocfs2_alloc_context **ac)
+					     struct ocfs2_alloc_context **ac,
+					     int flags)
 {
 	int status;
@@ -790,7 +797,8 @@ static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
 	(*ac)->ac_max_block = max_block;
 
 	status = -ENOSPC;
-	if (ocfs2_alloc_should_use_local(osb, bits_wanted)) {
+	if (!(flags & ALLOC_NEW_GROUP_FROM_GLOBAL) &&
+	    ocfs2_alloc_should_use_local(osb, bits_wanted)) {
 		status = ocfs2_reserve_local_alloc_bits(osb,
 							bits_wanted,
 							*ac);
@@ -828,7 +836,8 @@ int ocfs2_reserve_clusters(struct ocfs2_super *osb,
 			   u32 bits_wanted,
 			   struct ocfs2_alloc_context **ac)
 {
-	return ocfs2_reserve_clusters_with_limit(osb, bits_wanted, 0, ac);
+	return ocfs2_reserve_clusters_with_limit(osb, bits_wanted, 0, ac,
+						 ALLOC_NEW_GROUP);
 }
 
 /*
--
1.5.4.GIT
Tao Ma
2008-Nov-27 22:58 UTC
[Ocfs2-devel] [PATCH 3/3] ocfs2: Optimize inode group allocation by recording last used group.
In ocfs2, the block group search looks for the "emptiest" group
to allocate from. So if the allocator has many equally (or almost
equally) empty groups, new block groups will tend to get spread out
amongst them.

We add osb_last_alloc_group in ocfs2_super to record the last used
allocation group so that next time we can allocate the new inode group
directly from it.

For more details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.

Signed-off-by: Tao Ma <tao.ma at oracle.com>
---
 fs/ocfs2/ocfs2.h    |    3 +++
 fs/ocfs2/suballoc.c |   18 +++++++++++++++++-
 fs/ocfs2/super.c    |    1 +
 3 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index 5c77798..a99b53e 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -335,6 +335,9 @@ struct ocfs2_super
 	struct ocfs2_node_map		osb_recovering_orphan_dirs;
 	unsigned int			*osb_orphan_wipes;
 	wait_queue_head_t		osb_wipe_event;
+
+	/* the group we used to allocate inodes. */
+	u64				osb_last_alloc_group;
 };
 
 #define OCFS2_SB(sb)	    ((struct ocfs2_super *)(sb)->s_fs_info)
diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
index 98e32b2..3ab1135 100644
--- a/fs/ocfs2/suballoc.c
+++ b/fs/ocfs2/suballoc.c
@@ -49,6 +49,7 @@
 #define NOT_ALLOC_NEW_GROUP		0
 #define ALLOC_NEW_GROUP			0x1
 #define ALLOC_NEW_GROUP_FROM_GLOBAL	0x2
+#define ALLOC_USE_RECORD_GROUP		0x4
 
 #define OCFS2_MAX_INODES_TO_STEAL	1024
 
@@ -411,6 +412,11 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 		goto bail;
 	}
 
+	if (flags & ALLOC_USE_RECORD_GROUP && osb->osb_last_alloc_group) {
+		mlog(0, "use old allocation group %llu\n",
+		     (unsigned long long)osb->osb_last_alloc_group);
+		ac->ac_last_group = osb->osb_last_alloc_group;
+	}
 	status = ocfs2_claim_clusters(osb,
 				      handle,
 				      ac,
@@ -485,6 +491,15 @@ static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
 	alloc_inode->i_blocks = ocfs2_inode_sector_count(alloc_inode);
 
 	status = 0;
+
+	if (flags & ALLOC_USE_RECORD_GROUP) {
+		spin_lock(&osb->osb_lock);
+		osb->osb_last_alloc_group = ac->ac_last_group;
+		spin_unlock(&osb->osb_lock);
+		mlog(0, "after reservation, new allocation group is "
+		     "%llu\n", (unsigned long long)osb->osb_last_alloc_group);
+	}
+
bail:
 	if (handle)
 		ocfs2_commit_trans(osb, handle);
@@ -713,7 +728,8 @@ int ocfs2_reserve_new_inode(struct ocfs2_super *osb,
 					     INODE_ALLOC_SYSTEM_INODE,
 					     osb->slot_num,
 					     ALLOC_NEW_GROUP |
-					     ALLOC_NEW_GROUP_FROM_GLOBAL);
+					     ALLOC_NEW_GROUP_FROM_GLOBAL |
+					     ALLOC_USE_RECORD_GROUP);
 	if (status >= 0) {
 		status = 0;
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index bc43138..3593759 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -843,6 +843,7 @@ static int ocfs2_fill_super(struct super_block *sb, void *data, int silent)
 	osb->osb_commit_interval = parsed_options.commit_interval;
 	osb->local_alloc_default_bits = ocfs2_megabytes_to_clusters(sb, parsed_options.localalloc_opt);
 	osb->local_alloc_bits = osb->local_alloc_default_bits;
+	osb->osb_last_alloc_group = 0;
 
 	if (osb->s_mount_opt & OCFS2_MOUNT_USRQUOTA &&
 	    !OCFS2_HAS_RO_COMPAT_FEATURE(sb, OCFS2_FEATURE_RO_COMPAT_USRQUOTA)) {
--
1.5.4.GIT
Tao Ma
2008-Nov-28 06:49 UTC
[Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement
Hi all,

In ocfs2, when we create a fresh file system and create inodes in it,
they are contiguous and good for readdir+stat. But if we delete all the
inodes and create them again, the new inodes will get spread out, which
isn't what we want. The core problem here is that the inode block search
looks for the "emptiest" inode group to allocate from. So if an inode
alloc file has many equally (or almost equally) empty groups, new inodes
will tend to get spread out amongst them, which in turn can put them all
over the disk. This is undesirable because directory operations on
conceptually "nearby" inodes force a large number of seeks. For more
details, please see
http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
(I have modified it a little. Mark, if you are interested, please take a
look; the changes are underlined.)

This patch set tries to fix the problem:

patch 1: Optimize inode allocation by remembering the last group.
We add ip_last_used_group in core directory inodes which records the
last used allocation group. Another field named ip_last_used_slot is
also added in case inode stealing happens. When claiming a new inode,
we pass in the directory's inode so that the allocation can use this
information.

patch 2: Let inode group allocation use the global bitmap directly.

patch 3: We add osb_last_alloc_group in ocfs2_super to record the last
used allocation group so that we can make inode groups contiguous enough.

The patch set is based on Mark's merge_window branch, so it should be
easy to push.

Regards,
Tao
Mark Fasheh
2009-Jan-06 19:02 UTC
[Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.
On Fri, Nov 28, 2008 at 06:58:43AM +0800, Tao Ma wrote:
> +	/*
> +	 * Try to allocate inodes from some specific group.
> +	 *
> +	 * If the parent dir has recorded the last group used in allocation,
> +	 * cool, use it. Otherwise if we try to allocate a new inode from the
> +	 * same slot the parent dir belongs to, use the same chunk.
> +	 */
> +	if (OCFS2_I(dir)->ip_last_used_group &&
> +	    OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
> +		ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
> +	else if (le16_to_cpu(parent_fe->i_suballoc_slot) ==
> +		 ac->ac_alloc_slot)
> +		ac->ac_last_group = le64_to_cpu(parent_fe->i_blkno) -
> +				    le16_to_cpu(parent_fe->i_suballoc_bit);

You should use ocfs2_which_suballoc_group() here, instead of open coding
the math to get ac_last_group.

Also, would it be possible for us to put this block in its own function so
that it's easier to play with the logic in the future?

One last thing - can you add to the comment:

 *
 * We are very careful here to avoid the mistake of setting ac_last_group to
 * a group descriptor from a different (unlocked) slot.
 */
	--Mark

--
Mark Fasheh
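For illustration, the factored-out helper Mark asks for might look roughly
like the sketch below. The function name ocfs2_init_inode_ac_group() is
hypothetical, and it assumes ocfs2_which_suballoc_group() maps an inode's
block number and suballoc bit back to its group descriptor block, as the
existing suballoc code does.

/*
 * Hypothetical helper factoring out the group-hint selection that the
 * patch currently open codes in ocfs2_claim_new_inode().  Name and
 * placement are illustrative only.
 */
static void ocfs2_init_inode_ac_group(struct inode *dir,
				      struct buffer_head *parent_fe_bh,
				      struct ocfs2_alloc_context *ac)
{
	struct ocfs2_dinode *fe = (struct ocfs2_dinode *)parent_fe_bh->b_data;

	/*
	 * If the parent dir recorded a last-used group for this slot, reuse
	 * it.  Otherwise, if we are allocating from the same slot the parent
	 * dir itself lives in, start from the parent's own group.  We are
	 * careful here to avoid setting ac_last_group to a group descriptor
	 * from a different (unlocked) slot.
	 */
	if (OCFS2_I(dir)->ip_last_used_group &&
	    OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
		ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
	else if (le16_to_cpu(fe->i_suballoc_slot) == ac->ac_alloc_slot)
		ac->ac_last_group =
			ocfs2_which_suballoc_group(le64_to_cpu(fe->i_blkno),
					le16_to_cpu(fe->i_suballoc_bit));
}

ocfs2_claim_new_inode() could then simply call this helper right after its
BUG_ON() checks instead of carrying the logic inline.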
Mark Fasheh
2009-Jan-06 19:47 UTC
[Ocfs2-devel] [PATCH 2/3] ocfs2: Allocate inode groups from global_bitmap.
On Fri, Nov 28, 2008 at 06:58:44AM +0800, Tao Ma wrote:
> Inode groups used to be allocated from the local alloc file, but since
> we want all inodes to be contiguous enough, we will try to allocate
> them directly from the global_bitmap.
>
> Add a new flag named "ALLOC_NEW_GROUP_FROM_GLOBAL". If we pass this
> flag into the allocation, don't try local alloc.
>
> Signed-off-by: Tao Ma <tao.ma at oracle.com>
> ---
>  fs/ocfs2/suballoc.c |   33 +++++++++++++++++++++------------
>  1 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index f75782f..98e32b2 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -47,7 +47,8 @@
>  #include "buffer_head_io.h"
>
>  #define NOT_ALLOC_NEW_GROUP		0
> -#define ALLOC_NEW_GROUP			1
> +#define ALLOC_NEW_GROUP			0x1
> +#define ALLOC_NEW_GROUP_FROM_GLOBAL	0x2

How about "ALLOC_GROUPS_FROM_GLOBAL"? That makes it clear we're only
modifying *where* new groups come from.

>  #define OCFS2_MAX_INODES_TO_STEAL	1024
>
> @@ -63,7 +64,8 @@ static int ocfs2_block_group_fill(handle_t *handle,
>  static int ocfs2_block_group_alloc(struct ocfs2_super *osb,
> 				   struct inode *alloc_inode,
> 				   struct buffer_head *bh,
> -				   u64 max_block);
> +				   u64 max_block,
> +				   int flags);
>
>  static int ocfs2_cluster_group_search(struct inode *inode,
> 				      struct buffer_head *group_bh,
> @@ -115,7 +117,8 @@ static inline void ocfs2_block_to_cluster_group(struct inode *inode,
> 						u16 *bg_bit_off);
>  static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
> 					     u32 bits_wanted, u64 max_block,
> -					     struct ocfs2_alloc_context **ac);
> +					     struct ocfs2_alloc_context **ac,
> +					     int flags);

Can you put the 'int flags' before the **ac return value here please?

I'm saddened that we're adding a flags field to so many functions, but
after looking through the code, I can't really come up with a better
solution.
	--Mark

--
Mark Fasheh
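For reference, the requested parameter reordering and the suggested flag
rename might look roughly like this; a sketch only, the final form is up to
the patch author.

/* Suggested rename of ALLOC_NEW_GROUP_FROM_GLOBAL; the value is unchanged. */
#define ALLOC_GROUPS_FROM_GLOBAL	0x2

/* Prototype with 'int flags' moved ahead of the **ac out-parameter. */
static int ocfs2_reserve_clusters_with_limit(struct ocfs2_super *osb,
					     u32 bits_wanted, u64 max_block,
					     int flags,
					     struct ocfs2_alloc_context **ac);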
Mark Fasheh
2009-Jan-06 21:41 UTC
[Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement
On Fri, Nov 28, 2008 at 02:49:19PM +0800, Tao Ma wrote:
> Hi all,
>
> In ocfs2, when we create a fresh file system and create inodes in it,
> they are contiguous and good for readdir+stat. But if we delete all the
> inodes and create them again, the new inodes will get spread out, which
> isn't what we want. The core problem here is that the inode block search
> looks for the "emptiest" inode group to allocate from. So if an inode
> alloc file has many equally (or almost equally) empty groups, new inodes
> will tend to get spread out amongst them, which in turn can put them all
> over the disk. This is undesirable because directory operations on
> conceptually "nearby" inodes force a large number of seeks. For more
> details, please see
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
> (I have modified it a little. Mark, if you are interested, please take a
> look; the changes are underlined.)

Your edits look fine. Thanks for updating the design doc.

> This patch set tries to fix the problem:
>
> patch 1: Optimize inode allocation by remembering the last group.
> We add ip_last_used_group in core directory inodes which records the
> last used allocation group. Another field named ip_last_used_slot is
> also added in case inode stealing happens. When claiming a new inode,
> we pass in the directory's inode so that the allocation can use this
> information.
>
> patch 2: Let inode group allocation use the global bitmap directly.
>
> patch 3: We add osb_last_alloc_group in ocfs2_super to record the last
> used allocation group so that we can make inode groups contiguous enough.

So, the logic in your patches is correct. As you can see, most of my
comments were more about code flow or trivial cleanups. Assuming this all
works as we expect, there shouldn't be much code for you to modify before
the patches can be put in the merge_window branch.

One thing though - would you mind providing a small amount of data to show
what sort of improvement (if any) we're getting from these patches? I don't
think we need anything fancy - just enough to answer the following two
questions:

- How much does this improve our inode fragmentation level? Any test that
  fragments the inode space would be appropriate for this. We could then
  simply express fragmentation as some value - maybe a ratio of adjacent
  inodes as compared to the total # of inodes, expressed as a percentage.
  It would be nice for future testing if we had a small tool to calculate
  this (maybe via libocfs2, or by just making readdir calls and looking at
  inode numbers).

- Does the 2nd patch impact overall inode creation times in a cluster,
  since we're now using the cluster bitmap instead of local alloc? I'm
  thinking any of our parallel inode creation tests would be fine for this.

Thanks,
	--Mark

--
Mark Fasheh
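A minimal userspace sketch of the fragmentation measurement Mark describes,
assuming we only look at the inode numbers returned by readdir() rather than
going through libocfs2. It reports the percentage of a directory's inode
numbers that directly follow their predecessor once sorted, which is only a
proxy for on-disk adjacency.

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Compare two inode numbers for qsort(). */
static int cmp_ino(const void *a, const void *b)
{
	ino_t x = *(const ino_t *)a, y = *(const ino_t *)b;
	return (x > y) - (x < y);
}

int main(int argc, char **argv)
{
	DIR *dir;
	struct dirent *de;
	ino_t *inos = NULL;
	size_t n = 0, cap = 0, adjacent = 0, i;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <directory>\n", argv[0]);
		return 1;
	}
	dir = opendir(argv[1]);
	if (!dir) {
		perror("opendir");
		return 1;
	}

	/* Collect the inode number of every directory entry. */
	while ((de = readdir(dir)) != NULL) {
		if (n == cap) {
			cap = cap ? cap * 2 : 1024;
			inos = realloc(inos, cap * sizeof(*inos));
			if (!inos) {
				perror("realloc");
				return 1;
			}
		}
		inos[n++] = de->d_ino;
	}
	closedir(dir);

	/* Count inode numbers that immediately follow their predecessor. */
	qsort(inos, n, sizeof(*inos), cmp_ino);
	for (i = 1; i < n; i++)
		if (inos[i] == inos[i - 1] + 1)
			adjacent++;

	printf("%zu entries, %.1f%% adjacent inode numbers\n", n,
	       n > 1 ? 100.0 * adjacent / (n - 1) : 0.0);
	free(inos);
	return 0;
}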