thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH 0/3] Properly protect the extent map [Mar 2007]

If this information is useful, please help other people find it:
Share via:

Joel Becker

2007-Mar-28 18:50 UTC

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

The extent map code was seeing an error trip when inserting records.
There is a sanity check ensuring that the new record is within the known
size of the file - as known by the extent map.  The problem was that the
"known" size was smaller than expected, returning an error to the
caller.

This is a lot easier to see in local mounts.  It has only been rarely seen
in clusters.

We have three "independant" problems.

1) We don't take ip_alloc_sem when touching directory allocations.
2) We don't lock em_clusters properly.
3) Local mounts race to ocfs2_extent_map_trunc() in ocfs2_meta_lock_update().

The patches that follow are against mainline.  They have been tested
against 1.2 with local mounts.

Patches (1) and (2) will be ported to the ocfs2 1.2 branch directly,
and all three fixes will be pushed to mainline.

Joel

--

Life's Little Instruction Book #69

	"Whistle"

                                  jlbec@innerx.net
                                  http://ocala.cs.miami.edu/~jlbec

Joel Becker

2007-Mar-28 18:50 UTC

head link

[Ocfs2-devel] [PATCH 1/3] ocfs2: Wrap access of directory allocations with ip_alloc_sem.

OCFS2_I(inode)->ip_alloc_sem is a read-write semaphore protecting
local concurrent access of ocfs2 inodes.  However, ocfs2 directories were
not taking the semaphore while they accessed or modified the allocation
tree.

ocfs2_extend_dir() needs to take the semaphore in a write mode when it
adds to the allocation.  All other directory users get there via
ocfs2_bread(), which takes the semaphore in read mode.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 fs/ocfs2/dir.c   |    7 ++++++-
 fs/ocfs2/inode.c |    2 ++
 2 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index 66821e1..fd41f8b 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -402,7 +402,7 @@ static int ocfs2_extend_dir(struct ocfs2
 			    struct buffer_head **new_de_bh)
 {
 	int status = 0;
-	int credits, num_free_extents;
+	int credits, num_free_extents, drop_alloc_sem = 0;
 	loff_t dir_i_size;
 	struct ocfs2_dinode *fe = (struct ocfs2_dinode *) parent_fe_bh->b_data;
 	struct ocfs2_alloc_context *data_ac = NULL;
@@ -451,6 +451,9 @@ static int ocfs2_extend_dir(struct ocfs2
 		credits = OCFS2_SIMPLE_DIR_EXTEND_CREDITS;
 	}
 
+	down_write(&OCFS2_I(dir)->ip_alloc_sem);
+	drop_alloc_sem = 1;
+
 	handle = ocfs2_start_trans(osb, credits);
 	if (IS_ERR(handle)) {
 		status = PTR_ERR(handle);
@@ -496,6 +499,8 @@ static int ocfs2_extend_dir(struct ocfs2
 	*new_de_bh = new_bh;
 	get_bh(*new_de_bh);
 bail:
+	if (drop_alloc_sem)
+		up_write(&OCFS2_I(dir)->ip_alloc_sem);
 	if (handle)
 		ocfs2_commit_trans(osb, handle);
 
diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
index 28ab56f..480d90f 100644
--- a/fs/ocfs2/inode.c
+++ b/fs/ocfs2/inode.c
@@ -1121,8 +1121,10 @@ struct buffer_head *ocfs2_bread(struct i
 		return NULL;
 	}
 
+	down_read(&OCFS2_I(inode)->ip_alloc_sem);
 	tmperr = ocfs2_extent_map_get_blocks(inode, block, 1,
 					     &p_blkno, NULL);
+	up_read(&OCFS2_I(inode)->ip_alloc_sem);
 	if (tmperr < 0) {
 		mlog_errno(tmperr);
 		goto fail;
-- 
1.4.2.3

Joel Becker

2007-Mar-28 18:51 UTC

head link

[Ocfs2-devel] [PATCH 3/3] ocfs2: Local mounts should not truncate the extent map.

A local mount can implicitly trust state on the inode.  There are no other
nodes to change the filesystem.  Thus, local mounts should leave the extent
map intact until ->clear_inode().

This fixes a bug where a call to ocfs2_meta_lock_update() can race queries
of the extent map.  A clustered mount only allows one process through
ocfs2_meta_lock_update(), but a local mount allows all processes through.
Thus, one process can be all the way to ocfs2_extent_map_get_blocks() when
a second process is truncating the extent map in ocfs2_meta_lock_update().
That no longer happens because the extent map is unmodified.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 fs/ocfs2/dlmglue.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index e335541..eb1c8fa 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -1508,8 +1508,9 @@ static int ocfs2_meta_lock_update(struct
 	ocfs2_metadata_cache_purge(inode);
 
 	/* will do nothing for inode types that don't use the extent
-	 * map (directories, bitmap files, etc) */
-	ocfs2_extent_map_trunc(inode, 0);
+	 * map (bitmap files, etc) */
+	if (!ocfs2_mount_local(osb))
+		ocfs2_extent_map_trunc(inode, 0);
 
 	if (lockres && ocfs2_meta_lvb_is_trustable(inode, lockres)) {
 		mlog(0, "Trusting LVB on inode %llu\n",
-- 
1.4.2.3

Joel Becker

2007-Mar-28 18:51 UTC

head link

[Ocfs2-devel] [PATCH 2/3] ocfs2: Properly lock extent map size changes.

The extent map code failed to properly lock changes to ->em_clusters, the
extent map's idea of its own size.

This leads to a subtle race.  One process is updating the size to match an
inode that changed, while another process is already past that in
the lookup code checking the size against its arguments.  For a moment, the
size is wrong (due to how the size is checked and calculated).

Properly locking the update and the query makes this safe.

The check for size change is abstracted into a common function.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
---
 fs/ocfs2/extent_map.c |   90 +++++++++++++++++++++++++++----------------------
 1 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
index 80ac69f..9ff4351 100644
--- a/fs/ocfs2/extent_map.c
+++ b/fs/ocfs2/extent_map.c
@@ -84,6 +84,8 @@ static int ocfs2_extent_map_try_insert(s
 				       struct ocfs2_extent_rec *rec,
 				       int tree_depth,
 				       struct ocfs2_em_insert_context *ctxt);
+static void ocfs2_extent_map_check_size_change(struct inode *inode,
+					       u32 expected_clusters);
 
 /* returns 1 only if the rec contains all the given clusters -- that is that
  * rec's cpos is <= the cluster cpos and that the rec endpoint (cpos +
@@ -558,8 +560,10 @@ static int ocfs2_extent_map_insert(struc
 	int ret;
 	struct ocfs2_em_insert_context ctxt = {0, };
 
+	spin_lock(&OCFS2_I(inode)->ip_lock);
 	if ((le32_to_cpu(rec->e_cpos) + le32_to_cpu(rec->e_clusters)) >
 	    OCFS2_I(inode)->ip_map.em_clusters) {
+		spin_unlock(&OCFS2_I(inode)->ip_lock);
 		ret = -EBADR;
 		mlog_errno(ret);
 		return ret;
@@ -569,6 +573,7 @@ static int ocfs2_extent_map_insert(struc
 	if (!rec->e_clusters) {
 		if ((le32_to_cpu(rec->e_cpos) + le32_to_cpu(rec->e_clusters)) ! 		   
OCFS2_I(inode)->ip_map.em_clusters) {
+			spin_unlock(&OCFS2_I(inode)->ip_lock);
 			ret = -EBADR;
 			mlog_errno(ret);
 			ocfs2_error(inode->i_sb,
@@ -578,9 +583,12 @@ static int ocfs2_extent_map_insert(struc
 			return ret;
 		}
 
+		spin_unlock(&OCFS2_I(inode)->ip_lock);
+
 		/* Ignore the truncated tail */
 		return 0;
 	}
+	spin_unlock(&OCFS2_I(inode)->ip_lock);
 
 	ret = -ENOMEM;
 	ctxt.new_ent = kmem_cache_alloc(ocfs2_em_ent_cachep,
@@ -662,15 +670,8 @@ int ocfs2_extent_map_append(struct inode
 	BUG_ON(!new_clusters);
 	BUG_ON(le32_to_cpu(rec->e_clusters) < new_clusters);
 
-	if (em->em_clusters < OCFS2_I(inode)->ip_clusters) {
-		/*
-		 * Size changed underneath us on disk.  Drop any
-		 * straddling records and update our idea of
-		 * i_clusters
-		 */
-		ocfs2_extent_map_drop(inode, em->em_clusters - 1);
-		em->em_clusters = OCFS2_I(inode)->ip_clusters;
-	}
+	ocfs2_extent_map_check_size_change(inode,
+					   OCFS2_I(inode)->ip_clusters);
 
 	mlog_bug_on_msg((le32_to_cpu(rec->e_cpos) +
 			 le32_to_cpu(rec->e_clusters)) !@@ -745,7 +746,6 @@ int
ocfs2_extent_map_get_rec(struct inod
 			     int *tree_depth)
 {
 	int ret = -ENOENT;
-	struct ocfs2_extent_map *em = &OCFS2_I(inode)->ip_map;
 	struct ocfs2_extent_map_entry *ent;
 
 	*rec = NULL;
@@ -753,15 +753,7 @@ int ocfs2_extent_map_get_rec(struct inod
 	if (cpos >= OCFS2_I(inode)->ip_clusters)
 		return -EINVAL;
 
-	if (cpos >= em->em_clusters) {
-		/*
-		 * Size changed underneath us on disk.  Drop any
-		 * straddling records and update our idea of
-		 * i_clusters
-		 */
-		ocfs2_extent_map_drop(inode, em->em_clusters - 1);
-		em->em_clusters = OCFS2_I(inode)->ip_clusters ;
-	}
+	ocfs2_extent_map_check_size_change(inode, cpos);
 
 	ent = ocfs2_extent_map_lookup(&OCFS2_I(inode)->ip_map, cpos, 1,
 				      NULL, NULL);
@@ -782,7 +774,6 @@ int ocfs2_extent_map_get_clusters(struct
 {
 	int ret;
 	u32 coff, ccount;
-	struct ocfs2_extent_map *em = &OCFS2_I(inode)->ip_map;
 	struct ocfs2_extent_map_entry *ent = NULL;
 
 	*p_cpos = ccount = 0;
@@ -790,16 +781,7 @@ int ocfs2_extent_map_get_clusters(struct
 	if ((v_cpos + count) > OCFS2_I(inode)->ip_clusters)
 		return -EINVAL;
 
-	if ((v_cpos + count) > em->em_clusters) {
-		/*
-		 * Size changed underneath us on disk.  Drop any
-		 * straddling records and update our idea of
-		 * i_clusters
-		 */
-		ocfs2_extent_map_drop(inode, em->em_clusters - 1);
-		em->em_clusters = OCFS2_I(inode)->ip_clusters;
-	}
-
+	ocfs2_extent_map_check_size_change(inode, v_cpos + count);
 
 	ret = ocfs2_extent_map_lookup_read(inode, v_cpos, count, &ent);
 	if (ret)
@@ -838,7 +820,6 @@ int ocfs2_extent_map_get_blocks(struct i
 	u32 cpos, clusters;
 	int bpc = ocfs2_clusters_to_blocks(inode->i_sb, 1);
 	struct ocfs2_extent_map_entry *ent = NULL;
-	struct ocfs2_extent_map *em = &OCFS2_I(inode)->ip_map;
 	struct ocfs2_extent_rec *rec;
 
 	*p_blkno = 0;
@@ -852,15 +833,7 @@ int ocfs2_extent_map_get_blocks(struct i
 		return ret;
 	}
 
-	if ((cpos + clusters) > em->em_clusters) {
-		/*
-		 * Size changed underneath us on disk.  Drop any
-		 * straddling records and update our idea of
-		 * i_clusters
-		 */
-		ocfs2_extent_map_drop(inode, em->em_clusters - 1);
-		em->em_clusters = OCFS2_I(inode)->ip_clusters;
-	}
+	ocfs2_extent_map_check_size_change(inode, cpos + clusters);
 
 	ret = ocfs2_extent_map_lookup_read(inode, cpos, clusters, &ent);
 	if (ret) {
@@ -996,6 +969,43 @@ int ocfs2_extent_map_drop(struct inode *
 }
 
 /*
+ * This is almost a wrapper of ocfs2_extent_map_drop(), but must
+ * handle its locking carefully.
+ */
+static void ocfs2_extent_map_check_size_change(struct inode *inode,
+					       u32 expected_clusters)
+{
+	struct rb_node *free_head = NULL;
+	struct ocfs2_extent_map *em = &OCFS2_I(inode)->ip_map;
+	struct ocfs2_extent_map_entry *ent;
+
+	spin_lock(&OCFS2_I(inode)->ip_lock);
+
+	if (em->em_clusters < expected_clusters) {
+		/*
+		 * Size changed underneath us on disk.  Drop any
+		 * straddling records and update our idea of
+		 * i_clusters
+		 */
+		__ocfs2_extent_map_drop(inode, em->em_clusters -1,
+					&free_head, &ent);
+
+		if (ent) {
+			rb_erase(&ent->e_node, &em->em_extents);
+			ent->e_node.rb_right = free_head;
+			free_head = &ent->e_node;
+		}
+
+		em->em_clusters = OCFS2_I(inode)->ip_clusters;
+	}
+
+	spin_unlock(&OCFS2_I(inode)->ip_lock);
+
+	if (free_head)
+		__ocfs2_extent_map_drop_cleanup(free_head);
+}
+
+/*
  * Remove all entries past new_clusters and also clip any extent
  * straddling new_clusters, if there is one.  This does not check
  * or modify ip_clusters
-- 
1.4.2.3

Fabio Massimo Di Nitto

2007-Mar-29 02:04 UTC

head link

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

Joel Becker wrote:> The extent map code was seeing an error trip when inserting records.
> There is a sanity check ensuring that the new record is within the known
> size of the file - as known by the extent map.  The problem was that the
> "known" size was smaller than expected, returning an error to the
caller.
> 
> This is a lot easier to see in local mounts.  It has only been rarely seen
> in clusters.
> 
> We have three "independant" problems.
> 
> 1) We don't take ip_alloc_sem when touching directory allocations.
> 2) We don't lock em_clusters properly.
> 3) Local mounts race to ocfs2_extent_map_trunc() in
ocfs2_meta_lock_update().
> 
> The patches that follow are against mainline.  They have been tested
> against 1.2 with local mounts.
> 
> Patches (1) and (2) will be ported to the ocfs2 1.2 branch directly,
> and all three fixes will be pushed to mainline.
> 
> Joel
add a

Signed-off-by: Fabio M. Di Nitto <fabbione@ubuntu.com>

on all 3 of them. They fix the problem as i reported it, but it looks like they
introduce a performance hit. Somebody other than me should test it too.

Fabio

-- 
I'm going to make him an offer he can't refuse.

Mark Fasheh

2007-Apr-02 11:45 UTC

head link

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

On Thu, Mar 29, 2007 at 10:38:24AM +0200, Fabio Massimo Di Nitto
wrote:> on all 3 of them. They fix the problem as i reported it, but it looks like
they
> introduce a performance hit.
Can you elaborate on that please?

I don't see anything in those patches that should cause any performance
problems but if you have any more information about what you saw, it might
point to something we missed.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com

Mark Fasheh

2007-Apr-02 11:47 UTC

head link

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

On Wed, Mar 28, 2007 at 06:27:07PM -0700, Joel Becker
wrote:> The patches that follow are against mainline.  They have been tested
> against 1.2 with local mounts.
> 
> Patches (1) and (2) will be ported to the ocfs2 1.2 branch directly,
> and all three fixes will be pushed to mainline.
Ok. These patches look good - thanks. Have you had a chance to test against
mainline?
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com

Ocfs2 devel - Mar 2007 - [PATCH 0/3] Properly protect the extent map

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

[Ocfs2-devel] [PATCH 1/3] ocfs2: Wrap access of directory allocations with ip_alloc_sem.

[Ocfs2-devel] [PATCH 3/3] ocfs2: Local mounts should not truncate the extent map.

[Ocfs2-devel] [PATCH 2/3] ocfs2: Properly lock extent map size changes.

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map

[Ocfs2-devel] [PATCH 0/3] Properly protect the extent map