Ling, Xiaofeng
2004-Jul-20 17:08 UTC
[Ocfs2-devel] A patch to improve the metadata reading throughput (against svn1267)
>-----Original Message-----
>Another thing that's on the list which you might be interested in looking at
>is not sending all lock release messages. Some of them do basically nothing
>on the other end in process_vote, so there's really no reason to send them
>to the nodes at all. This should help a lot when you've batched up a ton of
>locks to release in commit_cache.

Now, in our patch, the release message notifies the other node to throw away its metadata caches, so these messages are not doing nothing.

>So are you planning to turn off immediate checkpointing for all the other
>journal transactions? This is also on the list :) The only one that *may* be
>troublesome I believe is truncate. Otherwise, the ones that are left are:
>link, symlink, and rename.

Yes, we found that immediate checkpointing is the main reason for the low performance of these operations.

>> 4. readdir() may get old data after the data is written back to disk in
>> journal asynchronously. It is not a bug. But which way is better, sync
>> the new data to disk when other nodes notify READONLY message or just
>> let them get old data?
>No, we consider it a bug :) The other nodes should be getting up to date
>directory contents.

Now, in our patch, the release message is sent asynchronously in the journal, so until it is sent we can regard the write as not yet finished. So we think this is acceptable and not a bug. Of course, resolving it is also OK.

Index: src/journal.c
===================================================================
--- src/journal.c (revision 1267)
+++ src/journal.c (working copy)
@@ -148,6 +148,8 @@
 }
 spin_unlock(&journal->cmt_lock);

+ if (osb->needs_flush)
+     ocfs_sync_blockdev(osb->sb);

>Is this necessary? It seems awfully heavy, and since we journal *all*
>metadata (so it should be synced up to disk via the journal_flush just a
>couple lines above that), I don't see the point... I was actually meaning to
>take the other call to sync_blockdev out as it's never used :)

We added this just because we found that sometimes we cannot see a newly created directory from another node, but with this change we always can. It seems some buffers in the block device's cache list are not flushed to disk after journal_flush. And after the lock release message is sent, the metadata cache on the other node cannot be thrown away any more. So we must ensure all data is synced to disk on this node before sending the message.
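
The ordering argued for here (everything on disk first, release message second) can be illustrated with a small user-space model. This is only a sketch and not OCFS2 or JBD code: every name in it (model_node, model_flush, model_write, model_release, model_remote_read) is hypothetical, and the "disk" is just an integer version counter.

```c
/* User-space model only; none of these names are OCFS2 or JBD APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_node {
    int disk_version;    /* metadata version that has reached shared disk */
    int memory_version;  /* newest version held in this node's memory */
    bool lock_released;  /* true once the release message has gone out */
};

/* Stand-in for the flush step: push in-memory metadata to shared disk. */
void model_flush(struct model_node *n)
{
    n->disk_version = n->memory_version;
}

/* Modify metadata under the lock; the change stays in memory for now. */
void model_write(struct model_node *n)
{
    assert(!n->lock_released);  /* writes are only legal while holding the lock */
    n->memory_version++;
}

/* Release the lock, but only after everything is on disk. */
void model_release(struct model_node *n)
{
    model_flush(n);             /* must come before the message */
    n->lock_released = true;    /* stand-in for sending the release message */
}

/* What another node gets when it re-reads from shared disk. */
int model_remote_read(const struct model_node *n)
{
    return n->disk_version;
}
```

In this model, a remote read before model_release() still returns the old version, and a remote read after it returns the newest one. Swapping the two statements in model_release() would open a window in which the other node has already dropped its cache but re-reads stale data from disk.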
Mark Fasheh
2004-Jul-22 23:46 UTC
[Ocfs2-devel] A patch to improve the metadata reading throughput (against svn1267)
On Wed, Jul 21, 2004 at 05:58:38AM +0800, Ling, Xiaofeng wrote:
> --- src/journal.c (revision 1267)
> +++ src/journal.c (working copy)
> @@ -148,6 +148,8 @@
>  }
>  spin_unlock(&journal->cmt_lock);
>
> + if (osb->needs_flush)
> +     ocfs_sync_blockdev(osb->sb);
>
> >Is this necessary? It seems awfully heavy, and since we journal *all*
> >metadata (so it should be synced up to disk via the journal_flush just a
> >couple lines above that), I don't see the point... I was actually meaning to
> >take the other call to sync_blockdev out as it's never used :)
>
> We added this just because we found that sometimes we cannot see a
> newly created directory from another node, but with this change we
> always can. It seems some buffers in the block device's cache list are
> not flushed to disk after journal_flush.

Actually, the bug (which I just fixed) was that we weren't telling the other node to wait *on* the journal flush for a busy directory. Since I fixed it, I haven't had any directory contents consistency issues. See svn revision 1302 for the patch.

> And after the lock release message is sent, the metadata cache on the
> other node cannot be thrown away any more. So we must ensure all data
> is synced to disk on this node before sending the message.

Again, JBD handles this for us via journal_flush. If you're seeing metadata inconsistency it's much more likely that it's a DLM bug (in the case of the readdir) or a caching issue.
	--Mark

--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com
Ling, Xiaofeng
2004-Jul-23 18:41 UTC
[Ocfs2-devel] A patch to improve the metadata reading throughput (against svn1267)
>-----Original Message-----
>From: Mark Fasheh [mailto:mark.fasheh@oracle.com]
>Sent: July 22, 2004 21:46
>To: Ling, Xiaofeng
>Cc: Zhang, Sonic; Fu, Michael; Yang, Elton; Ocfs2-Devel
>Subject: Re: [Ocfs2-devel] A patch to improve the metadata reading throughput (against svn1267)

>Actually, the bug (which I just fixed) was that we weren't telling the other
>node to wait *on* the journal flush for a busy directory. Once I fixed it, I
>haven't had any directory contents consistency issues. See svn revision 1302
>for the patch.

Yes, I've run the tests and it's OK now.
So currently, when a node gets a READONLY message, it flushes the journal and then answers it. Is my description correct?
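
If that description is right (the thread does not confirm it at this point), the behavior could be modeled roughly as below. This is a user-space sketch, not the code from revision 1302: sketch_holder, sketch_journal_flush, sketch_handle_readonly and the reply enum are all hypothetical names, and the journal is reduced to two counters.

```c
/* User-space model only; names are hypothetical, not OCFS2 or JBD APIs. */
#include <assert.h>

enum sketch_reply { SKETCH_REPLY_NONE, SKETCH_REPLY_OK };

struct sketch_holder {
    int committed;  /* directory changes already flushed to shared disk */
    int pending;    /* directory changes still sitting in the journal */
    int flushes;    /* how many times the journal has been flushed */
};

/* Stand-in for a journal flush: move all pending changes to disk. */
void sketch_journal_flush(struct sketch_holder *h)
{
    h->committed += h->pending;
    h->pending = 0;
    h->flushes++;
}

/*
 * Handle a READONLY request from another node: flush first, answer second,
 * so the requester never reads the directory while changes are pending.
 */
enum sketch_reply sketch_handle_readonly(struct sketch_holder *h)
{
    sketch_journal_flush(h);
    assert(h->pending == 0);  /* nothing may be outstanding when we answer */
    return SKETCH_REPLY_OK;
}
```

The point of the model is only the order of the two steps: the reply is produced after the flush, so by the time the requesting node proceeds, the directory contents it reads from disk are current.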