Zhang, Sonic
2004-Jun-22 03:58 UTC
[Ocfs2-devel] The truncate_inode_page call in ocfs_file_release causes the severe throughput drop of file reading in OCFS2.
Hi Wim,

I remember that OCFS only makes sure the metadata is consistent among different nodes in the cluster; it doesn't care about file data consistency. So I think we don't need to notify every node of every change to a file. What should be done is to notify only the changes to a file's inode metadata, which costs little bandwidth. Why do you care about file data consistency in your example?

If OCFS had to ensure file data consistency, the current truncate_inode_page() solution also wouldn't work. See my sample:

1. Node 1 writes block 1 to file 1, flushes to disk, and keeps it open.
2. Node 2 opens file 1, reads block 1, and waits.
3. Node 1 writes block 1 again with new data, and again flushes to disk.
4. Node 2 reads block 1 again.

Now the data of block 1 seen by node 2 is not the data on the disk.

-----Original Message-----
From: wim.coekaerts@oracle.com [mailto:wim.coekaerts@oracle.com]
Sent: Tuesday, June 22, 2004 4:01 PM
To: Zhang, Sonic
Cc: Ocfs2-Devel; Rusty Lynch; Fu, Michael; Yang, Elton
Subject: Re: [Ocfs2-devel] The truncate_inode_page call in ocfs_file_release causes the severe throughput drop of file reading in OCFS2.

yeah... it's on purpose for the reason you mentioned: multinode consistency.

i was actually considering testing by taking out truncate_inode_pages; this has been discussed internally for quite a few months, it's a big nightmare i have nightly ;-)

the problem is, how can we notify. I think we don't want to notify every node on every change, otherwise we overload the interconnect, and we don't have a good consistent map, if I remember Kurt's explanation correctly.

this has to be fixed for regular performance for sure; the question is how do we do this in a good way.

I'd say, feel free to experiment... just remember that the big problem is multinode consistency.
imagine this:

I open file /ocfs/foo and read it
all cached
close file, no one on this node has it open

on node2 I write some data, either O_DIRECT or regular
close or keep it open, whichever

on node1 I now do an md5sum

> development machine. But, if we try to bypass the call to
> truncate_inode_page(), the file reading throughput in one node can reach
> 1300M bytes/sec, which is about 75% of that of ext3.
>
> I think it is not a good idea to clean all page caches of an
> inode when its last reference is closed. This inode may be reopened very
> soon and its cached pages may be accessed again.
>
> I guess your intention in calling truncate_inode_page() is to avoid
> inconsistency of the metadata if a process on another node changes the
> same inode metadata on disk before it is reopened on this node. Am I
> right? Do you have more concerns?
>
> I think in this case we have 2 options. One is to clean all
> pages of this inode when we receive the file change notification (rename,
> delete, move, attributes, etc.) in the receiver thread. The other is to
> only invalidate pages containing the metadata of this inode.
>
> What's your opinion?
>
> Thank you.

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
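[Editor's note: the four-step sample above, and the "clean pages on change notification" option quoted at the end of the message, can be illustrated with a toy model. This is a minimal Python sketch with invented names (Disk, Node), assuming each node keeps an independent page cache over one shared disk; it is not OCFS code.]

```python
# Toy model of per-node page caches over a shared disk. Without any
# invalidation, node 2 replays Sonic's four-step stale read; with a
# change notification that drops the peer's cached page, it does not.

class Disk:
    def __init__(self):
        self.blocks = {}

class Node:
    def __init__(self, disk):
        self.disk = disk
        self.cache = {}           # per-node page cache: block -> data
        self.peers = []

    def write(self, block, data, notify=False):
        self.cache[block] = data
        self.disk.blocks[block] = data       # "flush to disk"
        if notify:                           # optional change notification
            for peer in self.peers:
                peer.cache.pop(block, None)  # peer invalidates its copy

    def read(self, block):
        if block in self.cache:              # cache hit: disk never consulted
            return self.cache[block]
        data = self.disk.blocks[block]
        self.cache[block] = data
        return data

disk = Disk()
node1, node2 = Node(disk), Node(disk)
node1.peers, node2.peers = [node2], [node1]

# Steps 1-4 of the sample, with no notification:
node1.write("b1", "old")          # 1. node 1 writes block 1, flushes to disk
node2.read("b1")                  # 2. node 2 reads block 1 and waits
node1.write("b1", "new")          # 3. node 1 writes block 1 again, flushes
stale = node2.read("b1")          # 4. node 2 reads block 1 again
print(stale)                      # "old" -- not what is on the disk

# Same sequence, but every write carries a change notification:
node1.write("b1", "old", notify=True)
node2.read("b1")
node1.write("b1", "new", notify=True)
print(node2.read("b1"))           # "new" -- stale page was dropped
```

The cost Wim worries about is visible in the model too: every `write(notify=True)` touches every peer, which is the interconnect overload argument against notifying all nodes on all changes.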
Mark Fasheh
2004-Jun-22 13:37 UTC
[Ocfs2-devel] The truncate_inode_page call in ocfs_file_release causes the severe throughput drop of file reading in OCFS2.
On Tue, Jun 22, 2004 at 04:57:56PM +0800, Zhang, Sonic wrote:
> Hi Wim,
>
> I remember that OCFS only makes sure the metadata is
> consistent among different nodes in the cluster; it doesn't care
> about file data consistency.

Actually we use journalling and the inode sequence numbers for metadata consistency. The truncate_inode_pages calls *are* used for data consistency, but you're right in that we only really provide a minimal effort for that (relying mostly on direct I/O in the database case for real consistency).

> So I think we don't need to notify every node of every change
> to a file. What should be done is to notify only the changes to a
> file's inode metadata, which costs little bandwidth. Why do you care
> about file data consistency in your example?

Well, we already more or less handle this. Again, I think you're thinking metadata when you want to be thinking data.

> If OCFS had to ensure file data consistency, the current
> truncate_inode_page() solution also wouldn't work. See my sample:
>
> 1. Node 1 writes block 1 to file 1, flushes to disk, and keeps it open.
> 2. Node 2 opens file 1, reads block 1, and waits.
> 3. Node 1 writes block 1 again with new data, and again flushes to disk.
> 4. Node 2 reads block 1 again.
>
> Now the data of block 1 seen by node 2 is not the data on the disk.

Yeah, that's probably a hole in our scheme :)
	--Mark
--
Mark Fasheh
Software Developer, Oracle Corp
mark.fasheh@oracle.com

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
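[Editor's note: Mark mentions inode sequence numbers being used for metadata consistency. One way to avoid the unconditional truncate on every last close, hinted at by that remark, would be to revalidate the cache lazily at open() time against an on-disk sequence number. The sketch below is hypothetical Python with invented names (SharedInode, NodeInodeCache); it is not the OCFS design, just an illustration of the idea.]

```python
# Hypothetical sketch: keep cached pages across close/reopen, and drop
# them at open() only if the inode's on-disk sequence number has moved,
# i.e. only if some other node actually changed the file in between.

class SharedInode:
    """On-disk inode state visible to all nodes."""
    def __init__(self):
        self.seq = 0
        self.data = {}

    def change(self, block, data):
        self.data[block] = data
        self.seq += 1            # every on-disk change bumps the sequence

class NodeInodeCache:
    """One node's page cache for a single inode."""
    def __init__(self, inode):
        self.inode = inode
        self.cached_seq = None
        self.pages = {}

    def open(self):
        # Revalidate lazily: truncate cached pages only when the
        # sequence number proves they might be stale.
        if self.cached_seq != self.inode.seq:
            self.pages.clear()
            self.cached_seq = self.inode.seq

    def read(self, block):
        if block not in self.pages:
            self.pages[block] = self.inode.data[block]
        return self.pages[block]

inode = SharedInode()
inode.change("b1", "v1")

reader = NodeInodeCache(inode)
reader.open()
print(reader.read("b1"))         # "v1"

# No remote change: reopening keeps the warm cache (the ext3-like
# fast path that truncate_inode_pages-on-release throws away).
reader.open()
assert "b1" in reader.pages

# A remote node changes the file; the bumped sequence number forces
# the reader to drop its stale pages at the next open.
inode.change("b1", "v2")
reader.open()
print(reader.read("b1"))         # "v2"
```

Note this only revalidates at open() time, so it does not close the hole Sonic describes, where node 2 holds the file open across node 1's second write; that case still needs some form of notification or locking.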