According to the man page of open(), when a file is opened in O_DIRECT mode, all read/write calls are synchronous. My question is whether, on Lustre, this synchronization reaches only the servers or goes all the way to the disks before a write call returns.

Wei-keng Liao
I respect your question, but do you, in general, have any positive experience with an FS delivering a substantial performance increase with O_DIRECT? I have worked with the performance of about ten FS implementations or flavors, mostly vendor proprietary, and have found only SGI XFS, with large-block access, to provide as much as a 20% speed-up.

Marty Barnaby

Wei-keng Liao wrote:
> According to the man page of open(), when a file is opened in O_DIRECT
> mode, all read/write calls are synchronous. My question is whether this
> synchronization on Lustre only reaches to the servers or all the way
> to the disks, before a write call is returned?

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@clusterfs.com
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
Interestingly, Lustre internally uses I/O extremely similar to direct I/O, because we did find two performance improvements using it (in Linux 2.4):

- using page caches slowed things down (even removing clean pages, of all things!)
- concurrently running threads did much better allocation with direct I/O than with normal I/O

We have not verified whether this has improved in Linux 2.6, and it may very well have, because 2.6 has a much more advanced version of ext3.

The Lustre server loads are probably not applicable to the loads seen on clients, but I thought I'd relate our experience here.

- Peter -

> From: Marty Barnaby
> Sent: Wednesday, September 20, 2006 3:44 PM
> Subject: Re: [Lustre-discuss] O_DIRECT question
>
> I respect your question, but, in general do you have any
> positive experience with an FS delivering substantial
> performance increase with O_DIRECT?
I have an MPI code that performs contiguous, large-chunk, 1 MB aligned, non-overlapping, non-interleaved writes to a shared file, and the data is never read back. I could not get a better data rate when using O_DIRECT compared to not using it, although the access pattern should favor O_DIRECT, so I am trying to understand this result. (I disabled locking when I used O_DIRECT, so locking should not be an issue.)

I wonder whether O_DIRECT makes a read/write call synchronous all the way to the disks on the server side before the call returns, or whether the call returns immediately after the servers receive the data. If it is the former, it is reasonable that O_DIRECT performs worse. Are there other factors that lead to poor O_DIRECT performance, given such an access pattern? Comments are appreciated.

I can provide the I/O kernel and write trace file. Please let me know.

Wei-keng

On Wed, 20 Sep 2006, Peter J. Braam wrote:
> Interstingly Lustre internally uses IO extremely similar to direct IO
> because we did find two performance improvements using that (in Linux 2.4)
On Thu, 2006-09-21 at 12:02 -0500, Wei-keng Liao wrote:
> I have an MPI code that performs contiguous, large chunk, 1 MB aligned,
> non-overlapping, non-interleaved writes to a shared file and data will
> never read back. I could not get a better data rate when using O_DIRECT,
> comparing to not using it, although the access pattern should be better if
> using O_DIRECT. (I disabled locking when I used O_DIRECT. So, locking
> should not be an issue.)

Hi Wei-keng. 1 MB seems smallish. Is something larger possible and, importantly, still valid in your test?

--Lee
Hi Lee,

The write request offsets and lengths are 1 MB "aligned" (offsets and lengths are multiples of 1 MB). Actually, the lengths of the write calls are either 8, 9, 10, 11, or 12 MB. I have 41 write calls from each of the 16 MPI processes, which makes a file size (i.e., the total write size) of 6.4 GB.

Wei-keng

On Thu, 21 Sep 2006, Lee Ward wrote:
> Hi Wei-keng. 1MB seems smallish. Is something larger possible and,
> importantly, still valid in your test?
O_DIRECT I/O is synchronous to the disk, FYI.

On Thu, 21 Sep 2006, Wei-keng Liao wrote:
> I wonder if O_DIRECT makes a read/write call synchronous all the way to
> the disks on the server side before the call returns. Or is it that
> the call returns immediately after the servers receive the data.
On 9/21/06, Wei-keng Liao <wkliao@ece.northwestern.edu> wrote:
> I wonder if O_DIRECT makes a read/write call synchronous all the way to
> the disks on the server side before the call returns.

Yes, that's how it works.

Johann