Evening,

Just wondering if the user-space port of the Lustre servers to Solaris/ZFS will have normal filesystem caching, or will it do direct IO? That is, will the Lustre servers effectively get caching simply by being in user space?

We have an application that would benefit significantly from caching on the Lustre server side. We are happy to forgo reliability in the face of crashes to get it.

Stu.

-- 
Dr Stuart Midgley
sdm900 at gmail.com
On May 22, 2008 17:52 +0800, Stu Midgley wrote:
> Just wondering if the user-space port of lustre servers to Solaris/ZFS
> will have normal fs caching, or will they do direct IO? That is, the
> lustre servers will effectively get caching simply by being in user
> space. We have an application that would significantly benefit from
> caching on lustre server side. We are happy to forgo reliability in
> the face of crashes to get it.

The implementation of ZFS is divided into several major functional units. The data management unit (DMU) implements the bulk of the on-disk structure and is what Lustre will actually interface with for the MDT/OST. The adaptive replacement cache (ARC) is the cache management code, and it is used to manage memory both in userspace and in the Solaris kernel.

As a result, the userspace servers, like any large database implementation, gain nothing from a duplicate kernel-side cache, so we will avoid one as far as possible. Initially we were doing O_DIRECT IO, but async IO (libaio) showed much better performance for the way the ARC submits IO to disk.

Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
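As a rough, hypothetical illustration of the two IO ideas above (bypassing the kernel page cache with O_DIRECT, and keeping several IOs in flight instead of waiting on each one in turn), the sketch below opens a file with Linux's `os.O_DIRECT` and issues aligned block reads from a thread pool. It is not the Lustre/ZFS userspace code. Python's standard library has no libaio binding, so the thread pool is a stand-in for kernel async IO, not the libaio interface itself. `BLOCK`, `open_uncached`, `read_blocks` and the default queue depth of 8 are assumptions made for illustration.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence

BLOCK = 4096  # assumed IO size; O_DIRECT needs aligned buffers, offsets and lengths


def _read_block(fd: int, index: int) -> bytes:
    """Read one BLOCK-sized block at a BLOCK-aligned file offset."""
    buf = mmap.mmap(-1, BLOCK)  # anonymous mmap memory is page-aligned
    try:
        # pread-style call: no shared file offset, so safe across threads.
        n = os.preadv(fd, [buf], index * BLOCK)
        return buf[:n]
    finally:
        buf.close()


def open_uncached(path: str) -> int:
    """Open path read-only, bypassing the page cache with O_DIRECT if possible."""
    direct = getattr(os, "O_DIRECT", 0)  # Linux-specific flag
    if direct:
        try:
            fd = os.open(path, os.O_RDONLY | direct)
        except OSError:
            fd = -1  # filesystem refused O_DIRECT at open time
        if fd >= 0:
            try:
                _read_block(fd, 0)  # probe: some filesystems only fail on read
                return fd
            except OSError:
                os.close(fd)
    return os.open(path, os.O_RDONLY)  # fall back to ordinary buffered IO


def read_blocks(path: str, indices: Sequence[int], queue_depth: int = 8) -> List[bytes]:
    """Read the given blocks with up to queue_depth requests in flight at once.

    A thread pool stands in for kernel async IO (libaio has no stdlib binding).
    """
    fd = open_uncached(path)
    try:
        with ThreadPoolExecutor(max_workers=queue_depth) as pool:
            return list(pool.map(lambda i: _read_block(fd, i), indices))
    finally:
        os.close(fd)
```

For example, `read_blocks(path, [2, 0, 1])` returns the third, first and second 4 KiB blocks of `path`, in that order. With `queue_depth=1` the same code degenerates to one synchronous read at a time, which is roughly the contrast drawn above between plain O_DIRECT reads and asynchronous submission. On filesystems that reject O_DIRECT, the sketch silently falls back to buffered reads.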
Right. So I am still not clear whether OSSs with 64GB of memory will be of any benefit. My understanding at the moment is that OSSs don't benefit from larger memory, hence we run ours with 2GB.

-- 
Dr Stuart Midgley
sdm900 at gmail.com
On May 23, 2008 08:11 +0800, Stuart Midgley wrote:
> Right. So I am still not clear whether 64GB oss's will be of any benefit.
> My understanding at the moment is that oss's don't benefit from larger
> memory, hence we run with 2GB.

Sorry for not being clear: the DMU's ARC will cache data on the OSS node itself, so additional memory on the OSS will be usable as cache.

Cheers, Andreas
-- 
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
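Since OSS memory becomes ARC cache, whether a 64GB OSS helps depends largely on whether the hot data fits in it. As a toy illustration only (not the ZFS code, which differs in many details), the sketch below simulates the replacement policy from Megiddo and Modha's ARC design, tracking keys rather than data: T1 and T2 hold cached blocks seen once or repeatedly, B1 and B2 are "ghost" lists of recently evicted keys, and `p` adapts the balance between recency and frequency. `ARCCache` and `hit_ratio` are hypothetical names.

```python
from collections import OrderedDict


class ARCCache:
    """Key-only simulator of the ARC replacement policy (toy, not ZFS's ARC).

    In every list the first item is the LRU end and the last is the MRU end.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.c = capacity
        self.p = 0.0  # adaptive target size for T1
        self.t1: OrderedDict = OrderedDict()  # cached, seen once recently
        self.t2: OrderedDict = OrderedDict()  # cached, seen at least twice
        self.b1: OrderedDict = OrderedDict()  # ghosts evicted from T1
        self.b2: OrderedDict = OrderedDict()  # ghosts evicted from T2
        self.hits = 0
        self.misses = 0

    def _replace(self, key) -> None:
        # Evict one cached key into the matching ghost list.
        if self.t1 and (len(self.t1) > self.p or (key in self.b2 and len(self.t1) == self.p)):
            old, _ = self.t1.popitem(last=False)
            self.b1[old] = None
        else:
            old, _ = self.t2.popitem(last=False)
            self.b2[old] = None

    def access(self, key) -> bool:
        """Record a request for key; return True on a cache hit."""
        if key in self.t1 or key in self.t2:
            # Hit: move to the MRU end of T2.
            self.t1.pop(key, None)
            self.t2.pop(key, None)
            self.t2[key] = None
            self.hits += 1
            return True
        self.misses += 1
        if key in self.b1:
            # Ghost hit in B1: recency would have paid off, grow T1's target.
            self.p = min(self.c, self.p + max(len(self.b2) / len(self.b1), 1))
            self._replace(key)
            del self.b1[key]
            self.t2[key] = None
        elif key in self.b2:
            # Ghost hit in B2: frequency would have paid off, shrink T1's target.
            self.p = max(0.0, self.p - max(len(self.b1) / len(self.b2), 1))
            self._replace(key)
            del self.b2[key]
            self.t2[key] = None
        else:
            l1 = len(self.t1) + len(self.b1)
            total = l1 + len(self.t2) + len(self.b2)
            if l1 == self.c:
                if len(self.t1) < self.c:
                    self.b1.popitem(last=False)
                    self._replace(key)
                else:
                    self.t1.popitem(last=False)  # drop outright, no ghost kept
            elif total >= self.c:
                if total == 2 * self.c:
                    self.b2.popitem(last=False)
                self._replace(key)
            self.t1[key] = None
        return False


def hit_ratio(capacity: int, requests) -> float:
    """Replay a request trace through a fresh ARCCache and return the hit ratio."""
    cache = ARCCache(capacity)
    for key in requests:
        cache.access(key)
    return cache.hits / len(requests)
```

For example, with the hypothetical trace `workload = list(range(100)) * 10` (ten passes over 100 blocks), `hit_ratio(50, workload)` is 0.0 because a cache smaller than the loop evicts each block just before it is reused, while `hit_ratio(200, workload)` is 0.9: only the first pass misses. Real OSS workloads are not this regular, so this shows the mechanism rather than the gain to expect from 64GB.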