Greetings--

Packages for Lustre 1.0.2 are now available in the usual place:

http://www.clusterfs.com/download.html

This bug-fix release resolves a number of issues, of which a few are
user-visible:

- the default debug level is now a more reasonable production value
- zero-copy TCP is now enabled by default, if your hardware supports it
- you should encounter fewer allocation failures
- the MDS service name can now be the same as the hostname

There are two known issues in particular which are not yet fixed:

- running a client and OST on the same node is still not 100% stable.
  Application or system hangs are possible.
- if you run out of free space, the "out of space" errors may be
  silently eaten by the write cache and not communicated to the
  application. The only workaround is to avoid running out of free
  space.

There are many smaller fixes in this release; a detailed list can be
found on the web site. As usual, please read carefully what is stated
on the download page.

We know that this code can be used with considerable success and
excellent performance. Yet we expect many problems will arise when
people not familiar with the software start using it. We want to work
with you to eliminate those problems and improve the documentation.

If you have problems with Lustre, you may find some help on the
lustre-discuss mailing list:

lustre-discuss@lists.clusterfs.com

For those requiring additional support or guaranteed response times,
per-incident and annual service agreements are also available.

For the Lustre team,
-Phil
Greetings--

Packages for Lustre 1.0.3 are now available in the usual place:

http://www.clusterfs.com/download.html

This bug-fix release resolves a number of issues, some of which were
previously discussed on lustre-discuss:

- fixed "lfs find --obd"
- added /proc/fs/lustre/obdfilter/<uuid>/readcache_max_filesize, which
  can be used to disable read caching entirely (see the sketch below)
- fixed the metadata performance issue associated with zero-conf
- fixed the kernel-source package, which is now being made available

The two known issues previously mentioned are still present:

- running a client and OST on the same node is still not 100% stable.
  Application or system hangs are possible.
- if you run out of free space, the "out of space" errors may be
  silently eaten by the write cache and not communicated to the
  application. The only workaround is to avoid running out of free
  space.

There are many smaller fixes in this release; a detailed list can be
found on the web site. As usual, please read carefully what is stated
on the download page.

We know that this code can be used with considerable success and
excellent performance. Yet we expect many problems will arise when
people not familiar with the software start using it. We want to work
with you to eliminate those problems and improve the documentation.

If you have problems with Lustre, you may find some help on the
lustre-discuss mailing list:

lustre-discuss@lists.clusterfs.com

For those requiring additional support or guaranteed response times,
per-incident and annual service agreements are also available.

For the Lustre team,
-Phil
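[Editor's note: a minimal C sketch of using the new tunable mentioned
above. The value 0 as "disable read caching entirely" is an assumption
based on the announcement's description, and <uuid> is a placeholder
that must be replaced with the actual obdfilter UUID on your OST; this
is an illustration, not code from the release.]

/* Sketch: disable the OST read cache by writing 0 to the new
 * readcache_max_filesize tunable. Assumption: 0 means no file is
 * small enough to be cached. Replace <uuid> with the obdfilter
 * UUID on your OST before use. */
#include <stdio.h>

int main(void)
{
    const char *path =
        "/proc/fs/lustre/obdfilter/<uuid>/readcache_max_filesize";
    FILE *f = fopen(path, "w");

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(f, "0\n");  /* files larger than 0 bytes are no longer cached */
    fclose(f);
    return 0;
}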
Nicholas Henke wrote:
> On Sun, 2004-01-11 at 18:01, Phil Schwan wrote:
>
>> There are two known issues in particular which are not yet fixed:
>>
>> - running a client and OST on the same node is still not 100% stable.
>>   Application or system hangs are possible.
>
> Is this an issue with multiple OSTs, or in a single OST configuration
> as well? I am guessing by "running a client", you mean having Lustre
> mounted on that node?

It is an issue with even one OST, and indeed, I mean having Lustre
mounted on that OST node.

The problem is that the VM on that node can decide that it is low on
free memory and want to flush dirty Lustre client pages. When this
happens, Lustre needs to be able to get that data from the client to
the OST and onto disk without allocating any memory that could block
(and thus deadlock). This is true even when a client is mounted on a
node without an OST, but it is compounded when they're running on the
same node.

Thanks--
-Phil
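[Editor's note: to make the allocation constraint concrete, here is a
minimal kernel-style C sketch. It is not Lustre's actual code; the
helper name is hypothetical. It only illustrates the standard Linux
idiom for the rule Phil describes: allocations made while flushing
dirty pages must not recurse back into filesystem writeback.]

/* Not Lustre code: a sketch of the constraint described above.
 * Memory allocated while flushing dirty pages must not let the
 * allocator call back into filesystem writeback to reclaim memory,
 * or the flush can end up waiting on the very pages it is trying
 * to clean. The kernel expresses this with the GFP_NOFS flag. */
#include <linux/slab.h>

static void *alloc_for_page_flush(size_t size)
{
        /* GFP_NOFS: reclaim is allowed, but the allocator will not
         * re-enter filesystem code to do it, so this cannot deadlock
         * against our own writeback. The allocation may still fail,
         * and the caller must handle a NULL return. */
        return kmalloc(size, GFP_NOFS);
}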
On Sun, 2004-01-11 at 18:01, Phil Schwan wrote:
> There are two known issues in particular which are not yet fixed:
>
> - running a client and OST on the same node is still not 100% stable.
>   Application or system hangs are possible.

Is this an issue with multiple OSTs, or in a single OST configuration
as well? I am guessing by "running a client", you mean having Lustre
mounted on that node?

Nic
--
Nicholas Henke
Penguin Herder & Linux Cluster System Programmer
Liniac Project - Univ. of Pennsylvania