We have an occasional problem with parallel Fortran programs: opening files with status "old" or "unknown" sometimes returns an error. This seems to be a known problem according to this page (see the Known problems section at the end):

http://www.hpsc.csiro.au/userguides/solar/localguide.php

They claim this is fixed in a newer version of Lustre, but I cannot find any Bugzilla entry for it. Does anyone know in which version of Lustre this is fixed? We are currently running v1.8.0.1 on the servers. We have a 40-line Fortran program that reproduces this error consistently on our current setup.

Regards,
r.
> They claim this is fixed in a new version of lustre, but I cannot find
> any bugzilla entry on this. Do anyone know in which version of lustre
> this is fixed? We are currently running v1.8.0.1 on the servers.

Sounds like bug 17545: https://bugzilla.lustre.org/show_bug.cgi?id=17545

The issue is fixed for v1.8.2 and beyond.

v1.8.0.1 is quite old and beyond end-of-life, upgrading would be advisable.
On Thursday, December 23, 2010 15:18:13 Rick Grubin wrote:
> Sounds like bug 17545: https://bugzilla.lustre.org/show_bug.cgi?id=17545
>
> The issue is fixed for v1.8.2 and beyond.
>
> v1.8.0.1 is quite old and beyond end-of-life, upgrading would be advisable.

Thanks a lot for your quick reply! This seems to be it; we will upgrade next week.

Merry Christmas to all,
Roy.
On Thu, Dec 23, 2010 at 03:48:50PM +0100, Roy Dragseth wrote:
>On Thursday, December 23, 2010 15:18:13 Rick Grubin wrote:
>> > We have an occasional problem with parallel fortran programs that open
>> > files with status "old" or "unknown" returns errors on open. This seems
>> Sounds like bug 17545: https://bugzilla.lustre.org/show_bug.cgi?id=17545
>> The issue is fixed for v1.8.2 and beyond.
>Thanks a lot for your quick reply! This seems to be it, we will upgrade next
>week.

If you are using Intel Fortran, then I think your open() failures will probably continue even with the latest Lustre, but at a lower rate. See https://bugzilla.lustre.org/show_bug.cgi?id=23978

This bug has flown under the radar a bit, as it causes fairly cryptic app failures and only Intel Fortran hits it with any frequency. What the user sees usually just looks like a failed open with an oddly corrupted filename string.

cheers,
robin
--
Dr Robin Humble, HPC Systems Analyst, NCI National Facility
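
For anyone who wants to check whether repeated opens of an existing file fail on their own mount, a minimal sketch follows. It is not the 40-line Fortran reproducer mentioned at the start of the thread (which was not posted), and it does not go through the Fortran runtime, so it may well not trigger either of the bugs referenced above. The function name `collect_open_failures` and its parameters are hypothetical, chosen only for illustration. It opens the file read-only many times and records the errno and the filename reported with each failure, which is the kind of detail that helps when a failed open comes back with an odd-looking filename.

```python
def collect_open_failures(path, attempts=1000):
    """Open an existing file read-only `attempts` times.

    Returns a list with one message per failed open; an empty list
    means every open succeeded.
    """
    failures = []
    for attempt in range(attempts):
        try:
            # Open and immediately close; only the open itself is under test.
            with open(path, "rb"):
                pass
        except OSError as exc:
            # Record errno and the filename the OS error carries, so a
            # mangled name would be visible in the report.
            failures.append(
                f"attempt {attempt}: errno={exc.errno} "
                f"({exc.strerror}) filename={exc.filename!r}"
            )
    return failures
```

To approximate a parallel job, one could start several copies at once with the site's usual job launcher, each calling `collect_open_failures` on the same file on the Lustre mount and printing whatever it returns.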