Hi.

Just curious if GFS can be used in an HPC environment, like GPFS or Oracle OCFS2?

Thanks
Marcelo
Marcelo M. Garcia wrote:

> Hi.
>
> Just curious if GFS can be used in an HPC environment, like GPFS or
> Oracle OCFS2?

I don't think so. Comments from people in the HPC business indicate that it
doesn't scale to the number of nodes that typically form these kinds of
environments.

NFS still rules there, together with more (Isilon/Panasas) or less (Sun)
specialized NFS-serving gear.

Rainer
> > Just curious if GFS can be used in an HPC environment, like GPFS or
> > Oracle OCFS2?
>
> I don't think so. Comments from people in the HPC business indicate that
> it doesn't scale to the number of nodes that typically form these kinds
> of environments.
>
> NFS still rules there, together with more (Isilon/Panasas) or less (Sun)
> specialized NFS-serving gear.
>
> Rainer

NFS (<4.1) doesn't scale either. I would say that GPFS and Lustre are more
usable than NFS in an HPC environment. You need a parallel file system when
the data rates get higher. But much depends on the I/O profile of the jobs.

/jens

--
Jens Larsson, NSC, Linköpings universitet, SE-58183 LINKÖPING, SWEDEN
Phone: +46-13-281432, Mobile: +46-709-521432, E-mail: jens at nsc.liu.se
GPG/PGP Key: 1024D/C21BB2C7 2001-02-27 Jens Larsson <jens at nsc.liu.se>
Key Fingerprint: BAEF 85CF BF1D 7A69 C965 2EE6 C541 D57F C21B B2C7
On Mon, Feb 23, 2009 at 3:52 PM, Jens Larsson <jens at nsc.liu.se> wrote:

> NFS (<4.1) doesn't scale either. I would say that GPFS and Lustre are more
> usable than NFS in an HPC environment. You need a parallel file system
> when the data rates get higher. But much depends on the I/O profile of
> the jobs.

I'd also like to test GFS for a 30-node cluster with SGE. Tasks are often
quite short and files are also quite small. The job rate can be quite high
(it can reach 10 to 20 per second).

We actually use NFS under CentOS 4.7 and experience coherency problems. I
tested AFS, Lustre, and GlusterFS. All showed too much overhead with small
files, and lower performance than NFS.

The coherency problem seems related to the ext3 timestamp resolution
(1 second) and the weak NFS cache system. It is not coherent even with the
noac (no attribute cache) option.

The first GFS tests on 6 nodes (with GNBD) were OK, but there were
unexplained kernel panics (even when idle) that prevented further tests.

I will try to upgrade the cluster to a more recent distribution and test
GFS on 30 nodes.
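For reference, the attribute-cache behaviour mentioned above is controlled
through the NFS mount options. The lines below are only a sketch: the server
name, export path and mount point are placeholders, and the exact option set
depends on the client kernel.

    # /etc/fstab entry -- nfsserver and both paths are placeholders
    # noac disables the attribute cache entirely; actimeo=0 is roughly equivalent
    nfsserver:/export/home  /home  nfs  rw,hard,intr,vers=3,noac  0 0

    # or adjust an already-mounted filesystem without editing fstab:
    mount -o remount,actimeo=0 /home

Keep in mind that turning the attribute cache off trades performance for
coherency: every access has to go back to the server for attributes, which
hurts small-file workloads in particular.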
On 27.02.2009, at 01:44, Joe Barjo wrote:

> We actually use NFS under CentOS 4.7 and experience coherency problems.
> I tested AFS, Lustre, and GlusterFS. All showed too much overhead with
> small files, and lower performance than NFS.
> [...]
> I will try to upgrade the cluster to a more recent distribution and test
> GFS on 30 nodes.

What's your NFS server, BTW?

Rainer
On Fri, Feb 27, 2009 at 1:47 AM, Rainer Duffner <rainer at ultra-secure.de> wrote:

> What's your NFS server, BTW?

Each node is also an NFS server (CentOS 4.7), one NFS server per user.
I think it is still NFSv3; I will consider upgrading to v4.
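For what it's worth, the protocol version actually negotiated can be checked
on a client, and a specific version requested at mount time. A rough sketch
follows; nfsserver and the paths are placeholders, and the spelling differs
between CentOS 4/5-era kernels and newer ones.

    # show mounted NFS filesystems with their negotiated options (vers=3 or vers=4)
    nfsstat -m

    # request NFSv4 explicitly -- on EL4/EL5-era kernels it is a separate fstype
    mount -t nfs4 nfsserver:/export/home /home

    # on newer kernels the same thing is spelled as a mount option
    mount -t nfs -o vers=4 nfsserver:/export/home /home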
On 27.02.2009, at 01:53, Joe Barjo wrote:

> Each node is also an NFS server (CentOS 4.7), one NFS server per user.
> I think it is still NFSv3; I will consider upgrading to v4.

The problem is CentOS 4.7. You'll probably get an improvement with CentOS 5.

Rainer
Joe Barjo wrote:

> The first GFS tests on 6 nodes (with GNBD) were OK, but there were
> unexplained kernel panics (even when idle) that prevented further tests.

That's interesting to me. I've just recently been working with GFS, and with
some of the newer kernels I don't seem to have any issue of this nature. Are
you sure it's GFS and not GFS2 that you are using? Also, were you able to
trace the panics down to the GFS code, or is there something in the
supporting cluster code causing issues, perhaps cman/fence/clvm?

- KB
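For anyone retracing this, a rough sketch of how one might check which
filesystem and which cluster pieces are actually involved. It assumes the
standard RHEL/CentOS cluster stack of that era; nothing here is specific to
the poster's setup.

    # which modules are loaded -- shows gfs vs gfs2 and the lock/transport pieces
    lsmod | egrep 'gfs2|gfs|dlm|gnbd'

    # the filesystem type column of the mount output shows "gfs" or "gfs2"
    mount | grep gfs

    # state of the supporting cluster services (membership, quorum, node list)
    cman_tool status
    cman_tool nodes

    # look for GFS/DLM/fencing messages around the time of the panic
    grep -iE 'gfs|dlm|fence' /var/log/messages | tail -50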