I'm planning to hang 58 terabytes off of a PowerEdge 1950 with 4 CPUs and 8 gigs of memory. My MDS is a dual-core Opteron with a 250 GB RAID1 metadata volume and 2 GB of RAM. Do you think this hardware configuration is sane?

-Aaron
Aaron Knister wrote:
> I'm planning to hang 58 terabytes off of a PowerEdge 1950 with 4 CPUs
> and 8 gigs of memory. My MDS is a dual-core Opteron with a 250 GB
> RAID1 metadata volume and 2 GB of RAM. Do you think this hardware
> configuration is sane?
>
> -Aaron

Depends how much you push your disks. Your I/O wait will shoot up very quickly if the disks slow down even a bit. My experience is you'll probably want more machines, unless you're not pushing your disks (in which case, why run Lustre?).

We have about 85 TB of disk (in 24 LUNs) hanging off 4 PE2950s with those same specs. They are set up in failover pairs (each handles 6 LUNs), but I can't run too long on a single machine before it starts thrashing when it takes over the other node's 6 LUNs.

Daniel
In what sense does it start thrashing? I'm thinking about building a PowerEdge with 8 CPUs and 16 GB of memory to handle 3x 9 TB LUNs. Does that sound sane?

On Oct 5, 2007, at 1:02 PM, Daniel Leaberry wrote:
> We have about 85 TB of disk (in 24 LUNs) hanging off 4 PE2950s with
> those same specs. They are set up in failover pairs (each handles 6
> LUNs), but I can't run too long on a single machine before it starts
> thrashing when it takes over the other node's 6 LUNs.
Make that 6x 9.7 TB LUNs.

On Oct 5, 2007, at 1:11 PM, Aaron Knister wrote:
> In what sense does it start thrashing? I'm thinking about building a
> PowerEdge with 8 CPUs and 16 GB of memory to handle 3x 9 TB LUNs. Does
> that sound sane?
Hi,

I have 8 MD1000s (~90 TB of raw disk space) connected to 2 Dell servers (2 quad-core CPUs and 4 gigabit interfaces bonded) on an HPC cluster, with 6 Lustre volumes and parallel I/O. For bigger files, performance is more than adequate; with large numbers of small files, performance is close to terrible. It does the job, is fast enough, and is stable. If your storage requirement is for big files, one or two more PE1950s for parallel I/O help.

I have another Lustre installation with one OSS. Again, the throughput is better than NFS when mounted on 21 nodes. The best performance I have seen for large numbers of small files is with GFS and OCFS2. (Maybe you could have a hybrid, with a few volumes on Lustre and a few volumes on OCFS2? Never tried that!)

The onboard Broadcoms on the PE1950 have a tendency to drop frames; you just need to bump up the receive buffers with ethtool. I had a few occasions where the two Lustre OSSs had kernel panics under heavy I/O and large numbers of dropped frames. Once the dropped-frame problem was fixed, I have not had that problem since.

Regards
Balagopal

Aaron Knister wrote:
> I'm planning to hang 58 terabytes off of a PowerEdge 1950 with 4 CPUs
> and 8 gigs of memory. My MDS is a dual-core Opteron with a 250 GB
> RAID1 metadata volume and 2 GB of RAM. Do you think this hardware
> configuration is sane?
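For reference, the receive-buffer bump Balagopal describes is done through ethtool's ring settings. A minimal sketch, assuming a bnx2-driven onboard port called eth0 (the interface name and ring size are examples only; the usable maximum is whatever the hardware reports):

    # show current and maximum RX/TX ring sizes
    ethtool -g eth0

    # raise the RX ring toward the hardware maximum reported above
    ethtool -G eth0 rx 2048

    # check whether the NIC is still discarding frames afterwards
    ethtool -S eth0 | grep -i discard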
Daniel Leaberry
Systems Administrator
iArchives
Tel: 801-494-6528
Cell: 801-376-6411

Aaron Knister wrote:
> In what sense does it start thrashing? I'm thinking about building a
> PowerEdge with 8 CPUs and 16 GB of memory to handle 3x 9 TB LUNs. Does
> that sound sane?

We run heavy amounts of small files. We start thrashing when the disks can't keep up. CPU really doesn't have much to do with it. If you're not running a lot of small files, you'll probably be fine.

Daniel
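One way to watch for the thrashing Daniel describes is to keep an eye on per-device latency and utilization on the OSS while it is under load; a quick sketch using the standard sysstat/procps tools (nothing Lustre-specific assumed):

    # extended per-device stats every 5 seconds: %util pinned near 100
    # with rising await/avgqu-sz means the disks can no longer keep up
    iostat -x 5

    # the "wa" (I/O wait) column gives a cruder whole-box signal
    vmstat 5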
On Oct 05, 2007 13:14 -0400, Aaron Knister wrote:
> Make that 6x 9.7 TB LUNs.

Lustre (== ext3) doesn't support >= 8 TB LUNs.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
On Oct 05, 2007 11:02 -0600, Daniel Leaberry wrote:
> We have about 85 TB of disk (in 24 LUNs) hanging off 4 PE2950s with
> those same specs. They are set up in failover pairs (each handles 6
> LUNs), but I can't run too long on a single machine before it starts
> thrashing when it takes over the other node's 6 LUNs.

If you have 12 OSTs on a single node, that means up to 12 * 400 MB = 4.8 GB of RAM pinned just by the ext3 journals. Either you need a lot more RAM than this (8 GB, for example), or you need to shrink the journal size to something like 128 MB (use tune2fs to remove the journal and then re-add it). 128 MB should be fine unless you have many hundreds of clients doing concurrent I/O.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
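For concreteness, the journal shrink Andreas describes (remove the journal with tune2fs, then re-add it at a smaller size) looks roughly like this; the device name is a placeholder, and the OST must be unmounted first:

    # with the OST unmounted, drop the existing (large) journal
    tune2fs -O ^has_journal /dev/sdb

    # make sure the filesystem is clean before re-adding the journal
    e2fsck -f /dev/sdb

    # re-create the journal at 128 MB, then bring the OST back up as usual
    tune2fs -J size=128 /dev/sdb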
Oh, right, I forgot about that. Well... if I had an 8 TB LUN and split it into 2 volume groups using LVM, do you think the performance would be worse than making 2 RAIDs at the hardware level?

-Aaron

On Oct 5, 2007, at 6:18 PM, Andreas Dilger wrote:
> On Oct 05, 2007 13:14 -0400, Aaron Knister wrote:
>> Make that 6x 9.7 TB LUNs.
>
> Lustre (== ext3) doesn't support >= 8 TB LUNs.

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water
(301) 595-7001
aaron at iges.org
On Oct 06, 2007 10:28 -0400, Aaron Knister wrote:
> Oh, right, I forgot about that. Well... if I had an 8 TB LUN and split
> it into 2 volume groups using LVM, do you think the performance would
> be worse than making 2 RAIDs at the hardware level?

Well, it won't be doing the disks any favours, since you'll now have contention between the OSTs, and the kernel will be doing a poor job with the I/O elevator decisions. I would suggest making 2 smaller RAID LUNs instead.

In the end it is up to you to decide if the I/O performance is acceptable. You can do some testing using lustre-iokit to see what the component device performance is.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
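The lustre-iokit surveys are the thorough way to measure this. As a very rough first pass on a single candidate LUN (not a substitute for lustre-iokit), a direct-I/O streaming read with dd gives a baseline figure; the device name is a placeholder, and a write test should only be run on a device that holds no data yet:

    # read 4 GB straight off the raw device, bypassing the page cache
    dd if=/dev/sdc of=/dev/null bs=1M count=4096 iflag=direct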
As RH 5.1 will support 16 TB ext3 partitions, will Lustre inherit that functionality?

> -----Original Message-----
> From: lustre-discuss-bounces at clusterfs.com On Behalf Of Andreas Dilger
> Sent: Wednesday, October 10, 2007 9:26 AM
> Subject: Re: [Lustre-discuss] Hardware Question
>
> Well, it won't be doing the disks any favours, since you'll now have
> contention between the OSTs, and the kernel will be doing a poor job
> with the I/O elevator decisions. I would suggest making 2 smaller RAID
> LUNs instead.
On Oct 10, 2007 09:40 -0600, Lundgren, Andrew wrote:
> As RH 5.1 will support 16 TB ext3 partitions, will Lustre inherit that
> functionality?

We haven't looked at this yet. The ldiskfs code is ext3 + patches, so there is some chance that it will work (more likely on 64-bit platforms), but we haven't audited the ldiskfs patches to check if they are 32-bit clean.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
So if I have arrays with 15 drives in them, should I just configure two smaller arrays? Also, if I make a giant 30-terabyte filesystem out of, say, underlying 6 TB disk arrays, and one of my disk arrays bites the dust, what happens to the rest of the filesystem, and how easy is it to recover from this situation?

-Aaron

On Oct 10, 2007, at 11:48 AM, Andreas Dilger wrote:
> We haven't looked at this yet. The ldiskfs code is ext3 + patches, so
> there is some chance that it will work (more likely on 64-bit
> platforms), but we haven't audited the ldiskfs patches to check if
> they are 32-bit clean.

Aaron Knister
Associate Systems Administrator/Web Designer
Center for Research on Environment and Water
(301) 595-7001
aaron at iges.org
If the intention is not size but to spread your I/Os over as many spindles as possible, you could still have these volume groups. Once you create these volumes, you could have them sliced into multiple LUNs whose individual sizes are acceptable to ext3.

Regards
-Peter

From: lustre-discuss-bounces at clusterfs.com [mailto:lustre-discuss-bounces at clusterfs.com] On Behalf Of Aaron Knister
Sent: Wednesday, October 17, 2007 7:30 PM
To: Andreas Dilger
Cc: lustre-discuss at clusterfs.com; Lundgren, Andrew
Subject: Re: [Lustre-discuss] Hardware Question

So if I have arrays with 15 drives in them, should I just configure two smaller arrays? Also, if I make a giant 30-terabyte filesystem out of, say, underlying 6 TB disk arrays, and one of my disk arrays bites the dust, what happens to the rest of the filesystem, and how easy is it to recover from this situation?
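A minimal sketch of the slicing Peter suggests, carving one large RAID LUN into sub-8 TB logical volumes that each become an OST (device, VG/LV names and sizes are examples only; note Andreas's earlier caveat about contention between OSTs that share the same spindles):

    # put the big RAID LUN under LVM
    pvcreate /dev/sdd
    vgcreate ostvg /dev/sdd

    # carve it into two logical volumes that stay under the 8 TB limit
    lvcreate -L 7T -n ost0 ostvg
    lvcreate -L 7T -n ost1 ostvg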