Dear list,

is there a limit on the number of OSTs per OSS?

TIA,
Arne
On Wed, 2009-08-19 at 11:45 +0200, Arne Wiebalck wrote:
> is there a limit on the number of OSTs per OSS?

I have never run across one so there is likely no *practical* limit. I could very well be corrected on this though.

Unless you are making lots and lots of small OSTs -- which is not usually beneficial anyway -- typically, you will run into resource limitations (memory, bus bandwidth, etc.) on an OSS before you hit a limit on the number of OSTs.

Just how many OSTs are you considering, and how big will they be?

b.
Brian J. Murrell wrote:
> Unless you are making lots and lots of small OSTs -- which is not
> usually beneficial anyway -- typically, you will run into resource
> limitations (memory, bus bandwidth, etc.) on an OSS before you hit a
> limit on the number of OSTs.

I just thought I remembered there was something like 8 OSTs per OSS, but apparently I was wrong.

> Just how many OSTs are you considering, and how big will they be?

My OSSs will have 10 OSTs with 1TB each.

Thanks for the quick reply,
Arne
On Wed, 2009-08-19 at 13:55 +0200, Arne Wiebalck wrote:
> I just thought I remembered there was something like 8 OSTs per OSS,
> but apparently I was wrong.

No, nowhere near that low, if there is any practical limit.

> My OSSs will have 10 OSTs with 1TB each.

Very much within the realm of "common".

b.
Hi,

Just some feedback from our own experience.

I agree with Brian that there is no hard limit on the number of OSTs per OSS in the Lustre code. But one should really take into account the memory available on the OSSes when deciding on the number of OSTs per OSS (and so the size of each OST). If you do not have 1 GB to 1.2 GB of memory per OST on your OSSes, you will run into serious trouble with "out of memory" messages.

For instance, if you want 8 OSTs per OSS, your OSSes should have at least 10GB of RAM.

Unfortunately we experienced those "out of memory" problems ourselves, so I advise you to read Lustre Operations Manual chapter 33.12, "OSS RAM Size for a Single OST".

Cheers,
Sebastien.
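The sizing rule in the message above is simple arithmetic, and a small sketch makes it easy to apply to other OST counts. This is only an illustration of the 1 GB to 1.2 GB per OST rule of thumb quoted in the thread, not a formula taken from the Lustre manual; the function name `min_oss_ram_gb` and its default are hypothetical.

```python
import math


def min_oss_ram_gb(num_osts: int, gb_per_ost: float = 1.2) -> int:
    """Whole gigabytes of OSS RAM suggested by the per-OST rule of thumb.

    gb_per_ost defaults to the upper figure quoted in the thread (1.2 GB);
    it is an assumption, not a value enforced by Lustre.
    """
    if num_osts < 0:
        raise ValueError("num_osts must not be negative")
    # Round before taking the ceiling so float noise cannot add a spurious GB.
    return math.ceil(round(num_osts * gb_per_ost, 6))


if __name__ == "__main__":
    for osts in (8, 10):
        print(f"{osts} OSTs per OSS: at least {min_oss_ram_gb(osts)} GB of RAM")
```

For 8 OSTs this reproduces the "at least 10GB" figure given above; for the 10 OSTs per OSS discussed in this thread, the same rule suggests somewhat more.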
OK, I will keep that in mind.

Thanks,
Arne
The journal pages are effectively pinned I think, so with 400MB journals on each OST, count on losing 400MB of memory at least.

Here is some anecdotal data from our environment:

We seem to do OK with 4G nodes having 3 OSTs each, including sometimes running with 6 during failover. On the other hand 2G nodes with 2 OSTs each would run out of memory when running with 4 during failover. We had to reduce the journal size from 400MB to 200MB for failover to work reliably on the 2G nodes.

In summary, with 400MB journals:
- 0.66G per OST works
- 0.50G per OST doesn't

That's with servers running Lustre 1.6.6, RHEL 5.3, x86_64 arch, and socklnd.

Jim
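The figures in Jim's message can be reproduced with a short calculation. The sketch below assumes, as Jim suggests but does not confirm, that each OST's journal is fully pinned in memory, and it converts with 1024 MB per GB; the function name `headroom_per_ost_mb` is hypothetical. It shows how much memory per OST remains once the journals are set aside, which is one plausible reading of why halving the journal helped on the 2G nodes.

```python
MB_PER_GB = 1024  # assumption: binary gigabytes


def headroom_per_ost_mb(node_ram_gb: float, active_osts: int, journal_mb: float) -> float:
    """Memory per OST left over after reserving one pinned journal per OST."""
    if active_osts <= 0:
        raise ValueError("active_osts must be positive")
    return node_ram_gb * MB_PER_GB / active_osts - journal_mb


if __name__ == "__main__":
    # (description, node RAM in GB, OSTs served during failover, journal size in MB)
    cases = [
        ("4G node, 6 OSTs, 400MB journals (reported OK)", 4, 6, 400),
        ("2G node, 4 OSTs, 400MB journals (reported OOM)", 2, 4, 400),
        ("2G node, 4 OSTs, 200MB journals (reported OK)", 2, 4, 200),
    ]
    for label, ram_gb, osts, journal_mb in cases:
        left = headroom_per_ost_mb(ram_gb, osts, journal_mb)
        print(f"{label}: {left:.0f} MB per OST beyond the journal")
```

Under these assumptions the failing configuration is the one with the least memory left per OST, and reducing the journal to 200MB lifts the 2G nodes above the 4G case. This is a rough model of three anecdotal data points, not a sizing guarantee.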
On Aug 19, 2009 13:55 +0200, Arne Wiebalck wrote:
> Brian J. Murrell wrote:
>> Just how many OSTs are you considering, and how big will they be?
>
> My OSSs will have 10 OSTs with 1TB each.

Is there a reason to do this instead of, say, two 5TB OSTs using MD RAID-0? Or for that matter one 8+2 8TB OST with MD RAID-6? That will give better space utilization if you have large files, otherwise you will have a lot of smaller chunks of free space on each OST that cannot be utilized well when the OSTs are nearly full.

If you are looking at straight performance it may be that 10x 1TB OSTs is the fastest, since each one can be seeked independently. Reducing the journal size from the default 400MB is probably not harmful if you have correspondingly more OSTs.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Andreas Dilger wrote:
> Is there a reason to do this instead of, say, two 5TB OSTs using MD RAID-0?

I thought a higher number of independent spindles was better. I should add that the 1TB OSTs are HW RAID-1s already.

> Or for that matter one 8+2 8TB OST with MD RAID-6? That will give
> better space utilization if you have large files, otherwise you will
> have a lot of smaller chunks of free space on each OST that cannot
> be utilized well when the OSTs are nearly full.
>
> If you are looking at straight performance it may be that 10x 1TB OSTs
> is the fastest, since each one can be seeked independently. Reducing
> the journal size from the default 400MB is probably not harmful if you
> have correspondingly more OSTs.

OK, thanks for the hints.

Cheers,
Arne
Andreas,

> Is there a reason to do this instead of, say, two 5TB OSTs using MD RAID-0?
> Or for that matter one 8+2 8TB OST with MD RAID-6? That will give
> better space utilization if you have large files, otherwise you will

What is a 'large' file for you?

TIA,
Arne
On Aug 20, 2009 09:27 +0200, Arne Wiebalck wrote:
> What is a 'large' file for you?

Well, this is relative to the size of the OST itself, and the striping. For example, if you have 1TB OSTs and stripecount=1, but your files are 500GB in size, then you might only be able to fit one onto each OST (due to fs-internal overhead), leaving about 50% of each OST unusable. If you have 8TB OSTs then you could store 15 files/OST, leaving about 6% of each OST unusable. This is "external" free space fragmentation.

Similarly, if you have, say, 5% of space free in each OST, then for an 8TB OST the free space would likely be in 8x larger chunks of space, vs a 1TB OST.

Other benefits of fewer, larger OSTs:
- less configuration
- more bandwidth per stripe
- more space per file (if you have > 160 OSTs)

The reasons for NOT going to a larger single OST are:
- more resource contention
- larger point of failure
- longer e2fsck time

Cheers, Andreas
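The 50% versus 6% comparison above can be worked through as a short sketch. It assumes stripecount=1 (each file lives entirely on one OST), decimal units (1TB = 1000GB), and a 1% filesystem overhead standing in for the unspecified "fs-internal overhead"; that 1% figure and the function name `stranded_space` are illustrative assumptions, not values from the thread.

```python
def stranded_space(ost_gb: int, file_gb: int, overhead_fraction: float = 0.01):
    """Return (whole files that fit on one OST, fraction of the OST left unusable).

    overhead_fraction is an assumed stand-in for filesystem-internal overhead;
    it only needs to be large enough that the OST cannot be filled exactly.
    """
    if ost_gb <= 0 or file_gb <= 0:
        raise ValueError("sizes must be positive")
    usable_gb = ost_gb * (1 - overhead_fraction)
    files = int(usable_gb // file_gb)
    unusable = (ost_gb - files * file_gb) / ost_gb
    return files, unusable


if __name__ == "__main__":
    for ost_gb in (1000, 8000):
        files, unusable = stranded_space(ost_gb, file_gb=500)
        print(f"{ost_gb} GB OST: {files} x 500 GB files, {unusable:.1%} unusable")
```

With these assumptions a 1TB OST holds one 500GB file and an 8TB OST holds fifteen, matching the figures in the message. The result is insensitive to the exact overhead as long as it is greater than zero and smaller than one file.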