I was curious what limitations exist for o2ib network numbers. Most of the time I am dealing with o2ib0, o2ib1, etc. As an experiment, I tried configuring a machine with o2ib1000, and that seemed to be OK. I figured there must be some limit on how large the network number can get, but after doing some searching, I have been unable to find any docs that specify a limit. Does anyone know what the max network number is?

-- 
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/
Hi,

LNet reserves 32 bits for the network number, so you can choose a very large network number if you only have a few networks. Actually creating many networks, however, will cause some issues:

- o2iblnd pre-allocates memory resources for each network, so it will consume a lot of memory.
- Mainstream LNet will have performance issues if there are many networks (for example, hundreds), although this is not difficult to fix.

Liang

On Aug 8, 2012, at 10:23 PM, Rick Mohr wrote:
> Does anyone know what the max network number is?
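To make the point about the number range concrete, here is a minimal Python sketch that splits a name such as o2ib1000 into its type and number and checks the number against a 32-bit bound. The bound is taken directly from the reply above and is an assumption; the limit a given Lustre release actually enforces may be lower. The function name `parse_net_name`, the two-entry type list, and the rule that a bare name means network 0 are illustrative choices, not LNet code.

```python
import re

# Assumption from the reply above: the network number fits in 32 bits.
# A given Lustre release may enforce a smaller limit.
MAX_NET_NUMBER = 2**32 - 1

# Illustrative subset of network types; the number suffix is optional.
_NET_RE = re.compile(r"(o2ib|tcp)([0-9]*)")


def parse_net_name(name):
    """Split a name like 'o2ib1000' into ('o2ib', 1000).

    A missing number is treated as 0 (assumption: 'o2ib' means 'o2ib0').
    Raises ValueError for unknown types or out-of-range numbers.
    """
    match = _NET_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized network name: {name!r}")
    net_type, digits = match.groups()
    number = int(digits) if digits else 0
    if number > MAX_NET_NUMBER:
        raise ValueError(f"network number {number} exceeds {MAX_NET_NUMBER}")
    return net_type, number


if __name__ == "__main__":
    print(parse_net_name("o2ib1000"))
```

A check like this only catches names that are out of range under the stated assumption; it says nothing about the memory or lookup costs mentioned in the reply.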
Liang,

Which mainstream performance issue are you referring to? Is there a JIRA ticket tracking it?

Thanks,
-Cory

On 08/08/2012 09:38 AM, Liang Zhen wrote:
> - Main stream LNet will have performance issue if there're many networks, for example, hundreds, although it's not difficult to fix this.
On Wed, 2012-08-08 at 22:38 +0800, Liang Zhen wrote:
> LNet reserved 32 bits for network number, so you can choose a very
> large network number if only have a few networks

Thanks. That was exactly what I was looking for.

> but really create many networks will have some issues:
> - o2iblnd will pre-allocate memory resources for each network, so it will consume a lot of memory

I am mainly looking at using network numbers to keep things better organized. For example, instead of having two different clusters each use o2ib1 for their internal networks, I could assign o2ib101 to one cluster and o2ib102 to the second (reserving o2ib[0-100] for other purposes). Any given client would probably only know about a few networks (maybe 2-3), but the Lustre servers would obviously need to have more (maybe 10-20).

Is there an estimate of how much memory is consumed for each network? Also, if a node has o2ib0 and o2ib5 configured, will it just allocate memory for those networks, or will it also allocate memory for o2ib[1-4] even if they are unused? (I wouldn't expect it to allocate memory, but better to find out now than to discover my mistake later.)

> - Main stream LNet will have performance issue if there're many
> networks, for example, hundreds, although it's not difficult to fix
> this.

I don't expect to have hundreds of networks, but I am curious how I would fix it if I ever did.

Thanks.

-- 
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/
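The numbering scheme described in this message (o2ib[0-100] reserved, one number per cluster above that) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cluster names and interface names (`ib0`, `ib1`) are hypothetical, and the output string follows the `networks` option format for the lnet module as I understand it from the Lustre manual, so check it against the documentation for your release before using it.

```python
# o2ib[0-100] are kept for other purposes, as proposed in the message above.
RESERVED_MAX = 100


def assign_cluster_networks(clusters, first=RESERVED_MAX + 1):
    """Give each cluster its own o2ib network, starting above the reserved range."""
    return {name: f"o2ib{first + i}" for i, name in enumerate(clusters)}


def networks_option(net_to_interface):
    """Format an lnet 'networks' option line from a {network: interface} mapping.

    Interface names are supplied by the caller; the ones used below are
    hypothetical.
    """
    pairs = ",".join(f"{net}({ifc})" for net, ifc in net_to_interface.items())
    return f'options lnet networks="{pairs}"'


if __name__ == "__main__":
    # Hypothetical cluster names, for illustration only.
    plan = assign_cluster_networks(["cluster_a", "cluster_b"])
    print(plan)
    # A client in cluster_a that also reaches a shared network o2ib0.
    print(networks_option({plan["cluster_a"]: "ib0", "o2ib0": "ib1"}))
```

Keeping the assignment in one table like this makes it easy to see that no two clusters share a number, which is the organizational benefit the scheme is after.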
No Jira ticket yet. The reason for the potential performance issue is straightforward: all LNet NIs are linked on a plain list, and we need to scan the whole list for each send and receive. That is not an issue for a few networks, but it could be problematic for hundreds or tens of networks.

Liang

On Aug 8, 2012, at 10:48 PM, Cory Spitz wrote:
> What main stream perf. issue do you refer to? Is there a JIRA ticket
> tracking it?
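The cost described above can be illustrated with a small Python sketch. It is not LNet code: `NetworkInterface`, `find_ni_linear`, and `build_index` are hypothetical names, and the dictionary index is just one possible way to avoid the scan, not necessarily the fix Liang has in mind.

```python
class NetworkInterface:
    """Minimal stand-in for an LNet NI: a network id and a label."""

    def __init__(self, net_id, label):
        self.net_id = net_id
        self.label = label


def find_ni_linear(ni_list, net_id):
    """Walk the list from the head; the step count grows with the number of NIs."""
    steps = 0
    for ni in ni_list:
        steps += 1
        if ni.net_id == net_id:
            return ni, steps
    return None, steps


def build_index(ni_list):
    """Hypothetical alternative: index NIs by network id for direct lookup."""
    return {ni.net_id: ni for ni in ni_list}


if __name__ == "__main__":
    nis = [NetworkInterface(n, f"o2ib{n}") for n in range(200)]
    ni, steps = find_ni_linear(nis, 199)
    print(ni.label, "found after", steps, "comparisons")
    print(build_index(nis)[199].label, "found with one dictionary lookup")
```

With a handful of networks the scan is a few comparisons per message; the worst case grows linearly with the list, which is why the network count matters more on servers than on clients.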
On Thu, 2012-08-09 at 15:14 +0800, Liang Zhen wrote:
> No Jira ticket yet, reason of the potential performance issue is
> straightforward, it's because all LNet NIs are linked on a plain list,
> and we need scan the whole list for each sending/receiving, it's not
> an issue for a few networks, but it could be problematic for hundreds
> or tens.

Are there any whitepapers or other docs which quantify the potential performance issue? I can easily see how this might be a problem for a hundred networks, but given that I am not likely to have a setup with that many networks, I probably wouldn't lose any sleep over it. However, if there can be performance issues with as few as 10 networks, then this becomes much more relevant and may require me to rethink some of my choices. I'd appreciate any information which could help me with those decisions.

-- 
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/