Hi,

How can one programmatically probe the Lustre system an application is running on?

At compile time I'd like access to the various Lustre system limits, for example those listed in ch. 32 of the operations manual. Incidentally, one I didn't see listed in that chapter is the maximum number of OSTs a single file can be striped across.

At run time I'd like to be able to probe the size (number of OSS, OST, etc.) of the system the application is running on. I hope not to have to hard-code such values.

Thanks
Burlen
On 2010-03-23, at 14:25, burlen wrote:
> How can one programmatically probe the lustre system an application is
> running on?

Lustre-specific interfaces are generally "llapi_*" functions, from liblustreapi.

> At compile time I'd like access to the various lustre system limits,
> for example those listed in ch.32 of operations manual.

There are no llapi_* functions for this today. Can you explain a bit better what you are trying to use this for?

statfs(2) will tell you a number of limits, as will pathconf(3), and those are standard POSIX APIs.

> Incidentally one I didn't see listed in that chapter is the maximum
> number of OST's a single file can be striped across.

That is the first thing listed:

>> 32.1 Maximum Stripe Count
>> The maximum number of stripe count is 160. This limit is hard-coded,
>> but is near the upper limit imposed by the underlying ext3 file
>> system. It may be increased in future releases. Under normal
>> circumstances, the stripe count is not affected by ACLs.

> At run time I'd like to be able to probe the size (number of OSS, OST
> etc...) of the system the application is running on.

One shortcut is to specify "-1" for the stripe count, which will stripe a file across all available OSTs. That is what most applications want, if they are not being striped over only 1 or 2 OSTs.

If you are using MPIIO, the Lustre ADIO layer can optimize these things for you, based on application hints.

If you could elaborate on your needs, there may not be any need to make your application more Lustre-aware.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
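The standard-API route mentioned above can be sketched as follows. This is only an illustration, not anything from Lustre: it uses statvfs(3), the POSIX counterpart of statfs(2), together with pathconf(3), and the `fs_limits` struct and `query_fs_limits()` name are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* Hypothetical container for the probed values; not a Lustre type. */
struct fs_limits {
    unsigned long block_size;        /* statvfs f_bsize */
    unsigned long fragment_size;     /* statvfs f_frsize */
    unsigned long long total_blocks; /* statvfs f_blocks */
    unsigned long long avail_blocks; /* statvfs f_bavail */
    long name_max;                   /* pathconf _PC_NAME_MAX, -1 if unknown */
    long path_max;                   /* pathconf _PC_PATH_MAX, -1 if unknown */
};

/* Fill *out with generic limits of the filesystem holding "path".
 * Returns 0 on success or a negative errno value on failure. */
int query_fs_limits(const char *path, struct fs_limits *out)
{
    struct statvfs sv;

    assert(path != NULL && out != NULL);

    if (statvfs(path, &sv) != 0)
        return -errno;

    out->block_size = sv.f_bsize;
    out->fragment_size = sv.f_frsize;
    out->total_blocks = (unsigned long long)sv.f_blocks;
    out->avail_blocks = (unsigned long long)sv.f_bavail;
    out->name_max = pathconf(path, _PC_NAME_MAX);
    out->path_max = pathconf(path, _PC_PATH_MAX);
    return 0;
}
```

Called with a path on the Lustre mount, this reports the generic limits only. It says nothing about OST counts or striping, which is the gap the rest of the thread is about.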
System limits are sometimes provided in a header; I wasn't sure if Lustre adopted that approach. The llapi_* functions are great, and I see how to set the stripe count and size. I wasn't sure if there was also a function to query the configuration, e.g. the number of OSTs deployed?

This would be for use in a global hybrid magnetospheric simulation that runs on a large scale (1E4-1E5 cores). Good striping parameters depend on the run, and could be calculated at run time. It can make a significant difference in our run times to have these set correctly. I am not sure if we always want a stripe count of the maximum. I think this depends on how many files we are synchronously writing, and on the total number of available OSTs. E.g. if there are 256 OSTs on some system and we have 2 files to write, would it not make sense to set the stripe count to 128?

We can't rely on our users to set the Lustre parameters correctly. We can't rely on the system defaults either; they typically aren't set optimally for our use case. MPI hints look promising, but the ADIO Lustre optimizations are fairly new and, as far as I understand, not publicly available in MPICH until the next release (maybe in May?). We run on a variety of systems, some with a variety of MPI implementations (e.g. Cray, SGI). The MPI hints will only be useful on implementations that support the particular hint. From a consistency point of view we need to both make use of MPI hints and have direct access via the llapi, so that we run well on all those systems regardless of which MPI implementation is deployed.

Thanks
Burlen
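The arithmetic in the question above (256 OSTs shared by 2 files gives 128 stripes each) can be written down as a small helper. This is a sketch only: `suggest_stripe_count()` is a hypothetical name, and the even split is just the heuristic proposed in the message, not a tuning recommendation. The maximum is passed in as a parameter rather than hard-coded, since avoiding hard-coded limits is the point of the thread.

```c
#include <assert.h>

/* Divide the available OSTs evenly among the files written at the same
 * time, then clamp the result to the range [1, max_stripe_count]. */
int suggest_stripe_count(int ost_count, int file_count, int max_stripe_count)
{
    int per_file;

    assert(ost_count > 0 && file_count > 0 && max_stripe_count > 0);

    per_file = ost_count / file_count;
    if (per_file < 1)
        per_file = 1;                 /* more files than OSTs */
    if (per_file > max_stripe_count)
        per_file = max_stripe_count;  /* per-file striping limit */
    return per_file;
}
```

With the manual's limit of 160, `suggest_stripe_count(256, 2, 160)` gives 128, while a single file on the same system is clamped to 160.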
burlen wrote:
> The llapi_* functions are great, I see how to set the stripe count and
> size. I wasn't sure if there was also a function to query about the
> configuration, eg number of OST's deployed?

I don't know what your constraints are, but should note that this sort of information (number of OSTs) can be obtained rather trivially from any lustre client via shell prompt, to wit:

# lctl dl |grep OST |wc -l
2

or:

# ls /proc/fs/lustre/osc | grep OST |wc -l
2

There are probably a few other ways to do that. Not as stylish as llapi_*..

cliffw
Cliff White wrote:
> I don't know what your constraints are, but should note that this sort
> of information (number of OSTs) can be obtained rather trivially from
> any lustre client via shell prompt, to wit:

True, but parsing the output of a C "system" call is something I hoped to avoid. It might not be portable and might be fragile over time.

This also gets at my motivation for asking for a header with the Lustre limits: if I hard-code something and the limits change down the road, we are suddenly shooting ourselves in the foot.

I think I made a mistake about the MPI hints in my last mail. The striping_* hints have been part of the MPI standard at least as far back as 2003. It says that they are reserved, but implementations are not required to interpret them. That's a pretty weak assurance.

I'd like this thread to be considered by the Lustre team as a feature request for better programmatic support. I think it makes sense because performance is fairly sensitive to both the deployed hardware and the striping parameters. There can also be more information about the specific IO needs available at the application level than at the MPI level. And MPI implementations don't have to honor hints.

Thanks, I am grateful for the help as I get up to speed on the Lustre fs.
Burlen
On Thu, Mar 25, 2010 at 10:07:03AM -0700, burlen wrote:
> > I don't know what your constraints are, but should note that this sort
> > of information (number of OSTs) can be obtained rather trivially from
> > any lustre client via shell prompt, to wit:
> True, but parsing the output of a c "system" call is something I hoped
> to avoid. It might not be portable and might be fragile over time.
>
> This also gets at my motivation for asking for a header with the Lustre
> limits, if I hard code something down the road the limits change, and we
> are suddenly shooting ourselves in the foot.

Instead of teaching the application some filesystem intrinsics, you could also teach the queueing system about your application's output behaviour and let it set up an adequately configured working directory. GridEngine allows running queue-specific prolog scripts for this purpose; other systems certainly offer similar features.

Regards,
Daniel.
On 2010-03-24, at 18:39, burlen wrote:
> System limits are sometimes provided in a header, I wasn't sure if
> Lustre adopted that approach.

Well, having static limits in a header doesn't help if the limits are dynamic.

> The llapi_* functions are great, I see how to set the stripe count
> and size. I wasn't sure if there was also a function to query about
> the configuration, eg number of OST's deployed?

There isn't directly such a function, but indirectly this is possible to get from userspace without changing Lustre or the liblustreapi library:

/* Return the number of OSTs configured for the filesystem on which the
 * file descriptor "fd" is opened.
 * Return +ve number of OSTs on success, or -ve errno on failure. */
int get_ost_count(int fd)
{
        int ost_count = 0;
        int rc;

        rc = llapi_lov_get_uuids(fd, NULL, &ost_count);
        if (rc < 0)
                return rc;

        return ost_count;
}

/* Return the maximum possible number of OSTs any file can be striped over
 * for the filesystem on which the file descriptor "fd" is opened.
 * Return +ve max_stripe_count on success, or -ve errno on failure. */
int get_max_stripe_count(int fd)
{
        int max_stripe_count = 0;
        int rc;

        rc = llapi_lov_get_uuids(fd, NULL, &max_stripe_count);
        if (rc < 0)
                return rc;

        if (max_stripe_count > LOV_MAX_STRIPE_COUNT)
                max_stripe_count = LOV_MAX_STRIPE_COUNT;

        return max_stripe_count;
}

Note that there is a drawback from forcing a file to have N stripes, vs. letting Lustre make this decision. If you request N stripes, but e.g. one OST is unavailable (full, offline, whatever), your file creation will fail. If Lustre is making this decision it will use the currently available OSTs, regardless of how many there are configured.

> This would be for use in a global hybrid magnetospheric simulation
> that runs on a large scale (1E4-1E5 cores). The good striping
> parameters depend on the run, and could be calculated at run time.
> It can make a significant difference in our run times to have these
> set correctly. I am not sure if we always want a stripe count of the
> maximum. I think this depends on how many files we are synchronously
> writing, and the number of available OST's total. Eg if there are
> 256 OST's on some system and we have 2 files to write would it not
> make sense to set the stripe count to 128?

Sure, but not many applications run in this mode. Either they have 1:1 (file per process), N:M (shared single file, maximally-striped) or M:M (shared single file, maximally-striped, 1 process writing per stripe).

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
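One way an application might live with the drawback described above is to try its computed stripe count first and, if creation fails, retry with "-1" so that Lustre chooses from the OSTs currently available. The sketch below shows only that control flow, under stated assumptions: `create_with_fallback()`, `create_striped_fn` and `demo_create()` are hypothetical names, and the callback stands in for whatever call actually creates the file (for example a thin wrapper around llapi_file_create() from liblustreapi). A real implementation would also inspect the error code before retrying.

```c
#include <assert.h>
#include <stddef.h>

/* Stripe count meaning "all available OSTs" (the "-1" shortcut). */
#define STRIPE_COUNT_ALL (-1)

/* Hypothetical callback: create "path" with the given stripe count.
 * Returns 0 on success, a negative value on failure. */
typedef int (*create_striped_fn)(const char *path, int stripe_count,
                                 void *ctx);

/* Try the wanted stripe count; on failure retry once with
 * STRIPE_COUNT_ALL. On success returns 0 and stores the stripe count
 * that was requested successfully in *used. */
int create_with_fallback(create_striped_fn create, void *ctx,
                         const char *path, int wanted, int *used)
{
    int rc;

    assert(create != NULL && path != NULL && used != NULL);

    rc = create(path, wanted, ctx);
    if (rc == 0) {
        *used = wanted;
        return 0;
    }
    if (wanted == STRIPE_COUNT_ALL)
        return rc;  /* nothing more permissive left to try */

    rc = create(path, STRIPE_COUNT_ALL, ctx);
    if (rc == 0)
        *used = STRIPE_COUNT_ALL;
    return rc;
}

/* Stand-in for a real creation call, for demonstration only: ctx points
 * to an int holding the number of OSTs that are currently usable. */
int demo_create(const char *path, int stripe_count, void *ctx)
{
    int usable = *(const int *)ctx;

    (void)path;
    if (stripe_count == STRIPE_COUNT_ALL)
        return 0;
    return stripe_count <= usable ? 0 : -1;
}
```

With 100 usable OSTs, a request for 128 stripes through `demo_create` falls back to "-1", while a request for 64 succeeds as asked.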
On 2010-03-25, at 15:12, Andreas Dilger wrote:
>> The llapi_* functions are great, I see how to set the stripe count
>> and size. I wasn't sure if there was also a function to query about
>> the configuration, eg number of OST's deployed?
>
> There isn't directly such a function, but indirectly this is possible
> to get from userspace without changing Lustre or the liblustreapi
> library:

I filed bug 22472 for this issue, with a proposed patch, though the actual implementation may change before this is included into any release.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Christopher J. Morrone
2010-Mar-26 01:25 UTC
[Lustre-discuss] programmatic access to parameters
Cliff White wrote:
> I don't know what your constraints are, but should note that this sort
> of information (number of OSTs) can be obtained rather trivially from
> any lustre client via shell prompt, to wit:
> # lctl dl |grep OST |wc -l
> 2
> or:
> # ls /proc/fs/lustre/osc | grep OST |wc -l
> 2

IF you only have one lustre filesystem mounted on that node.
On Mar 25, 2010, at 6:25 PM, Christopher J. Morrone wrote:
> Cliff White wrote:
>> # lctl dl |grep OST |wc -l
>> 2
>> or:
>> # ls /proc/fs/lustre/osc | grep OST |wc -l
>> 2
>
> IF you only have one lustre filesystem mounted on that node.

How about /proc/fs/lustre/lov/<filesystem>-*/numobd

$ cat /proc/fs/lustre/lov/nbp10-clilov-ffff81007e33b800/numobd
120

j
--
Jason Rappleye
System Administrator
NASA Advanced Supercomputing Division
NASA Ames Research Center
Moffett Field, CA 94035
Christopher J. Morrone wrote:
> Cliff White wrote:
>> # lctl dl |grep OST |wc -l
>> 2
>> or:
>> # ls /proc/fs/lustre/osc | grep OST |wc -l
>> 2
>
> IF you only have one lustre filesystem mounted on that node.

okay, so change 'grep OST' to 'grep $FSNAME-OST'..... :)

cliffw
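For completeness, the `grep $FSNAME-OST | wc -l` idea can be expressed in C as a line counter over any text stream. This is a rough sketch with a made-up function name, `count_ost_lines()`. It inherits the fragility raised earlier in the thread, because it depends on the textual format of the device listing, and a pattern that straddles the internal buffer boundary of a very long line would be missed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of "in" that contain "<fsname>-OST", which is what
 * "grep $FSNAME-OST | wc -l" does. Returns the count, or -1 if fsname
 * is too long for the internal pattern buffer. */
int count_ost_lines(FILE *in, const char *fsname)
{
    char line[512];
    char pattern[128];
    int count = 0;
    int matched = 0;  /* current line already counted? */
    int n;

    assert(in != NULL && fsname != NULL);

    n = snprintf(pattern, sizeof(pattern), "%s-OST", fsname);
    if (n < 0 || (size_t)n >= sizeof(pattern))
        return -1;

    while (fgets(line, sizeof(line), in) != NULL) {
        if (!matched && strstr(line, pattern) != NULL) {
            count++;
            matched = 1;
        }
        /* A newline ends the line; longer lines arrive in chunks. */
        if (strchr(line, '\n') != NULL)
            matched = 0;
    }
    return count;
}
```

The stream could come from `popen("lctl dl", "r")` on a client node, in which case the caller is responsible for `pclose()` and for deciding what a count of zero means.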
On 2010-03-25, at 22:04, Jason Rappleye wrote:
> On Mar 25, 2010, at 6:25 PM, Christopher J. Morrone wrote:
>> IF you only have one lustre filesystem mounted on that node.
>
> How about /proc/fs/lustre/lov/<filesystem>-*/numobd
>
> $ cat /proc/fs/lustre/lov/nbp10-clilov-ffff81007e33b800/numobd
> 120

Please don't access /proc/fs/lustre files directly. It is preferable to access them like "lctl get_param -n lov.$fsname-clilov-*.numobd". We are adding an llapi_get_param() interface for a future release of Lustre, but it wouldn't be too hard for someone to create a wrapper for this in 1.8.x either.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
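A wrapper of the kind suggested above might look roughly like this. It is a sketch under assumptions, not the planned llapi_get_param() interface: the function names are invented, the filesystem name is checked against a conservative character whitelist before being placed in a shell command, and the parameter pattern is single-quoted so that the shell started by popen() leaves the wildcard for lctl to expand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the lctl command for reading numobd of "fsname" into buf.
 * Returns 0 on success, -1 if fsname is empty, contains characters
 * outside the whitelist, or the command does not fit in buf. */
int build_numobd_command(const char *fsname, char *buf, size_t len)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_-";
    size_t n;
    int written;

    assert(fsname != NULL && buf != NULL);

    n = strlen(fsname);
    if (n == 0 || strspn(fsname, allowed) != n)
        return -1;

    written = snprintf(buf, len,
                       "lctl get_param -n 'lov.%s-clilov-*.numobd'", fsname);
    return (written < 0 || (size_t)written >= len) ? -1 : 0;
}

/* Parse a non-negative integer from the first line of "in".
 * Returns the value, or -1 if no valid number is found. */
int parse_count(FILE *in)
{
    char line[64];
    char *end;
    long value;

    assert(in != NULL);

    if (fgets(line, sizeof(line), in) == NULL)
        return -1;
    value = strtol(line, &end, 10);
    if (end == line || value < 0 || value > INT_MAX)
        return -1;
    return (int)value;
}

/* Run lctl and return the OST count for "fsname", or -1 on any error. */
int get_numobd(const char *fsname)
{
    char cmd[256];
    FILE *pipe;
    int count;

    if (build_numobd_command(fsname, cmd, sizeof(cmd)) != 0)
        return -1;
    pipe = popen(cmd, "r");
    if (pipe == NULL)
        return -1;
    count = parse_count(pipe);
    if (pclose(pipe) != 0)
        return -1;
    return count;
}
```

Keeping command construction and parsing in separate functions means both can be checked without a Lustre client; only `get_numobd()` needs lctl on the PATH.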
Thanks very much for your help and advice.

Andreas Dilger wrote:
> I filed bug 22472 for this issue, with a proposed patch, though the
> actual implementation may change before this is included into any
> release.