Will the parameter ost.OSS.ost_io.threads_max persist between reboots/remounts when set via lctl conf_param?
Jean-Francois Le Fillatre
2012-Oct-10 14:17 UTC
[Lustre-discuss] Service thread count parameter
Hi David,

It needs to be specified as a module parameter at boot time, in /etc/modprobe.conf. Check the Lustre tuning pages:

http://wiki.lustre.org/manual/LustreManual18_HTML/LustreTuning.html
http://wiki.lustre.org/manual/LustreManual20_HTML/LustreTuning.html

Note that once created, the threads won't be destroyed, so if you want to lower your thread count you'll need to reboot your system.

Thanks,
JF
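P.S. A minimal sketch of both approaches (oss_num_threads is the module parameter named in the 1.8 tuning manual; double-check it against your version):

    # Persistent across reboots: set the thread count when the ost
    # module loads. In /etc/modprobe.conf (or /etc/modprobe.d/):
    #   options ost oss_num_threads=96

    # Runtime only: takes effect immediately but is lost on reboot,
    # and threads already started are never torn down.
    lctl set_param ost.OSS.ost_io.threads_max=96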
How does one estimate a good number of service threads? I'm not sure I understand the following: 1 thread / 128MB * number of CPUs.

--
David Noriega
CSBC/CBI System Administrator
University of Texas at San Antonio
Jean-Francois Le Fillatre
2012-Oct-15 19:01 UTC
[Lustre-discuss] Service thread count parameter
Hi David,

Yes, this is one strange formula... There are two ways of reading it:

- "one thread per 128MB of RAM, times the number of CPUs in the system"
On one of our typical OSSes (24 GB, 8 cores), that would give: ((24*1024) / 128) * 8 = 1536
And that's waaaay out there...

- "as many threads as you can fit, at (128MB * number of CPUs) apiece, in the RAM of your system"
Which would then give: (24*1024) / (128*8) = 24
For a whole system, that's really low. But for one single OST it almost makes sense, in which case you'd want to multiply it by the number of OSTs connected to your OSS.

The way we did it here: we identified that the major limiting factor is the software RAID, both in terms of bandwidth and CPU use. So I ran some tests on a spare machine with sgpdd-survey to get load and performance figures for one array. Then, taking into account the number of OSTs per OSS (4) and the overhead of Lustre, I figured out that the best thread count would be around 96 (which is 24*4, spot on).

One major limitation in Lustre 1.8.x (I don't know if it has changed in 2.x) is that only the global thread count for the OSS can be specified. We have cases where all OSS threads are used on a single OST, and that completely trashes the bandwidth and latency. We would really need a max thread count per OST too, so that no single OST can get hit that way. On our systems, I'd put the max OST thread count at 32 (to stay in the software RAID performance sweet spot) and the max OSS thread count at 96 (to limit CPU load).

Thanks!
JF
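P.S. The two readings, as back-of-envelope shell arithmetic for the box above (24 GB RAM, 8 cores, 4 OSTs; the numbers are just our example):

    # Reading 1: one thread per 128 MB of RAM, times the number of CPUs
    echo $(( 24 * 1024 / 128 * 8 ))           # -> 1536

    # Reading 2: threads at (128 MB * CPUs) apiece fitting in RAM
    echo $(( 24 * 1024 / (128 * 8) ))         # -> 24

    # Reading 2 scaled per OST, as suggested above
    echo $(( 24 * 1024 / (128 * 8) * 4 ))     # -> 96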
On Oct 15, 2012, at 1:01 PM, Jean-Francois Le Fillatre wrote:

> Yes, this is one strange formula... There are two ways of reading it:
>
> - "one thread per 128MB of RAM, times the number of CPUs in the system"
> On one of our typical OSSes (24 GB, 8 cores), that would give: ((24*1024) / 128) * 8 = 1536
> And that's waaaay out there...

This formula was first created when there was perhaps 2GB of RAM and 2 cores in a system, and was intended to give a rough correspondence between server size and thread count. Note that there is also a default upper limit of 512 threads created on the system. However, on some systems in the past with slow/synchronous storage, having 1500-2000 I/O threads still improved performance, and the count could be set manually. That said, it was always intended as a reasonable heuristic, and local performance testing/tuning should pick the optimal number.

> - "as many threads as you can fit (128MB * number of CPUs) in the RAM of your system"
> Which would then give: (24*1024) / (128*8) = 24

This isn't actually what the formula calculates.

> For a whole system, that's really low. But for one single OST it almost makes sense, in which case you'd want to multiply it by the number of OSTs connected to your OSS.

The rule of thumb I've seen in the past, based on benchmarks at many sites, is 32 threads/OST. That keeps the low-level elevators busy but doesn't make the queue depth too high.

> One major limitation in Lustre 1.8.x is that only the global thread count for the OSS can be specified. [...] On our systems, I'd put the max OST thread count at 32 (to stay in the software RAID performance sweet spot) and the max OSS thread count at 96 (to limit CPU load).

Right. This is improved in Lustre 2.3, which binds the threads to specific cores. I believe it is also possible to bind OSTs to specific cores for PCI/HBA/HCA affinity, though I'm not 100% sure whether the OST/CPU binding was included.
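For reference, a minimal sketch of how the 2.3 binding is configured, via the libcfs CPU partition table module options (the values here are illustrative for an 8-core OSS; tune them for your topology):

    # /etc/modprobe.d/lustre.conf
    # Split the cores into two CPU partitions; service threads are
    # created and bound within a partition.
    options libcfs cpu_npartitions=2

    # Or list the cores of each partition explicitly:
    # options libcfs cpu_pattern="0[0,2,4,6] 1[1,3,5,7]"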
Cheers, Andreas
--
Andreas Dilger
Lustre Software Architect
Intel Corporation
Jean-Francois Le Fillatre
2012-Oct-19 15:39 UTC
[Lustre-discuss] Service thread count parameter
Hi Andreas,

Thanks for your update! Some comments below.

JF

On Mon, Oct 15, 2012 at 7:04 PM, Dilger, Andreas <andreas.dilger at intel.com> wrote:

> Right. This is improved in Lustre 2.3, which binds the threads to specific cores. I believe it is also possible to bind OSTs to specific cores for PCI/HBA/HCA affinity, though I'm not 100% sure whether the OST/CPU binding was included.

Even if I could bind both OSTs and threads to a given CPU, it's only a topological optimization for bandwidth and latency: what would prevent a thread from answering a request for a target that is bound to another CPU? This is a very nice feature, and with proper configuration it can bring notable improvements in performance, but I fail to see how it would solve the issue of having all the threads on an OSS hammering a single OST. I am aware that this is a border case; in general use the load is spread over multiple targets and there is no problem.
But we've hit it here a few times, and I know of some other sites that have had the issue too. If you combine that with RAID trouble (a slow disk, read errors, a disk failure, a rebuild or resync), you get a machine that locks up so badly that a cold reset is the only way to bring it back under control. Worst case? Yes. But because the consequences of such a situation can be so nasty, I would be very happy to be able to control thread allocation per OST more finely.
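P.S. Until a per-OST limit exists, a rough way to watch for the pile-up with counters Lustre already exposes (threads_started/threads_max live under ost.OSS.ost_io, and per-OST traffic shows up in the obdfilter stats; exact names may vary by version):

    # Thread pool usage on the OSS: if threads_started sits pinned
    # at threads_max, the pool is saturated.
    watch -n 5 "lctl get_param ost.OSS.ost_io.threads_started ost.OSS.ost_io.threads_max"

    # Per-OST I/O distribution, to see whether one target is taking
    # all the traffic:
    lctl get_param "obdfilter.*.stats" | grep -E 'read_bytes|write_bytes'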