Hi,

My Lustre system specs:
Lustre 1.6.6 on RHEL4
2 Lustre file systems: one consists of 4 OSTs, the other of 20 OSTs
4 x OSS / 6 OSTs each
Storage: S2A9500
Clients: 600
Interconnect: Ethernet

I noticed that my OSSs sometimes report a very high load (around 500). I read that increasing the number of OST threads may help in such a situation, so I am trying to calculate the optimal number of OST threads for my OSSs. Each OSS has 16GB of RAM and 2 dual-core CPUs.

In the Lustre manual I read: "An OSS can have a maximum of 512 service threads and a minimum of 2 service threads. The number of service threads is a function of how much RAM and how many CPUs are on each OSS node (1 thread / 128MB * num_cpus)."

If I understand the above statement correctly, the equation to calculate the number of OST threads looks like this:

    ost_num_threads = (RAM_size * number_of_cores) / 128MB

For my particular case it gives 512 ost_num_threads, which is the Lustre maximum for this parameter. The manual says that each thread actually uses 1.5MB of RAM, so 768MB of RAM will be consumed on each of my OSSs for I/O threads. So I guess with 16GB of RAM the initial (default) value of ost_num_threads is already being set to 512, is that correct?

I know that adding more OSSs and OSTs might help, but at the moment this isn't an option for me. Is there any other way I could bring down the high load on the OSSs? Can tuning the client side help?

Best regards,
Wojciech
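For reference, here is that formula worked out with the figures given in the post above (16GB of RAM, 2 dual-core CPUs = 4 cores, and the 1.5MB-per-thread figure from the manual excerpt quoted there):

    ost_num_threads = (16384MB * 4) / 128MB  = 512   (exactly the 512 maximum)
    thread memory   = 512 threads * 1.5MB    = 768MB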
Hello!

On Jan 25, 2009, at 6:56 PM, Wojciech Turek wrote:
> For my particular case it gives 512 ost_num_threads, which is the Lustre
> maximum for this parameter. The manual says that each thread actually uses
> 1.5MB of RAM, so 768MB of RAM will be consumed on each of my OSSs for
> I/O threads.
> So I guess with 16GB of RAM the initial (default) value of ost_num_threads
> is already being set to 512, is that correct?
> I know that adding more OSSs and OSTs might help, but at the moment this
> isn't an option for me.
> Is there any other way I could bring down the high load on the OSSs? Can
> tuning the client side help?

To decrease the load you actually want to decrease the number of OST threads (the ost_num_threads module parameter to the ost.ko module).

Essentially what is happening is that your drives can only sustain a certain amount of parallel I/O activity before performance degrades due to all the seeking going on. Ideally you would set the number of OST threads to that number, but this is complicated by the fact that different workloads (i.e. different I/O sizes) change how many parallel streams the drives can handle. In any case, once you reach that point of congestion the performance only goes downhill; the extra threads just wait for I/O and contribute to your LA figures.

You need to experiment a bit to see what number of threads makes sense for you. Perhaps start with a number of threads equal to the number of actual disk spindles you have on that node (if you use RAID5+, subtract any spindles not used for actual data, e.g. 1/3 of the spindles for RAID5) and watch the performance of the clients during usual workloads (not the LA on the OSSes; it won't go much higher than the maximum threads you specify). If you feel the performance has degraded, try increasing the thread count somewhat and see how that works, until performance starts degrading again or until you reach satisfactory performance.

If your disk configuration does not have writeback cache enabled and your activity is mostly writes, you might also want to give the patch from bug 16919 a try. It removes the synchronous journal commit requirement and therefore should somewhat speed up OST writes in this case (unless you already use a fast external journal, or unless a write cache is enabled that already mitigates the synchronous journal commits).

Hope that helps.

Bye,
    Oleg
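As a minimal sketch of how that module parameter is typically set on a Lustre 1.6 OSS (the value 128 below is purely an illustrative starting point, e.g. roughly one thread per data spindle as suggested above; pick yours from your own configuration, and note the ost module has to be reloaded, i.e. the OSTs remounted, for the change to take effect):

    # /etc/modprobe.conf on each OSS -- example value only
    options ost ost_num_threads=128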
On Mon, 2009-01-26 at 00:01 -0500, Oleg Drokin wrote:
> Hello!

In addition to Oleg's suggestions...

> Essentially what is happening is that your drives can only sustain a
> certain amount of parallel I/O activity before performance degrades due
> to all the seeking going on. Ideally you would set the number of OST
> threads to that number, but this is complicated by the fact that
> different workloads (i.e. different I/O sizes) change how many parallel
> streams the drives can handle.

Understanding the performance of your storage hardware is exactly why we always recommend profiling it with the Lustre iokit, ideally prior to deployment of the filesystem. The obdfilter-survey profiles the overall throughput of your hardware, while the sgpdd-survey profiles individual disks. The former is supposed to be usable, non-destructively, on an existing filesystem; the latter, however, is absolutely destructive and should not be run anywhere you want to preserve existing data.

Now, I mention the non-destructive nature of obdfilter-survey with trepidation. That is its intent, and in the times I have used it, it has proven to be as advertised. However, I am doubtful that that specific aspect gets regularly tested by our QA department, so there is always a possibility of a bug sneaking in which voids that intent. Proceed with caution.

Anyway, the obdfilter-survey simply sends a range of workloads to your OSTs, varying the thread counts and I/O sizes. When it's all done it gives you a (figurative, or graphic if you use the plot scripts on the data) picture of the performance abilities of your storage hardware and will show you where the saturation points are.

b.
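As a rough sketch of what a local obdfilter-survey run looks like; the exact environment variables differ between lustre-iokit releases, so treat the names below (nobjhi, thrhi, size) as an assumption and check the README shipped with the iokit version matching your Lustre 1.6.6 installation before running it on an OSS:

    # run on the OSS against its local OSTs; variable names assumed from
    # the lustre-iokit documentation and may differ in your release
    nobjhi=2 thrhi=64 size=1024 ./obdfilter-survey

The thread counts it sweeps through map directly onto the ost_num_threads discussion above: the point where throughput stops improving (or starts falling) as threads increase is a reasonable candidate for the thread cap.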