Dan Magenheimer
2009-Mar-19 17:52 UTC
[Xen-devel] [RFO] #2: removing a concurrency bottleneck
Request for opinion #2:

In order to remove a (last?) concurrency bottleneck in tmem, I have to replicate a pair of fairly large buffers, one of two pages and the other of eight pages. (Note that if tmem ever works on ia64, pagesize is larger.) Since the buffers are too large for the stack, they are declared as globals and protected by a single lock. But the buffers are used for compression, which can take quite a bit of time (up to tens of thousands of cycles and probably >80% of the total time spent in tmem), and so are magnets for any spinlock.

I see two solutions: cascading or per-cpu.

In per-cpu, I would allocate at system initialization one pair of buffers for each cpu (question: num_present_cpus, num_online_cpus, or num_possible_cpus?). Then no lock is required.

In cascading, I would allocate a small number of pairs of buffers, perhaps only two or three, and "trylock" each, falling back to trylock the second if the first is locked, then the third and so on, then spinlock if all are in use. Statistically this is probably good enough, unless I choose a small number and Xen is running on a huge box.

I suppose a combination of the two would be to cascade, but dynamically choose and allocate the quantity of buffers based on (maybe log+1 of?) the number of cpus (again, present, online, or possible?). But this is probably going overboard.

Opinions? And if per-cpu, is the current Xen infrastructure sufficiently robust to handle hot-plug CPUs, and should I handle them too?

Thanks,
Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
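The cascading option described above can be sketched in user-space C. This is only an illustration under stated assumptions, not tmem code: C11 `atomic_flag` stands in for Xen's spinlock and trylock primitives, the page size is assumed to be 4 KiB, and `buf_pair`, `buf_pairs_init`, `buf_pair_acquire`, `buf_pair_release` and `NR_BUF_PAIRS` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define PAGE_SIZE    4096u  /* assumption: 4 KiB pages, as on x86 */
#define NR_BUF_PAIRS 3      /* hypothetical: "two or three" pairs */

/* One pair of compression buffers guarded by its own flag. */
struct buf_pair {
    atomic_flag   busy;                  /* set while the pair is in use */
    unsigned char small[2 * PAGE_SIZE];  /* the two-page buffer */
    unsigned char large[8 * PAGE_SIZE];  /* the eight-page buffer */
};

static struct buf_pair pairs[NR_BUF_PAIRS];

/* Mark every pair as free; call once before first use. */
static void buf_pairs_init(void)
{
    for (size_t i = 0; i < NR_BUF_PAIRS; i++)
        atomic_flag_clear(&pairs[i].busy);
}

/* Try each pair in turn; wait on the first only if all are taken. */
static struct buf_pair *buf_pair_acquire(void)
{
    for (size_t i = 0; i < NR_BUF_PAIRS; i++)
        if (!atomic_flag_test_and_set_explicit(&pairs[i].busy,
                                               memory_order_acquire))
            return &pairs[i];

    while (atomic_flag_test_and_set_explicit(&pairs[0].busy,
                                             memory_order_acquire))
        ; /* busy-wait, standing in for a real spinlock */
    return &pairs[0];
}

/* Hand a pair back so another caller can take it. */
static void buf_pair_release(struct buf_pair *p)
{
    assert(p != NULL);
    atomic_flag_clear_explicit(&p->busy, memory_order_release);
}
```

A caller would bracket each compression with `buf_pair_acquire()` and `buf_pair_release()`. Contention only reaches the busy-wait path when more callers than `NR_BUF_PAIRS` compress at the same time, which is the "statistically good enough" trade-off mentioned above.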
Keir Fraser
2009-Mar-19 18:27 UTC
Re: [Xen-devel] [RFO] #2: removing a concurrency bottleneck
On 19/03/2009 17:52, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> I see two solutions: cascading or per-cpu.
>
> In per-cpu, I would allocate at system initialization one
> pair of buffers for each cpu (question: num_present_cpus,
> num_online_cpus, or num_possible_cpus?). Then no lock
> is required.

Just do this. A few pages per cpu is no significant overhead. I suggest you allocate the buffers and store their addresses in your own array of size NR_CPUS. You could do this from an __initcall function, or from some other boot-time init function you have handy.

The percpu stuff is a bit fragile to large allocations (since we manually size the percpu region!) and you will get no real benefit from it since we do not free/alloc the percpu regions as cpus are onlined/offlined.

-- keir
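A rough user-space sketch of the array-of-addresses suggestion follows. It rests on several assumptions: `malloc` stands in for whatever page allocator the hypervisor would use, `NR_CPUS` is given a small placeholder value, the page size is assumed to be 4 KiB, and `cpu_bufs`, `bufs_by_cpu`, `cpu_bufs_init`, `cpu_bufs_get` and `cpu_bufs_free` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages */
#define NR_CPUS   8      /* hypothetical stand-in for the build-time maximum */

/* Addresses of one CPU's pair of compression buffers. */
struct cpu_bufs {
    void *small;  /* two pages */
    void *large;  /* eight pages */
};

/* One slot per possible CPU, indexed by CPU number. */
static struct cpu_bufs bufs_by_cpu[NR_CPUS];

/* Free whatever has been allocated so far and reset the slots. */
static void cpu_bufs_free(void)
{
    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
        free(bufs_by_cpu[cpu].small);
        free(bufs_by_cpu[cpu].large);
        bufs_by_cpu[cpu].small = NULL;
        bufs_by_cpu[cpu].large = NULL;
    }
}

/* Boot-time init: returns 0 on success, -1 if any allocation fails. */
static int cpu_bufs_init(unsigned int nr_cpus)
{
    assert(nr_cpus <= NR_CPUS);
    for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
        bufs_by_cpu[cpu].small = malloc(2 * PAGE_SIZE);
        bufs_by_cpu[cpu].large = malloc(8 * PAGE_SIZE);
        if (!bufs_by_cpu[cpu].small || !bufs_by_cpu[cpu].large) {
            cpu_bufs_free();
            return -1;
        }
    }
    return 0;
}

/* Look up the buffers for a given CPU; no lock is needed as long as
 * each CPU only ever touches its own slot. */
static struct cpu_bufs *cpu_bufs_get(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    return &bufs_by_cpu[cpu];
}
```

In the hypervisor the index passed to `cpu_bufs_get()` would be the current CPU's id, and the init function would run once at boot, for example from an `__initcall` as suggested above.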
Jan Beulich
2009-Mar-20 08:30 UTC
Re: [Xen-devel] [RFO] #2: removing a concurrency bottleneck
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 19.03.09 18:52 >>>
>In per-cpu, I would allocate at system initialization one
>pair of buffers for each cpu (question: num_present_cpus,
>num_online_cpus, or num_possible_cpus?). Then no lock
>is required.

It should certainly be num_possible_cpus(), which (hopefully) is identical to num_present_cpus() until physical CPU hotplug gets supported, in which case some infrastructure will need to be added anyway so that your code and other code could do such per-CPU allocations on demand.

Unlike what Keir suggested, I'd not recommend adding further NR_CPUS sized arrays (based mostly on how long it took to mostly (fully?) get rid of them in Linux in order to support huge systems) - just use per-CPU pointers to dynamically allocated memory.

Jan
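As a loose user-space analogy for a per-CPU pointer to dynamically allocated memory, the sketch below uses a C11 thread-local pointer, with each thread playing the role of a CPU. This is not Xen's per-CPU mechanism; `malloc`, the 4 KiB page size and the names `comp_bufs`, `my_bufs`, `comp_bufs_setup`, `comp_bufs_get` and `comp_bufs_teardown` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* The pair of compression buffers owned by one "CPU". */
struct comp_bufs {
    unsigned char small[2 * PAGE_SIZE];
    unsigned char large[8 * PAGE_SIZE];
};

/* One pointer per thread; each thread stands in for a CPU here.
 * Only the pointer lives in per-thread storage, not the buffers. */
static _Thread_local struct comp_bufs *my_bufs;

/* Allocate this thread's buffers once; 0 on success, -1 on failure. */
static int comp_bufs_setup(void)
{
    if (my_bufs == NULL)
        my_bufs = malloc(sizeof(*my_bufs));
    return my_bufs ? 0 : -1;
}

/* Return this thread's buffers; no lock is needed because no other
 * thread can reach this pointer. */
static struct comp_bufs *comp_bufs_get(void)
{
    assert(my_bufs != NULL);
    return my_bufs;
}

/* Release this thread's buffers. */
static void comp_bufs_teardown(void)
{
    free(my_bufs);
    my_bufs = NULL;
}
```

The point of the design is that the statically sized per-CPU area holds only a pointer, so the ten pages of buffer space come from the heap and nothing scales with a compile-time `NR_CPUS`.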
Keir Fraser
2009-Mar-20 08:46 UTC
Re: [Xen-devel] [RFO] #2: removing a concurrency bottleneck
On 20/03/2009 08:30, "Jan Beulich" <jbeulich@novell.com> wrote:

> It should certainly be num_possible_cpus(), which (hopefully) is identical to
> num_present_cpus() until physical CPU hotplug gets supported, in which
> case some infrastructure will need to be added anyway so that your and
> other code could do such per-CPU allocations on demand.
>
> Other than Keir suggested, I'd not recommend adding further NR_CPUS
> sized arrays (based mostly on how long it took to mostly (fully?) get rid of
> them in Linux in order to support huge systems) - just use per-CPU pointers
> to dynamically allocated memory.

Oh yes, that's a better idea! Please do that instead.

-- Keir