Hi all,

I've just gotten a new dual dual-core box at work, and am experimenting with Xen 3.0 on it. I've found that I can compile SMP just fine, and it sees all four processors, but when I add K8 NUMA, the compile breaks. I've benchmarked tasks taking 50-80% longer than native, and I'm pretty sure it's because the kernel doesn't know that different processors are closer to different areas of memory.

Y'all are probably working on it, but I wanted to make sure you were aware that it's broken. This box doesn't need to go live for a while yet, so if you need me to try anything out, I'd be glad to.

Thanks for all your work!

Alex

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 10/7/05, Alexander Charbonnet <alexander@charbonnet.com> wrote:
> Xen 3.0 on it. I've found that I can compile SMP just fine, and it sees all
> four processors, but when I add K8 NUMA, the compile breaks.

I just noted this compiler breakage as well.

Thinking about it, I expect the nature of Xen makes it tricky to get going at the moment.

--
Nicholas Lee
http://stateless.geek.nz
gpg 8072 4F86 EDCD 4FC1 18EF 5BDD 07B0 9597 6D58 D70C
On 7 Oct 2005, at 00:09, Nicholas Lee wrote:
>> Xen 3.0 on it. I've found that I can compile SMP just fine, and it
>> sees all four processors, but when I add K8 NUMA, the compile breaks.
>
> I just noted this compiler breakage as well.
>
> Thinking about it, I expect the nature of Xen makes it tricky to get
> going at the moment.

We have some ideas about how to get this working, but you're right that it is not a totally trivial thing to fix. Don't hold your breath until at least 3.0 is out the door.

-- Keir
I now think that lack of NUMA support isn't the problem. I've done some benchmarking of my database-backed Perl Web app with the Apache benchmark tool, "ab". I ran the same query with differing total requests and concurrency. After the first time, which I did manually before the tests, all the results will be in the database query cache. Therefore, I don't expect any disk I/O.

The results seem to be significantly slower than they should be. Is this a problem with my config, the new x86-64 code, SMP stuff, or is it just the cost of doing business?

This table shows the time in seconds for each request x concurrency and kernel. There's native with NUMA, native without NUMA, and finally Domain0 (you may want to view this with a fixed font).

        NUMA    noNUMA  Dom0
20x1    44.8    45.5    67.4
20x2    23.2    24.0    36.6
20x3    17.9    18.4    30.2
20x4    13.3    13.5    23.9
20x8    14.6    14.6    23.6
40x2    46.1    47.5    72.8
40x4    25.2    26.6    46.1
40x8    28.7    27.8    40.4

Thanks,
Alex

On Friday 07 October 2005 03:58 am, Keir Fraser wrote:
> On 7 Oct 2005, at 00:09, Nicholas Lee wrote:
> >> Xen 3.0 on it. I've found that I can compile SMP just fine, and it
> >> sees all four processors, but when I add K8 NUMA, the compile breaks.
> >
> > I just noted this compiler breakage as well.
> >
> > Thinking about it, I expect the nature of Xen makes it tricky to get
> > going at the moment.
>
> We have some ideas about how to get this working, but you're right that
> it is not a totally trivial thing to fix. Don't hold your breath until
> at least 3.0 is out the door.
>
> -- Keir
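As a rough cross-check of the timing table above, the Domain0 slowdown can be expressed as a percentage of the native time. The following is only a minimal sketch: the numbers are copied from the table, while the function names and the choice of the non-NUMA native kernel as the baseline are assumptions made for illustration.

```python
# Timings in seconds copied from the table above, keyed by
# "<total requests>x<concurrency>": (native NUMA, native no NUMA, Domain0).
TIMINGS = {
    "20x1": (44.8, 45.5, 67.4),
    "20x2": (23.2, 24.0, 36.6),
    "20x3": (17.9, 18.4, 30.2),
    "20x4": (13.3, 13.5, 23.9),
    "20x8": (14.6, 14.6, 23.6),
    "40x2": (46.1, 47.5, 72.8),
    "40x4": (25.2, 26.6, 46.1),
    "40x8": (28.7, 27.8, 40.4),
}


def overhead_percent(native_s, xen_s):
    """Return how much longer the Xen run took, as a percentage of native."""
    return (xen_s - native_s) / native_s * 100.0


def dom0_overheads(timings):
    """Map each run to Domain0's overhead relative to the native non-NUMA kernel.

    The non-NUMA kernel is used as the baseline (an assumption) because the
    Xen kernel in this thread was also built without NUMA support.
    """
    return {
        run: overhead_percent(no_numa_s, dom0_s)
        for run, (_numa_s, no_numa_s, dom0_s) in timings.items()
    }


if __name__ == "__main__":
    for run, pct in dom0_overheads(TIMINGS).items():
        print(f"{run}: Domain0 took {pct:.1f}% longer than native (no NUMA)")
```

With these inputs the overhead ranges from roughly 45% (40x8) to roughly 77% (20x4), which is in line with the "50-80% longer than native" estimate from the first message.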
On 10/8/05, Alexander Charbonnet <alexander@charbonnet.com> wrote:
> I now think that lack of NUMA support isn't the problem. I've done some
> benchmarking of my database-backed Perl Web app with the Apache benchmark
> tool, "ab". I ran the same query with differing total requests and
> concurrency. After the first time, which I did manually before the tests,
> all the results will be in the database query cache. Therefore, I don't
> expect any disk I/O.

Could be net I/O. One thing I noted with my Xen 2.0 setup is that it wasn't very high performance.

There are a couple of emails I've noted but haven't had the time to try out yet:

http://lists.xensource.com/archives/html/xen-users/2005-08/msg00211.html

"virtual eth devices had packet queueing disabled (txqueuelen:0 on ifconfig), and had a 4k size max. ring buffer for transfering packets between domains. Whenever that buffer would run full, packets are dropped..."

http://lists.xensource.com/archives/html/xen-users/2005-07/msg00708.html

"The problem is that the data structure in which the ring buffer is organized has to fit into one memory page (4096 bytes)"

http://lists.xensource.com/archives/html/xen-devel/2005-09/msg00196.html

"default qdisc being inadequate"
"tc qdisc add dev eth0 root tbf rate 50mbit latency 20ms burst 50k"

One of the problems with Xen at the moment is the lack of documentation/cluebooks for stuff like this: best practice guides etc. Still a learning process.

--
Nicholas Lee
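For readers unfamiliar with the `tbf` qdisc in the quoted `tc` command, the idea behind a token bucket filter can be sketched in a few lines of Python. This is an illustrative model only, not how the kernel implements it: the class and method names are hypothetical, the unit conversions (`50mbit` as 50,000,000 bits per second, `50k` as 50 * 1024 bytes) are assumptions about `tc` units, and the real qdisc queues packets for up to the configured `latency` rather than simply reporting whether a packet conforms.

```python
# Assumed conversions of the quoted tc parameters (see lead-in).
RATE_BYTES_PER_S = 50 * 1_000_000 / 8   # "rate 50mbit"
BURST_BYTES = 50 * 1024                 # "burst 50k"


class TokenBucket:
    """Hypothetical model of a token bucket: tokens are bytes we may send."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes   # the bucket starts full
        self.last_time = 0.0

    def allow(self, size_bytes, now):
        """Refill for the elapsed time, then try to spend size_bytes tokens."""
        elapsed = now - self.last_time
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last_time = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(RATE_BYTES_PER_S, BURST_BYTES)
    # Send 1500-byte packets back to back at t=0 until the burst is used up.
    sent = 0
    while bucket.allow(1500, now=0.0):
        sent += 1
    print(f"{sent} packets fit in the initial burst")
    # One millisecond later the bucket has refilled enough for more traffic.
    print("allowed after 1 ms:", bucket.allow(1500, now=0.001))
```

The burst size bounds how much data can leave at line speed before the sender is held to the configured rate, which is why the choice of `burst` matters when tuning a qdisc like this.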
The traffic over the local adapter between my app and MySQL may be significant. I've tried both the socket connection and the TCP connection without a big difference.

I recompiled with the necessary kernel option and ran:

    tc qdisc add dev eth0 root tbf rate 50mbit latency 20ms burst 50k

as suggested by one of your links, but there wasn't a significant difference.

I used ab to benchmark grabbing a 100MB file 100 times from localhost. Natively it took 15.9 seconds. Under Domain0 it took 41% longer, 22.4 seconds. That sure looks like it makes up a huge chunk of the overall slowdown I'm noticing.

Any other tricks to try to get network performance closer to native?

Thanks!
Alex

On Saturday 08 October 2005 06:16 am, Nicholas Lee wrote:
> On 10/8/05, Alexander Charbonnet <alexander@charbonnet.com> wrote:
> > I now think that lack of NUMA support isn't the problem. I've done some
> > benchmarking of my database-backed Perl Web app with the Apache benchmark
> > tool, "ab". I ran the same query with differing total requests and
> > concurrency. After the first time, which I did manually before the
> > tests, all the results will be in the database query cache. Therefore, I
> > don't expect any disk I/O.
>
> Could be Net IO. One thing I note with my Xen 2.0 setup is that it
> wasn't very high performance.
>
> There are a couple of emails I've noted but haven't had the time to try out
> yet:
>
> http://lists.xensource.com/archives/html/xen-users/2005-08/msg00211.html
>
> "virtual eth devices had packet queueing disabled (txqueuelen:0 on
> ifconfig), and had a 4k size max. ring buffer for
> transfering packets between domains. Whenever that buffer would run
> full, packets are dropped..."
>
> http://lists.xensource.com/archives/html/xen-users/2005-07/msg00708.html
>
> "The problem is that the data structure in which the ring buffer is
> organized has to fit into one memory page (4096 bytes)"
>
> http://lists.xensource.com/archives/html/xen-devel/2005-09/msg00196.html
>
> "default qdisc being inadequate"
> "tc qdisc add dev eth0 root tbf rate 50mbit latency 20ms burst 50k"
>
> One of the problems with Xen at the moment is the lack of
> documentation/cluebooks for stuff like this at the moment. Best
> practice guides etc. Still a learning process.
> I used ab to benchmark grabbing a 100MB file 100 times from
> localhost. Natively it took 15.9 seconds. Under Domain0 it took 41%
> longer, 22.4 seconds. That sure looks like it makes up a
> huge chunk of the overall slowdown I'm noticing.
>
> Any other tricks to try to get network performance closer to native?

Can you try running that test on the same machine running a 32-bit version of Xen? It's possible something weird is happening just on 64-bit.

Thanks,
Ian
Alexander Charbonnet
2005-Oct-09 05:46 UTC
[Xen-devel] x86-64 Net Performance [was: Opteron server and NUMA]
> Can you try running that test on the same machine running a 32bit
> version of Xen. It's possible something weird is happening just on 64
> bit.

file.zip is approximately 100MB. Ran `ab -n 300 -c 1 localhost/file.zip`. Disk I/O should not be a factor; I had Apache slurp the file into RAM before each test.

Table shows time in seconds and the percent penalty for Xen.

                Native  Domain0  Penalty
64-bit TLS      50.2    65.0     29.5%
32-bit TLS      59.6    64.8      8.7%
32-bit no TLS   59.6    62.6      5.0%

On 32-bit, disabling TLS is as easy as moving /lib/tls. For 64-bit, it's compiled into Debian's glibc package. I can recompile and run that test if you think it's worth doing.

Alex
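The measurement described above (sequential requests for one in-memory file over the loopback interface) can be approximated without Apache or `ab`. The sketch below is a hypothetical stand-in, not the setup used in this thread: it serves a small in-memory payload from Python's built-in HTTP server and times sequential GETs, so absolute numbers will not be comparable to the table. The payload size, handler name, and function names are all assumptions.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# 1 MiB stand-in payload held in RAM (the thread used a ~100MB file).
PAYLOAD = b"x" * (1024 * 1024)


class InMemoryHandler(BaseHTTPRequestHandler):
    """Serve the same in-memory payload for every GET, so no disk I/O occurs."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, format, *args):
        pass  # keep the benchmark output quiet


def time_sequential_gets(url, requests):
    """Fetch url `requests` times, one at a time; return (seconds, bytes read)."""
    # An empty ProxyHandler avoids routing loopback traffic through a proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    total_bytes = 0
    start = time.perf_counter()
    for _ in range(requests):
        with opener.open(url) as response:
            total_bytes += len(response.read())
    return time.perf_counter() - start, total_bytes


def run_benchmark(requests=20):
    """Start a loopback server on a free port, run the timing loop, shut down."""
    server = HTTPServer(("127.0.0.1", 0), InMemoryHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return time_sequential_gets(f"http://127.0.0.1:{port}/file.zip", requests)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    elapsed, total = run_benchmark()
    print(f"read {total} bytes in {elapsed:.3f} s")
```

Running the same script natively and under Domain0 would give a comparable pair of timings, with the caveat that the Python server and client add their own overhead on top of the loopback network path.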
Update: I went ahead and compiled glibc without TLS / NPTL.

Also, I realized that I wasn't allocating all my physical memory to Domain0 for these tests, so I was putting Xen at a disadvantage. It didn't change much, but those tests have been re-done.

Here's the odd thing: disabling TLS for 64-bit improved native performance by around 10%, but had no effect on Xen.

                Native  Domain0  Penalty
64-bit TLS      50.2    65.0     29.5%
64-bit no TLS   44.8    65.0     45.1%
32-bit TLS      59.6    59.5     -0.2%
32-bit no TLS   59.6    60.2      1.0%

Alex
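The Penalty column and the "around 10%" figure in the update above follow directly from the Native and Domain0 timings. A minimal sketch of the arithmetic, using the numbers from the table (the function names are assumptions for illustration):

```python
# (native seconds, Domain0 seconds) copied from the table above.
RESULTS = {
    "64-bit TLS": (50.2, 65.0),
    "64-bit no TLS": (44.8, 65.0),
    "32-bit TLS": (59.6, 59.5),
    "32-bit no TLS": (59.6, 60.2),
}


def penalty_percent(native_s, dom0_s):
    """Extra time under Domain0 as a percentage of the native time."""
    return (dom0_s / native_s - 1.0) * 100.0


def improvement_percent(before_s, after_s):
    """Time saved by a change, as a percentage of the original time."""
    return (before_s - after_s) / before_s * 100.0


if __name__ == "__main__":
    for config, (native_s, dom0_s) in RESULTS.items():
        print(f"{config}: penalty {penalty_percent(native_s, dom0_s):.1f}%")

    # Effect of disabling TLS on the native 64-bit kernel.
    native_tls = RESULTS["64-bit TLS"][0]
    native_no_tls = RESULTS["64-bit no TLS"][0]
    gain = improvement_percent(native_tls, native_no_tls)
    print(f"native 64-bit gain from disabling TLS: {gain:.1f}%")
```

Note that the 64-bit penalty rises from 29.5% to 45.1% only because the native baseline got faster; the Domain0 time is 65.0 seconds in both rows.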
Nicholas Lee
2005-Oct-09 10:07 UTC
[Xen-devel] Re: x86-64 Net Performance [was: Opteron server and NUMA]
On 10/9/05, Alexander Charbonnet <alexander@charbonnet.com> wrote:
> compiled into Debian's glibc package. I can recompile and run that test if
> you think it's worth doing.

Very interesting numbers. Have you tried doing the same tests between two different domUs, or between dom0 and a domU of the same type?

If you do recompile, you could try the Xen TLS glibc patch [1], although I'm not sure if 64-bit Xen will work with this.

[1] http://wiki.xensource.com/xenwiki/XenSpecificGlibc

--
Nicholas Lee