Hi folks,

I was going to post this over the weekend but didn't get around to it, and in the meantime Xen 3.0.0 was released! Good work all around, but I have some performance problems to report :)

I'm running changeset 00c349d5b40d269da4fec9510f1dd7c6bb3b3327. This is a dual-CPU machine, but I'm currently running with noht, nosmp. All tests are done using iperf -- both endpoints sit on the same switch in our cluster. The machines have Broadcom BCM5704 NICs.

N/W performance from dom0 seems fine (though I used to get 930+ until a few days back):

[ 6] 0.0-20.0 sec 1.95 GBytes 835 Mbits/sec

However, from a VM, the throughput is really bad:

[ 5] 0.0-20.0 sec 1.05 GBytes 450 Mbits/sec

The above numbers are with the BVT scheduler. With the SEDF scheduler the numbers are even worse (a VM can't get more than 300 Mbps in my tests). I can post concrete figures if people are interested. I'm _not_ running pipelined netback.

Is anyone else observing such performance problems?

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net
On 5 Dec 2005, at 23:11, Diwaker Gupta wrote:

> N/W performance from dom0 seems fine (though I used to get 930+ until
> a few days back):
>
> [ 6] 0.0-20.0 sec 1.95 GBytes 835 Mbits/sec
>
> However, from a VM, the throughput is really bad:
>
> [ 5] 0.0-20.0 sec 1.05 GBytes 450 Mbits/sec
>
> The above numbers are with the BVT scheduler. With the SEDF
> scheduler the numbers are even worse (a VM can't get more than
> 300 Mbps in my tests). I can post concrete figures if people are
> interested. I'm _not_ running pipelined netback.

We used to be able to saturate GigE with a single CPU, although admittedly burning quite a bit more CPU than using dom0 as the endpoint. I guess things have got out of tune, but there are a bunch of things we could do to encourage I/O batching ('x packets or y milliseconds'-style receive batching, and transmitting batches of packets every x milliseconds or when the domU goes idle). This, together with scheduler tuning, should definitely get the performance back, although it's a balancing act with one CPU to ensure that no stage of the I/O processing pipeline gets starved.

-- Keir
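For illustration, here is a minimal sketch of the kind of "x packets or y milliseconds" receive-batching policy described above. The names and thresholds are hypothetical and are not taken from the actual netback/netfront code.

/* Hypothetical sketch of an "x packets or y milliseconds" batching policy.
 * Not real Xen netback code; names and thresholds are made up. */
#include <stdbool.h>
#include <stdint.h>

#define BATCH_PKT_THRESHOLD  32      /* flush after this many packets...      */
#define BATCH_DELAY_US       1000    /* ...or after this long, whichever first */

struct batch_state {
    unsigned int pending_pkts;    /* packets queued but not yet delivered   */
    uint64_t     first_queued_us; /* timestamp of the oldest queued packet  */
};

/* Called for every received packet; returns true when the batch should be
 * flushed and a single notification sent to the guest. */
static bool queue_packet(struct batch_state *b, uint64_t now_us)
{
    if (b->pending_pkts == 0)
        b->first_queued_us = now_us;
    b->pending_pkts++;

    return b->pending_pkts >= BATCH_PKT_THRESHOLD ||
           (now_us - b->first_queued_us) >= BATCH_DELAY_US;
}

/* Also polled from a periodic timer so a small trickle of packets is not
 * delayed indefinitely. */
static bool batch_timer_expired(const struct batch_state *b, uint64_t now_us)
{
    return b->pending_pkts > 0 &&
           (now_us - b->first_queued_us) >= BATCH_DELAY_US;
}

The packet-count threshold bounds latency under load, while the timer bounds latency when traffic is light; the balancing act Keir mentions is choosing the two constants so neither dom0 nor the guest is starved.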
> I'm running changeset
> 00c349d5b40d269da4fec9510f1dd7c6bb3b3327. This is a dual-CPU
> machine, but I'm currently running with noht, nosmp. All tests
> are done using iperf -- both endpoints sit on the
> same switch in our cluster. The machines have Broadcom BCM5704 NICs.
>
> N/W performance from dom0 seems fine (though I used to get
> 930+ until a few days back):
>
> [ 6] 0.0-20.0 sec 1.95 GBytes 835 Mbits/sec
>
> However, from a VM, the throughput is really bad:
>
> [ 5] 0.0-20.0 sec 1.05 GBytes 450 Mbits/sec
>
> The above numbers are with the BVT scheduler. With the SEDF
> scheduler the numbers are even worse (a VM can't get more
> than 300 Mbps in my tests). I can post concrete figures if
> people are interested. I'm _not_ running pipelined netback.
>
> Is anyone else observing such performance problems?

We haven't really done much tuning for the single-CPU case recently, as the vast majority of platforms that Xen runs on are either hyperthreaded, dual-core or SMP.

The main focus of the 3.0.0 release has been correctness rather than performance tuning. We plan to do some tweaking over the coming weeks to address this. We used to get 900 Mb/s with a single CPU core, and there's absolutely no reason why we shouldn't do so again -- in fact, we should do better in terms of CPU usage than 2.0, as we now have checksum offload.

Now that we have good performance monitoring tools like xen-oprofile, xenperf, xenmon etc., it should be quite straightforward to optimize things. Let's just wait until we've dealt with any critical bugs arising from the release...

Ian
> We used to be able to saturate GigE with a single CPU, although

Same here.

> admittedly burning quite a bit more CPU than using dom0 as the
> endpoint. I guess things have got out of tune, but there are a bunch of
> things we could do to encourage I/O batching ('x packets or y
> milliseconds'-style receive batching, and transmitting batches of
> packets every x milliseconds or when the domU goes idle). This,
> together with scheduler tuning, should definitely get the performance
> back, although it's a balancing act with one CPU to ensure that no stage
> of the I/O processing pipeline gets starved.

Looking forward to it. What's the deal with the pipelined backend? What's the target scenario there?

Meanwhile, though I agree that SMP and hyperthreaded processors are becoming the norm, that still doesn't solve this problem. Even on an SMP machine I can have dom0 co-located with a VM on the same CPU, and I'm not sure how different that would be from the current situation.

On a related note, has anyone been working on the IDD stuff? Is it possible to wrap up a device driver in its own domain? The last time I tried it basically wasn't possible, but I'd really be interested in helping out any which way to get it working.

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net
> The main focus of the 3.0.0 release has been correctness rather than
> performance tuning. We plan to do some tweaking over the coming weeks to
> address this. We used to get 900 Mb/s with a single CPU core, and there's
> absolutely no reason why we shouldn't do so again -- in fact, we should
> do better in terms of CPU usage than 2.0, as we now have checksum
> offload.

I completely agree. Now that 3.0.0 is out the door, it should be easier to focus on performance rather than functionality and correctness.

> Now that we have good performance monitoring tools like xen-oprofile,
> xenperf, xenmon etc., it should be quite straightforward to optimize
> things. Let's just wait until we've dealt with any critical bugs arising
> from the release...

Sounds good, thanks.

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net
Diwaker Gupta wrote:

> Hi folks,
>
> I was going to post this over the weekend but didn't get around to it,
> and in the meantime Xen 3.0.0 was released! Good work all around,
> but I have some performance problems to report :)
>
> I'm running changeset 00c349d5b40d269da4fec9510f1dd7c6bb3b3327. This
> is a dual-CPU machine, but I'm currently running with noht, nosmp. All
> tests are done using iperf -- both endpoints sit on the same
> switch in our cluster. The machines have Broadcom BCM5704 NICs.
>
> N/W performance from dom0 seems fine (though I used to get 930+ until
> a few days back):
>
> [ 6] 0.0-20.0 sec 1.95 GBytes 835 Mbits/sec
>
> However, from a VM, the throughput is really bad:
>
> [ 5] 0.0-20.0 sec 1.05 GBytes 450 Mbits/sec
>
> The above numbers are with the BVT scheduler. With the SEDF
> scheduler the numbers are even worse (a VM can't get more than
> 300 Mbps in my tests). I can post concrete figures if people are
> interested. I'm _not_ running pipelined netback.
>
> Is anyone else observing such performance problems?

Diwaker,

Can you run oprofile and obtain any kind of breakdown on that, if possible? Also, are you dedicating individual CPUs to dom0 and the guest (avoiding much of the context switching on I/O)? What are your memory allocations? How much of a bump do you get if you increase memory?

thanks,
Nivedita
> Diwaker,
>
> Can you run oprofile and obtain any kind of breakdown
> on that, if possible?

Not today, unfortunately -- I'm going to be offline for the next couple of weeks. Anyhow, I'll try to see if I can post some more data in the interim.

> Also, are you dedicating individual
> CPUs to dom0 and the guest (avoiding much of the context
> switching on I/O)?

Like I mentioned in my email, everything is on the same CPU right now (nosmp, noht).

> What are your memory allocations? How much
> of a bump do you get if you increase memory?

Currently both dom0 and the VM have 128 MB. I rebooted with dom0 having 512 MB and the VM 256 MB. Here are the numbers:

dom0:
[ 5] 0.0-10.0 sec 987 MBytes 828 Mbits/sec

VM:
[ 5] 0.0-10.0 sec 938 MBytes 787 Mbits/sec

So it's getting slightly better. I haven't run these with SEDF though; the above are using BVT.

--
Web/Blog/Gallery: http://floatingsun.net
> > What are your memory allocations? How much of a bump do you
> > get if you increase memory?
>
> Currently both dom0 and the VM have 128 MB. I rebooted with
> dom0 having 512 MB and the VM 256 MB. Here are the numbers:
>
> dom0:
> [ 5] 0.0-10.0 sec 987 MBytes 828 Mbits/sec
>
> VM:
> [ 5] 0.0-10.0 sec 938 MBytes 787 Mbits/sec
>
> So it's getting slightly better. I haven't run these with SEDF
> though; the above are using BVT.

Ah, I expect I know what's going on here.

Linux sizes the default socket buffer based on how much 'system' memory it has.

With a 128MB domU it probably defaults to just 64KB; for 256MB it probably steps up to 128KB. You can confirm this by inspecting (and adjusting) /proc/sys/net/core/{r,w}mem_{max,default}.

For a gigabit network you need at least 128KB.

Ian
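To illustrate the per-socket side of this, here is a minimal sketch (not taken from iperf or Xen; the 256 KB figure is just an example) of how an application can request larger send/receive buffers explicitly, overriding the memory-scaled defaults up to the rmem_max/wmem_max limits. iperf's -w option requests a buffer/window size in much the same way, if memory serves.

/* Sketch: explicitly sizing a TCP socket's buffers with setsockopt().
 * The 256 KB figure is illustrative, not a recommendation. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int make_tuned_socket(void)
{
    int buf = 256 * 1024;                     /* requested buffer size, bytes */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket");
        return -1;
    }

    /* The kernel clamps these to net.core.rmem_max / wmem_max, so the
     * sysctls still need to be large enough. */
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &buf, sizeof(buf)) < 0)
        perror("SO_RCVBUF");
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &buf, sizeof(buf)) < 0)
        perror("SO_SNDBUF");

    return s;
}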
> Ah, I expect I know what's going on here.
>
> Linux sizes the default socket buffer based on how much 'system'
> memory it has.

Yes, it does. But I've managed to saturate gigabit links from my VMs with the _exact_ same configuration before, hence my original email.

> For a gigabit network you need at least 128KB.

These are extremely low-latency networks, so the requirement might actually be lower than that.

Diwaker
--
Web/Blog/Gallery: http://floatingsun.net
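As a rough aside (not from the thread): the 128KB figure is essentially a bandwidth-delay product estimate, so on a very low-latency switch the buffer needed just to keep the pipe full is indeed smaller. With illustrative RTT values:

\[
  B \;\ge\; \text{bandwidth} \times \text{RTT}
\]
\[
  1\,\text{Gbit/s} \times 1\,\text{ms} \approx 125\,\text{KB},
  \qquad
  1\,\text{Gbit/s} \times 0.2\,\text{ms} \approx 25\,\text{KB}.
\]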
Diwaker Gupta wrote:

> Not today, unfortunately -- I'm going to be offline for the next couple
> of weeks. Anyhow, I'll try to see if I can post some more data in the
> interim.

No problem, thanks for running your tests again.

>> What are your memory allocations? How much
>> of a bump do you get if you increase memory?
>
> Currently both dom0 and the VM have 128 MB. I rebooted with dom0
> having 512 MB and the VM 256 MB. Here are the numbers:
>
> dom0:
> [ 5] 0.0-10.0 sec 987 MBytes 828 Mbits/sec
>
> VM:
> [ 5] 0.0-10.0 sec 938 MBytes 787 Mbits/sec
>
> So it's getting slightly better. I haven't run these with SEDF though;
> the above are using BVT.

Yep, the bigger difference in Xen 3.0 is that it's just more sensitive to memory -- when I tested earlier in the summer, most of the difference between dom0 and domU could be gained back with a 350MB dom0 and 512MB domUs doing heavy network traffic. So you have:

128MB domU - 450 Mbits/sec
256MB domU - 787 Mbits/sec

which is roughly the same amount off line rate that I recall.

thanks,
Nivedita
Ian Pratt wrote:

>> Currently both dom0 and the VM have 128 MB. I rebooted with
>> dom0 having 512 MB and the VM 256 MB. Here are the numbers:
>>
>> dom0:
>> [ 5] 0.0-10.0 sec 987 MBytes 828 Mbits/sec
>>
>> VM:
>> [ 5] 0.0-10.0 sec 938 MBytes 787 Mbits/sec
>>
>> So it's getting slightly better. I haven't run these with SEDF
>> though; the above are using BVT.
>
> Ah, I expect I know what's going on here.
>
> Linux sizes the default socket buffer based on how much 'system'
> memory it has.
>
> With a 128MB domU it probably defaults to just 64KB; for 256MB it
> probably steps up to 128KB. You can confirm this by inspecting (and
> adjusting) /proc/sys/net/core/{r,w}mem_{max,default}.

For TCP sockets, don't forget that you'll also have to bump up net/ipv4/tcp_rmem[1,2] and net/ipv4/tcp_wmem[1,2].

For low-memory systems, the default size of the TCP read buffer (tcp_rmem[1]) is 43689, and the max size (tcp_rmem[2]) is 2*43689, which is really too low for heavy network lifting.

> For a gigabit network you need at least 128KB.
>
> Ian

At least.

thanks,
Nivedita
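For illustration only, a small sketch of bumping these sysctls from a helper program; the values are examples, not recommendations, and in practice the same effect is usually achieved with sysctl(8) or an entry in /etc/sysctl.conf.

/* Sketch: raise the TCP buffer limits by writing to /proc.
 * Illustrative values only. */
#include <stdio.h>

static int write_sysctl(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    fprintf(f, "%s\n", value);
    fclose(f);
    return 0;
}

int main(void)
{
    /* min, default, max -- in bytes */
    write_sysctl("/proc/sys/net/ipv4/tcp_rmem", "4096 131072 262144");
    write_sysctl("/proc/sys/net/ipv4/tcp_wmem", "4096 131072 262144");
    /* and the corresponding caps for all socket types */
    write_sysctl("/proc/sys/net/core/rmem_max", "262144");
    write_sysctl("/proc/sys/net/core/wmem_max", "262144");
    return 0;
}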
>> Ah, I expect I know what's going on here.
>>
>> Linux sizes the default socket buffer based on how much 'system'
>> memory it has.
>>
>> With a 128MB domU it probably defaults to just 64KB; for 256MB it
>> probably steps up to 128KB. You can confirm this by inspecting (and
>> adjusting) /proc/sys/net/core/{r,w}mem_{max,default}.
>
> For TCP sockets, don't forget that you'll also have to bump up
> net/ipv4/tcp_rmem[1,2] and net/ipv4/tcp_wmem[1,2].
>
> For low-memory systems, the default size of the TCP read buffer
> (tcp_rmem[1]) is 43689, and the max size (tcp_rmem[2]) is 2*43689,
> which is really too low for heavy network lifting.

Just as an aside, I wanted to point out that my dom0s were running with the exact same configuration (memory, socket buffer sizes) as the VM, and I can mostly saturate a gigabit link from dom0. So while socket buffer sizes might certainly have an impact, there are still additional bottlenecks that need fine-tuning.

--
Web/Blog/Gallery: http://floatingsun.net
>> For low-memory systems, the default size of the TCP read buffer
>> (tcp_rmem[1]) is 43689, and the max size (tcp_rmem[2]) is 2*43689,
>> which is really too low for heavy network lifting.
>
> Just as an aside, I wanted to point out that my dom0s were
> running with the exact same configuration (memory, socket buffer
> sizes) as the VM, and I can mostly saturate a gigabit link from
> dom0. So while socket buffer sizes might certainly have an impact,
> there are still additional bottlenecks that need fine-tuning.

Xen is certainly going to be more sensitive to small socket buffer sizes when you're trying to run dom0 and the guest on the same CPU thread. If you're running a single TCP connection, the socket buffer size basically determines how frequently you're forced to switch between domains. Switching every 43KB at 1Gb/s amounts to thousands of domain switches a second, which burns CPU. Doubling the socket buffer size halves the rate of domain switches. Under Xen, that larger size would be a more sensible default.

Ian
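To put rough numbers on that (figures rounded; 1 Gb/s taken as 125 MB/s):

\[
  \frac{125\,\text{MB/s}}{43{,}689\,\text{B per switch}} \approx 2{,}860\ \text{domain switches/s},
  \qquad
  \frac{125\,\text{MB/s}}{131{,}072\,\text{B per switch}} \approx 950\ \text{switches/s}.
\]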
On 6 Dec 2005, at 00:04, Diwaker Gupta wrote:

> Looking forward to it. What's the deal with the pipelined backend? What's
> the target scenario there?

It's a form of request-notification avoidance. It allows the frontend to avoid sending a notification (virtual IPI) to the backend if any previous requests are still in flight (no response yet received). In some situations, pipelining can mean you basically never need to send notifications, because the backend pulls down new requests as it responds to old ones.

Thing is, it doesn't work for some forms of packet processing in domain0. For example, if you send fragmented IP datagrams and they are reassembled in domain0, there may be a dependency between packets. Hence you need to send notifications even if buffers are in flight, because you won't get a response until the datagram is reassembled and forwarded.

> Meanwhile, though I agree that SMP and hyperthreaded processors are
> becoming the norm, that still doesn't solve this problem. Even on an SMP
> machine I can have dom0 co-located with a VM on the same CPU, and I'm
> not sure how different that would be from the current situation.

With enough hardware contexts, it'll soon become sensible to dedicate a context to domain0 or your IDD. It's certainly a very sensible use of a hyperthread (almost certainly a better use than multiprocessing within a guest, if you do a significant amount of I/O).

> On a related note, has anyone been working on the IDD stuff? Is it
> possible to wrap up a device driver in its own domain? The last time
> I tried it basically wasn't possible, but I'd really be interested in
> helping out any which way to get it working.

That needs some PCI virtualization in Xen (managed by platform code in domain0). Nothing too tricky, I think.

-- Keir
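For illustration, a minimal sketch of the notification-avoidance idea described above, using hypothetical names rather than the real netfront/netback ring macros:

/* Hypothetical sketch of request-notification avoidance on a shared ring.
 * Names are made up; this is not the actual Xen ring API. */
#include <stdbool.h>

struct ring_front {
    unsigned int req_prod;   /* requests produced by the frontend    */
    unsigned int rsp_cons;   /* responses consumed by the frontend   */
};

/* Publish one request; returns true only when the backend must be poked,
 * i.e. when there were no requests already in flight and the backend may
 * therefore be idle and waiting. */
static bool push_request_and_check_notify(struct ring_front *r)
{
    /* Requests already outstanding?  Then the backend is still working and
     * will pick up the new request as it responds to the old ones. */
    bool in_flight = (r->req_prod != r->rsp_cons);

    r->req_prod++;            /* publish the new request */

    return !in_flight;        /* notify only if the backend may be idle */
}

The dependency problem Keir describes corresponds to the in_flight test being too optimistic: if the backend cannot respond to an outstanding request until it sees a later one (e.g. the rest of a fragmented datagram), skipping the notification can stall, so the frontend has to notify anyway in those cases.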