John Byrne
2006-Dec-21 03:00 UTC
[Xen-devel] Live migration: "netbuf race" messages can cause significant performance impact
Hi,

Someone found that a live migration of a domain that had been ballooned down took far longer than a migration of one that had not. (The domain was ballooned down from 3000M to 1000M; migration took 89 seconds versus 31 seconds, real time.) I came up with a complex theory and asked him to look in the xend.log to confirm it. He didn't, but he mentioned there were a lot of "netbuf race" messages in the log. In this particular case, live migration generated approximately 512000 "netbuf race" messages. Deleting the DPRINTF reduced the migration time to 11 seconds.

While it is simple enough to submit a patch to delete this DPRINTF, perhaps something more subtle is called for, such as modifying the migrate/save command paths to accept a debug argument and passing it to xc_save?

Thanks,

John Byrne

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
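A minimal sketch of the "debug argument" idea, written in Python for illustration only. The real DPRINTF lives in the C save code behind xc_save, and nothing here is the actual Xen API: `SaveLogger`, `dprintf`, `scan`, `estimated_cost_per_message`, and the message format are all hypothetical names. The point is only that when debugging is off, the hot loop returns before any formatting or I/O happens.

```python
import io


def estimated_cost_per_message(slow_seconds, fast_seconds, messages):
    """Average time attributable to each message, assuming the whole
    difference between the two runs is due to the debug output."""
    return (slow_seconds - fast_seconds) / messages


class SaveLogger:
    """Hypothetical stand-in for debug output in the save path."""

    def __init__(self, stream, debug=False):
        self.stream = stream
        self.debug = debug      # would come from a migrate/save debug argument
        self.emitted = 0

    def dprintf(self, fmt, *args):
        # Return before formatting when debugging is off, so a disabled
        # message costs one attribute check and nothing more.
        if not self.debug:
            return
        self.stream.write(fmt % args)
        self.emitted += 1


def scan(pfns, log):
    """Toy loop that reports every pfn, mimicking a per-page debug message."""
    for pfn in pfns:
        log.dprintf("netbuf race: pfn=%x\n", pfn)  # format is illustrative


if __name__ == "__main__":
    quiet = SaveLogger(io.StringIO(), debug=False)
    scan(range(512000), quiet)
    print("messages written with debug off:", quiet.emitted)
    print("seconds per message:", estimated_cost_per_message(89, 11, 512000))
```

Using the figures from the message above, (89 - 11) / 512000 works out to roughly 152 microseconds per message, under the assumption that the debug output accounts for the entire difference.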
Ian Pratt
2006-Dec-21 18:18 UTC
RE: [Xen-devel] Live migration: "netbuf race" messages can cause significant performance impact
> Someone found that doing a live migration of a domain that had ballooned
> down took far longer to migrate. (Ballooned down from 3000M to 1000M, 31
> seconds vs 89 seconds, real time) I came up with a complex theory and
> asked him to look in the xend.log to confirm it. He didn't, but he
> mentioned there was a lot of "netbuf race" messages in the log. In this
> particular case, live migration generated approximately 512000 "netbuf
> race" messages. Deleting the DPRINTF reduced the migration time to 11
> seconds.
>
> While it is simple enough to submit a patch to delete this DPRINTF,
> perhaps something more subtle is called for such as modifying the
> migrate/save command paths to accept a debug argument and passing to
> xc_save?

Ideally, we'd do more than suppress the printf. We're needlessly re-scanning the bitmap for the pages that are ballooned out, because we're not distinguishing them from other pages, such as network buffers, that are temporarily not part of the p2m map. I'm pretty sure my original implementation got this right and it's since been broken :)

This scanning probably isn't very expensive (sans the printf), but it's worth cleaning up.

Ian
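The distinction Ian describes can be sketched as follows, in Python for illustration only. This assumes the save code had some record of which pfns are ballooned out; a later reply in this thread notes there is no easy way to obtain that, so `ballooned_pfns` is a hypothetical input, as are `INVALID_MFN` and `classify_unmapped`. The p2m map is modelled as a plain list indexed by pfn.

```python
INVALID_MFN = None  # hypothetical marker for "no machine frame mapped"


def classify_unmapped(p2m, ballooned_pfns):
    """Split pfns with no valid mapping into two groups.

    ballooned: pfns known to be ballooned out; these will not come back
               during the migration, so re-scanning them is wasted work.
    transient: pfns that are only temporarily unmapped (for example, in
               use as network buffers) and are worth re-checking later.
    """
    ballooned, transient = [], []
    for pfn, mfn in enumerate(p2m):
        if mfn is not INVALID_MFN:
            continue  # page is mapped; nothing to classify
        if pfn in ballooned_pfns:
            ballooned.append(pfn)
        else:
            transient.append(pfn)
    return ballooned, transient


if __name__ == "__main__":
    p2m = [10, INVALID_MFN, 12, INVALID_MFN, INVALID_MFN]
    ballooned, transient = classify_unmapped(p2m, ballooned_pfns={1, 4})
    print("skip on later passes:", ballooned)
    print("re-check on later passes:", transient)
```

With such a split, only the transient group would need another look on subsequent passes, and only those pages would ever be candidates for a "netbuf race" report.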
Steven Hand
2006-Dec-29 13:07 UTC
Re: [Xen-devel] Live migration: "netbuf race" messages can cause significant performance impact
>Someone found that doing a live migration of a domain that had ballooned
>down took far longer to migrate. (Ballooned down from 3000M to 1000M, 31
>seconds vs 89 seconds, real time) I came up with a complex theory and
>asked him to look in the xend.log to confirm it. He didn't, but he
>mentioned there was a lot of "netbuf race" messages in the log. In this
>particular case, live migration generated approximately 512000 "netbuf
>race" messages. Deleting the DPRINTF reduced the migration time to 11
>seconds.
>
>While it is simple enough to submit a patch to delete this DPRINTF,
>perhaps something more subtle is called for such as modifying the
>migrate/save command paths to accept a debug argument and passing to
>xc_save?

There's nothing much we can do here: there's no easy way for us to distinguish between pages which are 'ballooned out' and pages which are temporarily being used for network buffers.

I've checked in a fix to unstable (cset 13185:62ef527eb19f) which simply removes this particular debug output.

Thanks for spotting this!

cheers,

S.