George Dunlap
2009-Apr-14  12:38 UTC
[Xen-devel] Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)
Hey all,

Thanks for the feedback; and, sorry for sending it just before a holiday weekend so there was a delay in writing up a response. (OTOH, as I did read the e-mails as they came out, it's given me more time to think and coalesce.)

A couple of high bits: This first e-mail was meant to lay out design goals and discuss interface. If we can agree (for example) that we want latency-sensitive workloads (such as network, audio, and video) to perform well, and use latency-sensitive workloads as test cases while developing, then we don't need to agree on a specific algorithm up-front.

OK, with that in mind, some specific responses:

* [Jeremy] Is that forward-looking enough? That hardware is currently available; what's going to be commonplace in 2-3 years?

I think we need to distinguish between "works optimally" and "works well". Obviously we want the design to be scalable, and we don't want to have to do a major revision in a year because 16 logical cpus works well but 32 tanks. And it may be a good idea to "lead" the target, so that when we actually ship something it will be right on, rather than 6 months behind.

Still, in 2-3 years, will the vast majority of servers have 32 logical cpus, or still only 16 or less?

Any thoughts on a reasonable target?

* [Kevin Tian] How is 80%/800% chosen here?

Heuristics. 80% is a general rule of thumb for optimal server performance. Above 80% and you may get a higher total throughput (or maybe not) but it will be common for individual VMs to have to wait for CPU resources, which may cause significant performance impact.

(I should clarify, 80% means 80% of *all* resources, not 80% of one cpu; i.e., if you have 4 cores, xenuse may report 360% of one cpu; but 100% of all resources would be 400% of one cpu.)

800% was just a general boundary. I think it's sometimes as important to say what you *aren't* doing as what you are doing. For example, if someone comes in and says, "This new scheduler sucks if you have a load average of 10 (i.e., 1000% utilization)", we can say, "Running with a load average of 10 isn't what we're designing for. Patches will be accepted if they don't adversely impact performance at 80%. Otherwise feel free to write your own scheduler for that kind of system." OTOH, if a hosting provider (for example) says, "Performance really tanks around a load of 3", we should make an effort to accommodate that.

* [Kevin Tian] How about VM number in total you'd like to support?

Good question. I'll do some research for how many VMs a virtual desktop system might want to support.

For servers, I think a reasonable design space would be between 1 VM every 3 cores (for a few extremely high-load servers) to 8 VMs every core (for highly aggregated servers). I suppose server farms may want more.

Does anyone else have any thoughts on this subject -- either suggestions for different numbers, or other use cases they want considered?

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
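
The utilization figures in the message above mix two units: percent of one cpu (what a tool such as xenuse may report) and percent of all cpus (what the 80% target refers to). The following is a minimal Python sketch of the conversion, not code from the thread. The function names are hypothetical, and the load-average mapping simply follows the e-mail's own equation of a load average of 10 with 1000% utilization.

```python
def percent_of_all_cpus(percent_of_one_cpu: float, ncpus: int) -> float:
    """Convert a '% of one cpu' figure into '% of all cpus'.

    Hypothetical helper: the e-mail says xenuse may report in units of
    one cpu, so on 4 cores 400% means the whole machine is busy.
    """
    return percent_of_one_cpu / ncpus


def percent_of_one_cpu(percent_of_all: float, ncpus: int) -> float:
    """Inverse conversion: '% of all cpus' into '% of one cpu'."""
    return percent_of_all * ncpus


def load_average_to_percent(load_average: float) -> float:
    """Map a load average to a utilization percentage.

    Assumption taken from the e-mail: a load average of 10 is treated
    as 1000% utilization, i.e. a factor of 100.
    """
    return load_average * 100.0


if __name__ == "__main__":
    ncpus = 4
    # The 80% design target, expressed in units of one cpu.
    print(percent_of_one_cpu(80.0, ncpus))
    # The 360% example reading, expressed as a share of the whole machine.
    print(percent_of_all_cpus(360.0, ncpus))
    # The load figures mentioned in the e-mail.
    print(load_average_to_percent(3), load_average_to_percent(10))
```

Under these assumptions, the 80% target on a 4-core host corresponds to a reading of 320% of one cpu, and the 360% reading used as an example corresponds to 90% of all resources.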
Tian, Kevin
2009-Apr-16  03:29 UTC
[Xen-devel] RE: Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)
>From: George Dunlap
>Sent: April 14, 2009 20:38
>
>Hey all,
>
>Thanks for the feedback; and, sorry for sending it just before a
>holiday weekend so there was a delay in writing up a response. (OTOH,
>as I did read the e-mails as they came out, it's given me more time to
>think and coalesce.)
>
>A couple of high bits: This first e-mail was meant to lay out design
>goals and discuss interface. If we can agree (for example) that we
>want latency-sensitive workloads (such as network, audio, and video)
>to perform well, and use latency-sensitive workloads as test cases
>while developing, then we don't need to agree on a specific algorithm
>up-front.

That looks fine to me, but latency-sensitive workloads shouldn't be the only concern. :-)

>
>* [Kevin Tian] How is 80%/800% chosen here?
>
>Heuristics. 80% is a general rule of thumb for optimal server
>performance. Above 80% and you may get a higher total throughput (or
>maybe not) but it will be common for individual VMs to have to wait
>for CPU resources, which may cause significant performance impact.
>
>(I should clarify, 80% means 80% of *all* resources, not 80% of one
>cpu; i.e., if you have 4 cores, xenuse may report 360% of one cpu;
>but 100% of all resources would be 400% of one cpu.)
>
>800% was just a general boundary. I think it's sometimes as important
>to say what you *aren't* doing as what you are doing. For example, if
>someone comes in and says, "This new scheduler sucks if you have a
>load average of 10 (i.e., 1000% utilization)", we can say, "Running
>with a load average of 10 isn't what we're designing for. Patches
>will be accepted if they don't adversely impact performance at 80%.
>Otherwise feel free to write your own scheduler for that kind of
>system." OTOH, if a hosting provider (for example) says, "Performance
>really tanks around a load of 3", we should make an effort to
>accommodate that.

Got it. So one more interesting question is: how do you define "function reasonably well" under 800% utilization? Any criteria?

Thanks,
Kevin
Juergen Gross
2009-Apr-17  12:11 UTC
Re: [Xen-devel] Scheduler follow-up: Design target (was [RFC] Scheduler work, part 1)
George Dunlap wrote:
> * [Jeremy] Is that forward-looking enough? That hardware is currently
> available; what's going to be commonplace in 2-3 years?
>
> I think we need to distinguish between "works optimally" and "works
> well". Obviously we want the design to be scalable, and we don't want
> to have to do a major revision in a year because 16 logical cpus works
> well but 32 tanks. And it may be a good idea to "lead" the target, so
> that when we actually ship something it will be right on, rather than
> 6 months behind.

This problem might be less critical if cpupools are supported. On really large systems it would be possible to limit the number of logical cpus for a scheduler.

>
> Still, in 2-3 years, will the vast majority of servers have 32 logical
> cpus, or still only 16 or less?

I think Nehalem-EX will have 16 on one socket (8 cores with 2 HT each). With 4 sockets this would sum up to 64.

> * [Kevin Tian] How about VM number in total you'd like to support?
>
> Good question. I'll do some research for how many VMs a virtual
> desktop system might want to support.
>
> For servers, I think a reasonable design space would be between 1 VM
> every 3 cores (for a few extremely high-load servers) to 8 VMs every
> core (for highly aggregated servers). I suppose server farms may want
> more.
>
> Does anyone else have any thoughts on this subject -- either
> suggestions for different numbers, or other use cases they want
> considered?

For our BS2000 servers we would really appreciate support of cpupools :-) Or as an alternative correct handling of weights with cpu-pinning.

Another question: do you plan to replace the current credit scheduler or will the new scheduler be another alternative to credit and sedf?

Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6              Telephone: +49 (0) 89 636 47950
Fujitsu Technology Solutions  e-mail: juergen.gross@ts.fujitsu.com
Otto-Hahn-Ring 6              Internet: ts.fujitsu.com
D-81739 Muenchen              Company details: ts.fujitsu.com/imprint.html
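
The arithmetic in this exchange (logical cpus per socket, and the proposed VM density range) can be written out as a small Python sketch. It is only illustrative: the function names are hypothetical, and rounding "1 VM every 3 cores" down to a whole number of VMs is an assumption, since the thread does not say how to round.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical cpus the scheduler would see."""
    return sockets * cores_per_socket * threads_per_core


def vm_design_range(cores: int) -> tuple[int, int]:
    """VM count range for the design space proposed in the thread.

    Low end:  1 VM every 3 cores (extremely high-load servers).
    High end: 8 VMs every core (highly aggregated servers).
    Rounding the low end down, with a floor of 1 VM, is an assumption.
    """
    low = max(1, cores // 3)
    high = cores * 8
    return low, high


if __name__ == "__main__":
    # Figures from the message: 8 cores with 2 hyperthreads each, per socket.
    print(logical_cpus(sockets=1, cores_per_socket=8, threads_per_core=2))
    print(logical_cpus(sockets=4, cores_per_socket=8, threads_per_core=2))
    # VM range for the 32 physical cores of such a 4-socket machine.
    print(vm_design_range(4 * 8))
```

For the 4-socket machine described above this gives 64 logical cpus, and, counting its 32 physical cores, a design space of roughly 10 to 256 VMs. Whether "core" in the density proposal should mean a physical core or a logical cpu is not settled in the thread.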