Graham, Simon
2007-Feb-14 13:57 UTC
RE: [Xen-devel] DomU crash during migration when suspending source domain
> Are you migrating between unlike boxes? My guess is that the original box
> has processors supporting cacheinfo cpuid leaves and the target box does
> not. Migrating to older less-capable CPUs is definitely hit-and-miss I'm
> afraid. It really is best not to do it!

I think this is indeed what is happening. Supporting this is kind of important for HA/FT: you need to be able to keep the domains running when upgrading or replacing hardware.

I guess I'm still a tad confused, but presumably the CPU_DEAD processing is not completely uninitializing the cache info. It seems to me that if it discarded the cache info and NULLed the pointer in the CPU_DEAD processing, then the cache info should get recreated when the CPU_ONLINE is done, so presumably there is some path where this is not done when it should be.

I'll do some more digging and get back with a proposed fix.

Simon

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
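The lifecycle Simon describes can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions, not the kernel's actual cacheinfo code: the names `EV_CPU_ONLINE`, `EV_CPU_DEAD`, `cpu_hotplug_event`, `cache_leaves` and the single global pointer are all hypothetical, and the number of cacheinfo leaves is passed in by hand rather than read from CPUID. It shows the pointer being discarded and NULLed on the dead event, recreated on the online event, and left NULL when the host offers no cacheinfo leaves, which is the case any consumer would have to tolerate after a move to a less-capable CPU.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the hotplug notifications discussed above. */
enum cpu_event { EV_CPU_ONLINE, EV_CPU_DEAD };

/* Toy cache description; real code would hold per-leaf details. */
struct cache_info {
    unsigned int num_leaves;
};

/* Real code keeps one of these per CPU; a single CPU is modelled here. */
static struct cache_info *cpu_cache_info;

/* Returns 0 on success, -1 if no cache info could be set up. */
int cpu_hotplug_event(enum cpu_event ev, unsigned int leaves_on_this_host)
{
    switch (ev) {
    case EV_CPU_ONLINE:
        /* Drop any stale copy so the info always reflects the current host. */
        free(cpu_cache_info);
        cpu_cache_info = NULL;
        if (leaves_on_this_host == 0)
            return -1; /* host has no cacheinfo leaves: pointer stays NULL */
        cpu_cache_info = malloc(sizeof(*cpu_cache_info));
        if (cpu_cache_info == NULL)
            return -1;
        cpu_cache_info->num_leaves = leaves_on_this_host;
        return 0;
    case EV_CPU_DEAD:
        /* Discard the info and NULL the pointer, as suggested in the mail. */
        free(cpu_cache_info);
        cpu_cache_info = NULL;
        return 0;
    }
    return -1;
}

/* Consumers must cope with a NULL pointer instead of dereferencing blindly. */
unsigned int cache_leaves(void)
{
    return cpu_cache_info != NULL ? cpu_cache_info->num_leaves : 0;
}
```

In this model, an online event on a host that reports zero leaves fails cleanly and `cache_leaves()` returns 0, whereas code that assumed the pointer was always valid after an online event would crash at that point.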
Keir Fraser
2007-Feb-14 14:35 UTC
Re: [Xen-devel] DomU crash during migration when suspending source domain
In general we *cannot* expect to support CPUs with different features in CPUID. We plan to fix this in two ways:
 1. Allow a guest to be given a restricted CPUID view (e.g., with features masked out, or cacheinfo leaves missing).
 2. Where a guest has been exposed to extended features and leaves, prevent it from being migrated to a less-capable CPU.

A further option (3) for cache info might be to fake out the leaves for CPUs that do not support them. But I'm not sure whether, for example, this would be compatible with AMD's CPUID instruction.

This issue is hardly specific to HA/FT. You can safely build yourself a HA/FT cluster out of homogeneous hardware. If you build it out of odds and ends you already have, safety is going to be hard or impossible to guarantee in general. I don't believe anyone sells or supports software to allow you to do this, and there's a reason for that.

 -- Keir

On 14/2/07 13:57, "Graham, Simon" <Simon.Graham@stratus.com> wrote:

> I think this is indeed what is happening -- supporting this is kind of
> important for HA/FT - you need to be able to keep the domains running
> when upgrading/replacing hardware.
>
> I guess I'm still a tad confused, but presumably the CPU_DEAD processing
> is not completely uninitializing the cache info (it seems to me that if
> it discarded the cache info and NULL's the pointer in the CPU_DEAD
> processing then it should get recreated when the CPU_ONLINE is done -
> presumably there is some path where this is not done when it should be.
>
> I'll do some more digging and get back with a proposed fix.
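The two planned fixes can be sketched as a pair of small C helpers. This is a minimal illustration under stated assumptions, not Xen's implementation: `struct cpu_caps`, `caps_restrict` and `migration_allowed` are hypothetical names, the feature bits are arbitrary, and a host's CPUID is reduced to one feature bitmask plus a highest supported leaf number. Option 1 corresponds to computing a restricted view that every host in the pool can honour; option 2 corresponds to refusing a migration when the guest has already seen something the target lacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified summary of what a host's CPUID reports. */
struct cpu_caps {
    uint32_t features; /* illustrative feature-flag bitmask */
    uint32_t max_leaf; /* highest supported basic CPUID leaf */
};

/*
 * Option 1: build a restricted view that every host in the pool supports,
 * i.e. the intersection of feature bits and the lowest max leaf.
 */
struct cpu_caps caps_restrict(const struct cpu_caps *hosts, size_t n)
{
    assert(hosts != NULL && n > 0);

    struct cpu_caps view = hosts[0];
    for (size_t i = 1; i < n; i++) {
        view.features &= hosts[i].features;
        if (hosts[i].max_leaf < view.max_leaf)
            view.max_leaf = hosts[i].max_leaf;
    }
    return view;
}

/*
 * Option 2: allow migration only if the target offers every feature and
 * every leaf that the guest has already been exposed to.
 */
bool migration_allowed(const struct cpu_caps *guest_view,
                       const struct cpu_caps *target)
{
    bool features_ok = (guest_view->features & ~target->features) == 0;
    bool leaves_ok = guest_view->max_leaf <= target->max_leaf;

    return features_ok && leaves_ok;
}
```

With two hosts whose capabilities are `{0x7, 4}` and `{0x3, 2}`, the restricted view is `{0x3, 2}`. A guest started with that view may move in either direction, while a guest that was shown the first host's full capabilities would be refused migration to the second.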