Hi,

On my devel machine xen comes up fine with PAE paging enabled. Well, it doesn't boot that far yet; it stops at the end of paging_init() right now. The address space issues need fixing now before I can attempt to boot domain 0 with PAE ...

At the moment the xen virtual address space (top 64 MB) looks like this:

  0xffc0 |  4 MB | ioremap area
  0xff80 |  4 MB | mapping cache
  0xff40 |  4 MB | per domain mapping (gdt, ...)
  0xff00 |  4 MB | shadow linear page tables
  0xfec0 |  4 MB | linear page tables
  0xfd40 | 24 MB | frame table
  0xfd00 |  4 MB | MPT (rw)
  0xfc40 | 12 MB | low mem, xen code, xen heap
  0xfc00 |  4 MB | MPT (ro)

For PAE we'll have to change:

  * linear page tables and linear shadow tables need 8 MB each (because pte size is doubled with PAE).
  * frame table needs to grow, the size depends on the amount of memory we are willing to support (total), for 16 GB it would be 96 MB.
  * MPT might need more space, depending on how much memory we are willing to support (per domain). With a 4GB per-domain limit the current 4 MB size would be fine. [ side note: the shadow code seems to reuse the MPT address space for something else in some cases, not sure what implications this has ]
  * not sure about xen's heap. What is this used for? Might we need more space here as well to support large amounts of memory?

If we touch the address space anyway we might fix some other issues along the way. Ian mentioned he wants to move the ioremap area to the bottom. I guess next to the ro MPT table, so it's easy to grant domains read-only access to ACPI tables?

Is it possible (and/or useful) to make the address layout dynamic? So the size of the frametable can be adjusted at boot time depending on the amount of memory installed in the machine? That would imply the ro MPT doesn't have a fixed address any more, not sure this is possible ...

In any case I'd try to make the memory layout as fixed as possible, i.e. move the fixed size stuff to the top, below that the data structures which are not fixed-size, and at the bottom the ro MPT + ioremap area for r/o domain access, i.e. something like this:

  [ fixed size ]
  0xff00 | 16 MB | low mem, xen code, xen heap
  0xfec0 |  4 MB | mapping cache
  0xfe80 |  4 MB | per domain mapping (gdt, ...)

  [ Hmm, debatable whether to make that fixed-size or not. It would waste some address space in the non-pae case, on the other hand the memory layout would be identical for both pae and non-pae. ]

  0xfe00 |  8 MB | shadow linear page tables
  0xfd80 |  8 MB | linear page tables
  0xfbc0 |  4 MB | MPT (rw)

  [ not fixed size ]
  0xfc00 | 24 MB | frame table (larger for PAE ...)

  [ r/o access for domains ]
  0xfb80 |  4 MB | MPT (ro)
  0xfb40 |  4 MB | ioremap area

Comments? Anything else to consider when touching the address layout anyway?

  Gerd

PS: my current patches are @ http://dl.bytesex.org/patches/xen/

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
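The sizing arithmetic and the proposed layout in the message above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the 24-byte frame table entry and the 4-byte MPT entry are not given in the message, they are back-derived from "96 MB for 16 GB" and "4 MB for 4 GB", and all function names are made up for illustration.

```python
PAGE_SIZE = 4 * 1024
MB = 1024 * 1024
GB = 1024 * MB

def linear_pt_bytes(pte_size, va_space=4 * GB):
    """One PTE per 4 KB page of the 32-bit virtual address space."""
    return (va_space // PAGE_SIZE) * pte_size

# Assumption: entry size derived from the message's "96 MB for 16 GB".
FRAME_ENTRY_BYTES = (96 * MB) // ((16 * GB) // PAGE_SIZE)

def frame_table_bytes(total_mem):
    """One frame table entry per 4 KB page of machine memory."""
    return (total_mem // PAGE_SIZE) * FRAME_ENTRY_BYTES

def mpt_bytes(domain_mem, entry_size=4):
    """Assumption: 4-byte MPT entries, derived from '4 MB for 4 GB'."""
    return (domain_mem // PAGE_SIZE) * entry_size

# Proposed layout: (top 16 bits of start address, size in MB, name).
PROPOSED = [
    (0xfb40, 4, "ioremap area"),
    (0xfb80, 4, "MPT (ro)"),
    (0xfbc0, 4, "MPT (rw)"),
    (0xfc00, 24, "frame table"),
    (0xfd80, 8, "linear page tables"),
    (0xfe00, 8, "shadow linear page tables"),
    (0xfe80, 4, "per domain mapping"),
    (0xfec0, 4, "mapping cache"),
    (0xff00, 16, "low mem, xen code, xen heap"),
]

def is_contiguous(layout):
    """True if the regions tile the address space without gaps or
    overlaps, from the lowest start address up to the 4 GB boundary."""
    regions = sorted(layout)
    end = regions[0][0] << 16
    for start16, size_mb, _name in regions:
        if (start16 << 16) != end:
            return False
        end = (start16 << 16) + size_mb * MB
    return end == 4 * GB

def total_mb(layout):
    """Total size of all regions in MB."""
    return sum(size_mb for _start, size_mb, _name in layout)
```

With these assumptions the proposed regions tile the top 76 MB of the address space exactly, and the 24 MB frame table corresponds to 4 GB of machine memory.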
On 13 Apr 2005, at 18:59, Gerd Knorr wrote:

> If we touch the address space anyway we might fix some other
> issues along the way. Ian mentioned he wants to move the
> ioremap area to the bottom. I guess next to the ro MPT
> table, so it's easy to grant domains read-only access to
> ACPI tables?

Ian was talking about the ioremap area in XenLinux, not in Xen. Doing this for XenLinux would be useful if we were to make HYPERVISOR_VIRT_START a run-time, rather than compile-time, constant. However, I am not convinced that this is a very good idea (see my reasoning below).

That given, it is also not clear that reorg'ing the Xen address space is a worthwhile effort. Yes, things like the location of the Xen heap will be different on PAE vs. non-PAE builds, but it will still be a constant decided at compile time. Reorg'ing is hopefully a pain we can do without for the time being.

We should revisit these considerations if/when we want a single Xen binary that can do both PAE and non-PAE. But let's just get the basic support working first. :-)

> Is it possible (and/or useful) to make the address layout
> dynamic? So the size of the frametable can be adjusted at
> boot time depending on the amount of memory installed in the
> machine? That would imply the ro MPT doesn't have a fixed
> address any more, not sure this is possible ...

I do not think that we should have run-time selected HYPERVISOR_VIRT_START. This is because it will make it impossible to migrate a guest to a machine with more memory than the one on which it is currently executing.

The reasoning is: the new target machine will have a lower HYPERVISOR_VIRT_START, but the guest will have sized its lowmem area according to the available space on the original machine. It therefore cannot run on the new target because its lowmem area simply will not fit.

I think we should just fix on two VIRT_START values: -64MB for non-PAE and something like -192MB for PAE (or whatever allows us to map up to 16GB -- I think we will treat bigger memory configs than that as rare enough to ignore).

 -- Keir
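To make the migration argument above concrete, here is a small Python sketch. It is illustrative only: `virt_start` and `can_run_on` are hypothetical helpers, not Xen interfaces, and "-64MB" / "-192MB" are read as offsets from the top of the 32-bit address space.

```python
MB = 1 << 20
ADDR_SPACE = 1 << 32  # 32-bit virtual address space

def virt_start(hole_mb):
    """Start address of a hypervisor hole of hole_mb megabytes
    placed at the very top of the 32-bit address space."""
    return ADDR_SPACE - hole_mb * MB

NON_PAE_VIRT_START = virt_start(64)   # the "-64MB" value
PAE_VIRT_START = virt_start(192)      # the "-192MB" value

def can_run_on(guest_top, target_virt_start):
    """A guest that laid out its address space up to guest_top
    only fits if the target's hole starts at or above that address."""
    return guest_top <= target_virt_start
```

A guest sized against the 64 MB hole uses addresses up to 0xfc000000, so it does not fit on a host whose hole starts at 0xf4000000, while the reverse direction is fine. That asymmetry is the reason a run-time VIRT_START makes migration to a larger machine a problem.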
On 13 Apr 2005, at 20:29, Keir Fraser wrote:

> I do not think that we should have run-time selected
> HYPERVISOR_VIRT_START. This is because it will make it impossible to
> migrate a guest to a machine with more memory than the one on which it
> is currently executing.
>
> The reasoning is: the new target machine will have a lower
> HYPERVISOR_VIRT_START, but the guest will have sized its lowmem area
> according to the available space on the original machine. It therefore
> cannot run on the new target because its lowmem area simply will not
> fit.
>
> I think we should just fix on two VIRT_START values: -64MB for non-PAE
> and something like -192MB for PAE (or whatever allows us to map up to
> 16GB -- I think we will treat bigger memory configs than that as rare
> enough to ignore).

I chatted to Christian a bit about this and he changed my mind. There probably are some situations where a variable virt_start would be useful for us, although we still may not want to do it for an initial pae patch.

We generally need to think about how flexible we want to be in allowing migration between different machine configurations. Should we require identical h/w specs, or allow differences in I/O devices, CPU and/or memory? We will already have to be careful about downgrading cpu specs when we migrate (e.g., Linux locks onto using multimedia instructions for software raid that are unavailable post-migrate).

A pragmatic middle ground may be that, if people want to migrate in a heterogeneous cluster, we require them to configure 'worst-case specs' up front when building a domain (e.g., lowest-spec cpu the domain should run on; biggest hypervisor address-space hole the domain should work with).

 -- Keir
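One way to picture the 'worst-case specs' idea above is to reduce the hosts of a cluster to a single lowest common denominator. The sketch below is an assumption about how that could be computed, not anything from the Xen tools: `HostSpec` and `worst_case` are invented names, and CPU features are modelled as a plain bitmask using the CPUID leaf 1 EDX bit positions for MMX, SSE and SSE2.

```python
from dataclasses import dataclass

# CPUID leaf 1, EDX feature bits.
MMX = 1 << 23
SSE = 1 << 25
SSE2 = 1 << 26

@dataclass(frozen=True)
class HostSpec:
    cpu_features: int  # bitmask of features the host CPU offers
    hole_mb: int       # size of the hypervisor address-space hole

def worst_case(hosts):
    """Lowest-spec CPU (intersection of all feature masks) and
    biggest hypervisor hole found across the given hosts."""
    features = hosts[0].cpu_features
    for host in hosts[1:]:
        features &= host.cpu_features
    return HostSpec(features, max(host.hole_mb for host in hosts))
```

A domain built against `worst_case(cluster)` would only use features every host has and would leave room for the largest hole, at the cost of not exploiting the better machines.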
Keir Fraser wrote:

> We generally need to think about how flexible we want to be in allowing
> migration between different machine configurations. Should we require
> identical h/w specs, or allow differences in I/O devices, CPU and/or
> memory? We will already have to be careful about downgrading cpu specs
> when we migrate (e.g., Linux locks onto using multimedia instructions
> for software raid that are unavailable post-migrate).

Why not treat the functions that use special mm-instructions (like the software RAID code) as critical sections that cannot overlap with migration, and then have the guestOS re-calibrate its use of these features upon arrival?

[ insert standard plug of self-migration here :-) ]

Jacob
> > We generally need to think about how flexible we want to be in
> > allowing migration between different machine configurations. Should
> > we require identical h/w specs, or allow differences in I/O devices,
> > CPU and/or memory? We will already have to be careful about
> > downgrading cpu specs when we migrate (e.g., Linux locks onto using
> > multimedia instructions for software raid that are unavailable
> > post-migrate).
>
> Why not treat the functions that use special mm-instructions
> (like the software RAID code) as critical sections that
> cannot overlap with migration, and then have the guestOS
> re-calibrate its use of these features upon arrival?

That works OK for the kernel, but you might have user space apps that have adapted their behaviour based on what they've found in /proc/cpuinfo.

A particularly nasty case is apps or libraries that go at 'cpuid' directly, as we can't trap that instruction. I guess VMware have the same problem, as I don't believe they translate ring 3 code.

As regards your proposed critical region, we already effectively do this. We don't recalibrate stuff after a migration yet though (some of the tests are quite slow, so I'm not sure you'd want to do them all anyhow).

Ian
Ian Pratt wrote:

> That works OK for the kernel, but you might have user space apps that
> have adapted their behaviour based on what they've found in /proc/cpuinfo.

A compromise then would be to lie to userspace and still recalibrate the kernel.

> A particularly nasty case is apps or libraries that go at 'cpuid'
> directly, as we can't trap that instruction. I guess VMware have the
> same problem, as I don't believe they translate ring 3 code.

Yeah, nothing we can do there really, except tell people not to :-(

> As regards your proposed critical region, we already effectively do
> this. We don't recalibrate stuff after a migration yet though (some of
> the tests are quite slow, so I'm not sure you'd want to do them all
> anyhow).

Could perhaps do them in advance, or during the pre-copy phase as they are likely to stay constant for the lifetime of the machine, but that demands that the receiving side knows what you are looking for. My system uploads a bootstrapper to the target VM in advance, so I could have this info ready when the rest of the OS arrives.

Jacob
> > That works OK for the kernel, but you might have user space apps that
> > have adapted their behaviour based on what they've found in /proc/cpuinfo.
>
> A compromise then would be to lie to userspace and still recalibrate the
> kernel.

If you're saying what I think you are - I don't think people who have customized their apps for SSE2 would like that. You'll either have to have xend propagate a bitmask of the cpuid capabilities so that xfrd won't migrate it or you tell users that "the behaviour is undefined" if they use migration in a heterogeneous cluster.

 -Kip
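The bitmask option mentioned above might look roughly like the following. This is a sketch of the check only, with invented helper names; it is not the xend or xfrd interface, and the feature table is limited to three CPUID leaf 1 EDX bits for illustration.

```python
# CPUID leaf 1, EDX bit positions for a few relevant features.
CPUID_EDX_BITS = {"mmx": 23, "sse": 25, "sse2": 26}

def feature_mask(*names):
    """Build a capability bitmask from feature names."""
    mask = 0
    for name in names:
        mask |= 1 << CPUID_EDX_BITS[name]
    return mask

def missing_features(guest_mask, target_mask):
    """Names of known features the guest was started with
    that the target host does not offer."""
    lacking = guest_mask & ~target_mask
    return sorted(name for name, bit in CPUID_EDX_BITS.items()
                  if lacking & (1 << bit))

def may_migrate(guest_mask, target_mask):
    """Allow migration only if the target offers every feature
    bit that was visible to the guest on its original host."""
    return (guest_mask & ~target_mask) == 0
```

Refusing the migration when `may_migrate` is false, and reporting `missing_features` to the administrator, would correspond to the first alternative; the second alternative is simply not performing this check.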
Kip Macy wrote:

>>>That works OK for the kernel, but you might have user space apps that
>>>have adapted their behaviour based on what they've found in /proc/cpuinfo.
>>
>>A compromise then would be to lie to userspace and still recalibrate the
>>kernel.
>
> If you're saying what I think you are - I don't think people who have
> customized their apps for SSE2 would like that.

This is what Keir was suggesting, but still giving the kernel a chance to use SSE[12] for its own purposes. It is not perfect; the only perfect solution would alert user-space apps about the impending change, so that they can react to or veto the migration.

Jacob
On Wed, 13 Apr 2005, Jacob Gorm Hansen wrote:

> Kip Macy wrote:
> >>>That works OK for the kernel, but you might have user space apps that
> >>>have adapted their behaviour based on what they've found in /proc/cpuinfo.
> >>
> >>A compromise then would be to lie to userspace and still recalibrate the
> >>kernel.
> >
> > If you're saying what I think you are - I don't think people who have
> > customized their apps for SSE2 would like that.
>
> This is what Keir was suggesting, but still giving the kernel a chance
> to use SSE[12] for its own purposes. It is not perfect, the only perfect
> solution would alert user-space apps about the impending change, so that
> they can react to or veto the migration.

How does Beowulf handle this situation?
On Wed, 13 Apr 2005, Kip Macy wrote:

> You'll either have to have xend propagate a bitmask of the cpuid
> capabilities so that xfrd won't migrate it or you tell users that "the
> behaviour is undefined" if they use migration in a heterogeneous
> cluster.

I'd prefer the latter. It really is a configuration problem, and not something I'd want the software to solve for me.

People whose applications need SSE2 will install CPUs that have those instructions.

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
> People whose applications need SSE2 will install CPUs that
> have those instructions.

One would hope, but just because a customer has lots of money to spend on hardware doesn't mean that he is rowing with both oars. This is a supportability issue. The xen{source} folks would do themselves a favor by trapping a guest's use of unsupported instructions and logging it. That would make it easy enough to track down if a customer's apps stop working when using migration.

 -Kip
On Thu, 14 Apr 2005, Kip Macy wrote:

> > People whose applications need SSE2 will install CPUs that
> > have those instructions.
>
> One would hope, but just because a customer has lots of money to spend
> on hardware doesn't mean that he is rowing with both oars. This is a
> supportability issue. The xen{source} folks would do themselves a
> favor by trapping a guest's use of unsupported instructions and
> logging it. That would make it easy enough to track down if a
> customer's apps stop working when using migration.

Good idea, trapping unsupported instructions and printing out the category the instruction belongs to (e.g. SSE2) will make things a lot easier to track. I like this idea a lot...

--
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." - Brian W. Kernighan
> > One would hope, but just because a customer has lots of money to
> > spend on hardware doesn't mean that he is rowing with both oars.
> > This is a supportability issue. The xen{source} folks would do
> > themselves a favor by trapping a guest's use of unsupported
> > instructions and logging it. That would make it easy enough to
> > track down if a customer's apps stop working when using migration.
>
> Good idea, trapping unsupported instructions and printing out
> the category the instruction belongs to (eg. SSE2) will make
> things a lot easier to track. I like this idea a lot...

Yep, although we can't trap the cpuid, we can trap the use of e.g. SSE2.

We have to be a bit careful though, to prevent DoS of the Xen console. We'd need to rate limit such messages.

Patches welcome :-)

Ian
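The rate limiting mentioned above could be as simple as a fixed-window counter. The following is a minimal sketch, assuming a limit of `burst` messages per `interval` seconds; `RateLimiter` and its fields are hypothetical names, not Xen code, and the caller passes in the current time so the behaviour is easy to test.

```python
class RateLimiter:
    """Allow at most `burst` messages per `interval` seconds and
    count how many were suppressed."""

    def __init__(self, burst, interval):
        self.burst = burst
        self.interval = interval
        self.window_start = None  # start time of the current window
        self.count = 0            # messages allowed in this window
        self.suppressed = 0       # messages dropped so far

    def allow(self, now):
        """Return True if a message may be logged at time `now`."""
        # Open a new window on first use or once the old one expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.count = 0
        if self.count < self.burst:
            self.count += 1
            return True
        self.suppressed += 1
        return False
```

A trap handler would call `allow()` before printing something like "domain N used an unsupported SSE2 instruction", so a guest that executes such instructions in a tight loop cannot flood the console. Keeping one limiter per domain, rather than a global one, would stop one noisy guest from hiding messages about another.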
On Thu, Apr 14, 2005 at 08:03:18PM +0100, Ian Pratt wrote:

> Yep, although we can't trap the cpuid, we can trap the use of
> e.g. SSE2.

And emulate them, or just warn? For the former it's going to be slow as hell, and I would almost argue you could stop the domain doing this with a warning...