I''m not certain when this code was changed significantly from what I remember, but - What is the purpose of using alloc_bootmem_low variants here? I.e., where is the dependency on physical addresses being below 4G here (machine addresses are being restricted after the allocation anyway)? The panic message text after the failed allocation is confusing me additionally. - While I can see the idea behind the overflow buffer, it doesn''t seem to prevent data corruption, and if I understand it correctly it doesn''t even prevent memory corruption (since its machine address doesn''t get restricted anywhere, so the fall back return value would not necessarily meet the device requirements). - With various parameters now being command line configurable, if any of these get set inconsistently or incorrectly the user would probably get a cryptic crash (from the BUG_ON() following the call to xen_create_contiguous_region()). - The default bit width for DMA (also in Xen itself) was now changed to 30, just because of a single device (b44). Shouldn''t it, with the settings being customizable now, rather be 32 (and those who own such ill devices need to make use of the option)? - The DMA bit widths can be set to different values in Xen and kernel, which can lead to surprising results, I would think. Shouldn''t the kernel rather obtain Xen''s value, so they are consistent? Thanks, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/12/06 12:50, "Jan Beulich" <jbeulich@novell.com> wrote:> I''m not certain when this code was changed significantly from what I remember, > but > > - What is the purpose of using alloc_bootmem_low variants here? I.e., where is > the dependency on physical addresses being below 4G here (machine addressesare> being restricted after the allocation anyway)? The panic message text after > the > failed allocation is confusing me additionally.This is how it''s always been since we took ia64''s swiotlb.c.> - While I can see the idea behind the overflow buffer, it doesn''t seem to > prevent > data corruption, and if I understand it correctly it doesn''t even prevent > memory > corruption (since its machine address doesn''t get restricted anywhere, so the > fall > back return value would not necessarily meet the device requirements).Same here. We didn''t implement this. It doesn''t seem to make that much sense. Sync''ing with lib/swiotb.c and throwing away our special one would be very nice. :-)> - With various parameters now being command line configurable, if any of these > get set inconsistently or incorrectly the user would probably get a cryptic > crash > (from the BUG_ON() following the call to xen_create_contiguous_region()).True. Best know what you''re doing if you mess with these.> - The default bit width for DMA (also in Xen itself) was now changed to 30, > just > because of a single device (b44). Shouldn''t it, with the settings being > customizable now, rather be 32 (and those who own such ill devices need to > make use of the option)?Arguable. Depends how common 31- and 30-bit limitations are. Limiting the swiotlb to 1GB of memory doesn''t seem that harsh -- how much swiotlb memory are you likely to want, system wide?> - The DMA bit widths can be set to different values in Xen and kernel, which > can > lead to surprising results, I would think. Shouldn''t the kernel rather obtain > Xen''s > value, so they are consistent?We would like to generalise Xen''s heap allocator so that it keeps separate heaps for different bit widths. Then there would be no ''DMA width'' or ''DMA pool'' in Xen. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>> - The DMA bit widths can be set to different values in Xen and kernel, which >> can >> lead to surprising results, I would think. Shouldn''t the kernel rather obtain >> Xen''s >> value, so they are consistent? > >We would like to generalise Xen''s heap allocator so that it keeps separate >heaps for different bit widths. Then there would be no ''DMA width'' or ''DMA >pool'' in Xen.I already have patches ready to do this (the DMA thing really is a nice side effect, I mostly wanted it for 32on64, so that I can restrict domain allocations for 32-bit domains). Are you saying I should throw away the DMA specialization then altogether (I already have no special DMA heap anymore)? The leftovers from it are so that one can reserve some portion of low memory to be returned only when the width restriction is low enough (i.e. to retain dma_emergency_pool functionality), which certainly isn''t really appropriate anymore now (it should rather be a percentage or something like that, so that the lower you get the more of the memory remains reserved for specialized allocations). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/12/06 13:53, "Jan Beulich" <jbeulich@novell.com> wrote:> I already have patches ready to do this (the DMA thing really is a nice side > effect, I mostly wanted it for 32on64, so that I can restrict domain > allocations for 32-bit domains). Are you saying I should throw away the > DMA specialization then altogether (I already have no special DMA heap > anymore)? The leftovers from it are so that one can reserve some portion > of low memory to be returned only when the width restriction is low enough > (i.e. to retain dma_emergency_pool functionality), which certainly isn''t > really appropriate anymore now (it should rather be a percentage or > something like that, so that the lower you get the more of the memory > remains reserved for specialized allocations).I think dma_emergency_pool as is can go. Possibly it should be replaced by allocator-management tools in dom0 to allow setting of limits on a per-bitwidth basis. Is this one of the patches you already sent in your 32-on-64 batch, or an additional one? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir@xensource.com> 15.12.06 15:03 >>> >On 15/12/06 13:53, "Jan Beulich" <jbeulich@novell.com> wrote: > >> I already have patches ready to do this (the DMA thing really is a nice side >> effect, I mostly wanted it for 32on64, so that I can restrict domain >> allocations for 32-bit domains). Are you saying I should throw away the >> DMA specialization then altogether (I already have no special DMA heap >> anymore)? The leftovers from it are so that one can reserve some portion >> of low memory to be returned only when the width restriction is low enough >> (i.e. to retain dma_emergency_pool functionality), which certainly isn''t >> really appropriate anymore now (it should rather be a percentage or >> something like that, so that the lower you get the more of the memory >> remains reserved for specialized allocations). > >I think dma_emergency_pool as is can go. Possibly it should be replaced by >allocator-management tools in dom0 to allow setting of limits on a >per-bitwidth basis.Okay, but I think I''ll leave this as a separate change (that we probably first should reach agreement on what it really ought to do and not do).>Is this one of the patches you already sent in your 32-on-64 batch, or an >additional one?An additional one (or actually, a set of them, to make the individual steps more clear). As a followup, I''m also planning to get rid of the Xen heaps on those arches where they aren''t needed (x86-64, not sure about ppc and ia64, but I would assume it''s really only x86-32 that needs it). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/12/06 14:17, "Jan Beulich" <jbeulich@novell.com> wrote:>> Is this one of the patches you already sent in your 32-on-64 batch, or an >> additional one? > > An additional one (or actually, a set of them, to make the individual steps > more clear). As a followup, I''m also planning to get rid of the Xen heaps > on those arches where they aren''t needed (x86-64, not sure about ppc > and ia64, but I would assume it''s really only x86-32 that needs it).Yes, that would be great. There are a couple of minor caveats around special semantics of the Xen heap but those could do with cleaning up and being made explicit anyway. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir@xensource.com> 15.12.06 15:19 >>> >On 15/12/06 14:17, "Jan Beulich" <jbeulich@novell.com> wrote: > >>> Is this one of the patches you already sent in your 32-on-64 batch, or an >>> additional one? >> >> An additional one (or actually, a set of them, to make the individual steps >> more clear). As a followup, I''m also planning to get rid of the Xen heaps >> on those arches where they aren''t needed (x86-64, not sure about ppc >> and ia64, but I would assume it''s really only x86-32 that needs it). > >Yes, that would be great. There are a couple of minor caveats around special >semantics of the Xen heap but those could do with cleaning up and being made >explicit anyway.Mind adding a little more detail here? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
> Arguable. Depends how common 31- and 30-bit limitations are. Limiting the > swiotlb to 1GB of memory doesn''t seem that harsh -- how much swiotlb memory > are you likely to want, system wide?31bit pops up now and then, 30bit also. There are a few devices that are even worse but they are older sound cards so not important. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 15/12/06 14:46, "Jan Beulich" <jbeulich@novell.com> wrote:>> Yes, that would be great. There are a couple of minor caveats around special >> semantics of the Xen heap but those could do with cleaning up and being made >> explicit anyway. > > Mind adding a little more detail here?Some allocations need to be below 4GB (e.g., domain pointers, as they get pickled into a 32-bit field in page structures). These should use an explicit address-width parameter to the allocator. Pages allocated from the ''Xen heap'' but shared with a domain have special semantics when their refcount falls to zero. This includes shared_info and grant-table pages. We may need an extra PGC_ bit to flag these pages. That''s all I can think of. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>> - What is the purpose of using alloc_bootmem_low variants here? I.e., where is >> the dependency on physical addresses being below 4G here (machine addresses >are >> being restricted after the allocation anyway)? The panic message text after >> the >> failed allocation is confusing me additionally. > >This is how it''s always been since we took ia64''s swiotlb.c.Okay, then I must have forgotten about how it looked like. However, the specific panic message has a Xen-specific addition, so I still wonder what its background is...>> - While I can see the idea behind the overflow buffer, it doesn''t seem to >> prevent >> data corruption, and if I understand it correctly it doesn''t even prevent >> memory >> corruption (since its machine address doesn''t get restricted anywhere, so the >> fall >> back return value would not necessarily meet the device requirements). > >Same here. We didn''t implement this. It doesn''t seem to make that much >sense. Sync''ing with lib/swiotb.c and throwing away our special one would be >very nice. :-)Trying to do that I find one extra issue: in_swiotlb_aperture() does its check based on pfn, while lib/swiotlb.c uses the virtual address in the respective checks instead. Is there some subtlety behind that (that then should be commented upon), or is this just due to this originally having been an mfn-based check? Thanks, Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 18/12/06 7:44 am, "Jan Beulich" <jbeulich@novell.com> wrote:>> Same here. We didn''t implement this. It doesn''t seem to make that much >> sense. Sync''ing with lib/swiotb.c and throwing away our special one would be >> very nice. :-) > > Trying to do that I find one extra issue: in_swiotlb_aperture() does its check > based on pfn, while lib/swiotlb.c uses the virtual address in the respective > checks instead. Is there some subtlety behind that (that then should be > commented upon), or is this just due to this originally having been an > mfn-based check?Yes, it''s because we used to do an mfn range check which was okay when the swiotlb aperture was filled with contiguous machine memory. Since it is composed of discontiguous slabs now, we changed to a pfn check but that could equally well be a virtual-address check. Do we merge okay with lib/swiotlb.c then? One concern I had was with our preferred setup semantics -- we really want the user to be able to forcibly enable the swiotlb via a boot parameter *but* not have to suffer using it for every DMA operation. Last I looked the generic swiotlb didn''t have that option. That and our very Xen-specific checks for whether to auto-enable the swiotlb led me to think that the very start-of-day setup of swiotlb would need to be overridable by architecture. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>Do we merge okay with lib/swiotlb.c then?Not yet - because of the highmem handling needed for i386. I wonder, however, how native Linux gets away with not handling this through swiotlb, and why nevertheless Xen needs to special case this. Any ideas? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 19/12/06 12:48, "Jan Beulich" <jbeulich@novell.com> wrote:>> Do we merge okay with lib/swiotlb.c then? > > Not yet - because of the highmem handling needed for i386. I wonder, however, > how native Linux gets away with not handling this through swiotlb, and why > nevertheless Xen needs to special case this. Any ideas?Probably because GFP_KERNEL and GFP_DMA allocations are guaranteed to be DMAable by 30-bit-capable devices on native, but not on Xen. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir@xensource.com> 19.12.06 15:14 >>> >On 19/12/06 12:48, "Jan Beulich" <jbeulich@novell.com> wrote: > >>> Do we merge okay with lib/swiotlb.c then? >> >> Not yet - because of the highmem handling needed for i386. I wonder, however, >> how native Linux gets away with not handling this through swiotlb, and why >> nevertheless Xen needs to special case this. Any ideas? > >Probably because GFP_KERNEL and GFP_DMA allocations are guaranteed to be >DMAable by 30-bit-capable devices on native, but not on Xen.Not sure I understand your thinking here. Nothing prevents user pages (or anything else that I/O may happen against) to come from highmem, hence the bounce logic in mm/highmem.c needs to control this anyway (as I understand it). And since all we''re talking about here are physical addresses (and their translations to virtual ones), I would rather conclude that we''ll never see a page in the I/O path that page_address() would return NULL for, but if that''s the case, then there''s no need to kmap such pages or to favor page_to_bus() over virt_to_bus(page_address()). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
One thing is we change the value of PCI_BUS_IS_PHYS (or some similar macro which I can''t quite remember the name of) which I believe turns off some bounce-buffer logic contained within the block-device subsystem. So that will mean that we get highmem requests hitting the DMA interfaces, where on native they would have got filtered earlier by the highmem/lowmem bounce buffer logic that is specific to block-device requests. Obviously we want to turn off that lowmem/highmem bounce buffer logic on Xen as it is nonsense when there is no direct correspondence to low/high machine addresses. -- Keir On 19/12/06 14:39, "Jan Beulich" <jbeulich@novell.com> wrote:>>> Not yet - because of the highmem handling needed for i386. I wonder, >>> however, >>> how native Linux gets away with not handling this through swiotlb, and why >>> nevertheless Xen needs to special case this. Any ideas? >> >> Probably because GFP_KERNEL and GFP_DMA allocations are guaranteed to be >> DMAable by 30-bit-capable devices on native, but not on Xen. > > Not sure I understand your thinking here. Nothing prevents user pages (or > anything > else that I/O may happen against) to come from highmem, hence the bounce logic > in mm/highmem.c needs to control this anyway (as I understand it). And since > all > we''re talking about here are physical addresses (and their translations to > virtual > ones), I would rather conclude that we''ll never see a page in the I/O path > that > page_address() would return NULL for, but if that''s the case, then there''s no > need > to kmap such pages or to favor page_to_bus() over virt_to_bus(page_address())._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Tue, Dec 19, 2006 at 02:46:41PM +0000, Keir Fraser wrote:> > One thing is we change the value of PCI_BUS_IS_PHYS (or some similar macro > which I can''t quite remember the name of) which I believe turns off some > bounce-buffer logic contained within the block-device subsystem. So that > will mean that we get highmem requests hitting the DMA interfaces, where on > native they would have got filtered earlier by the highmem/lowmem bounce > buffer logic that is specific to block-device requests.PCI_DMA_BUS_IS_PHYS, commonly found in asm-$(arch)/pci.h. Cheers, Muli _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>Do we merge okay with lib/swiotlb.c then? One concern I had was with our >preferred setup semantics -- we really want the user to be able to forcibly >enable the swiotlb via a boot parameter *but* not have to suffer using it >for every DMA operation. Last I looked the generic swiotlb didn''t have that >option. That and our very Xen-specific checks for whether to auto-enable the >swiotlb led me to think that the very start-of-day setup of swiotlb would >need to be overridable by architecture.I think I retained all of the semantics, attached the patches as I have them by now. This is a submission for review only, as the first four patches will need to go to kernel.org (and hopefully will get accepted). The Xen customization is fairly ugly, but I didn''t see anything nicer than that while also keeping the amount of changes to lib/swiotlb.c reasonable. Patch order is swiotlb-bugs.patch swiotlb-bus.patch swiotlb-cleanup.patch swiotlb-split.patch xen-swiotlb.patch _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Patch update, fixing a bug on x86/PAE, and making include/xen/swiotlb.h look a lot nicer (but still not really nice). My plan is to submit the non-Xen ones to lkml right after New Year, unless I hear negative feedback. What are the plans on the Xen side - pull the non-Xen patches into patches/, or ignore everything until (hopefully) mainline has picked up some or all of the native ones? Jan>>Do we merge okay with lib/swiotlb.c then? One concern I had was with our >>preferred setup semantics -- we really want the user to be able to forcibly >>enable the swiotlb via a boot parameter *but* not have to suffer using it >>for every DMA operation. Last I looked the generic swiotlb didn''t have that >>option. That and our very Xen-specific checks for whether to auto-enable the >>swiotlb led me to think that the very start-of-day setup of swiotlb would >>need to be overridable by architecture. > >I think I retained all of the semantics, attached the patches as I have them >by now. This is a submission for review only, as the first four patches will >need to go to kernel.org (and hopefully will get accepted). The Xen >customization is fairly ugly, but I didn''t see anything nicer than that while >also keeping the amount of changes to lib/swiotlb.c reasonable. > >Patch order is >swiotlb-bugs.patch >swiotlb-bus.patch >swiotlb-cleanup.patch >swiotlb-split.patch >xen-swiotlb.patch_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
One more thing: Is it really necessary to restrict dma_alloc_coherent() to dma_bits? I.e., couldn''t we, once the bit-level page allocator is merged, use the real bit width needed for the requesting device here? If not, this would then permit using the original implementation of swiotlb_dma_supported() (as dma_alloc_coherent() then no longer depends on dma_bits), and perhaps even auto-setting dma_bits based on what memory we can get out of Xen in swiotlb_init(), making the mismatching of command line options (between Xen and kernel) impossible (the kernel simply wouldn''t have one anymore). As a nice side effect, using the original implementation of swiotlb_dma_supported() would require slightly less tweaking of lib/swiotlb.c, hence slightly raising the chances of the changes getting accepted into mainline. And clearly, if the kernel manages to allocate the swiotlb at an address with less than dma_bits bits, there seems to be no reason to refuse use of I/O devices that the actual buffer fits, but dma_bits doesn''t. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Fri, Dec 22, 2006 at 04:20:28PM +0000, Jan Beulich wrote:> One more thing: Is it really necessary to restrict dma_alloc_coherent() to dma_bits? > I.e., couldn''t we, once the bit-level page allocator is merged, use the real bit width > needed for the requesting device here? If not, this would then permit using theYes I think that should work. If the width is less than what the hypervisor supports it''ll simply fail which is better than returning the wrong memory silently. Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Firstly, I think we should wait and see if the patches are acceptable upstream in their current form before switching to using them. As for dma_alloc_coherent() -- yes it makes sense to make an allocation request with the device''s specified bit width. And we could be opportunistic about the bit width we advertise from swiotlb if we happen to get lower memory than we asked for. *but*: 1. Our swiotlb is made up of separately allocated strides, so the swiotlb is not contiguous in machine memory. That needs to be kept in mind when calculating the bit width as it''ll be max over all strides. 2. Also because of this, the existing swiotlb_dma_supported() cannot work as is. Firstly it would need to use virt_to_machine(), and even then it doesn''t take into account that the aperture is not contiguous in machine memory. And as I said before, dma_bits will disappear from Xen in due course when the dma pool goes away and is replaced with something more flexible. The plan is to leave it (or a similarly-named parameter) in Linux, at least as a guide to the swiotlb pre-allocation (even if no longer used for dma_alloc_coherent). -- Keir On 22/12/06 4:20 pm, "Jan Beulich" <jbeulich@novell.com> wrote:> One more thing: Is it really necessary to restrict dma_alloc_coherent() to > dma_bits? > I.e., couldn''t we, once the bit-level page allocator is merged, use the real > bit width > needed for the requesting device here? If not, this would then permit using > the > original implementation of swiotlb_dma_supported() (as dma_alloc_coherent() > then > no longer depends on dma_bits), and perhaps even auto-setting dma_bits based > on what memory we can get out of Xen in swiotlb_init(), making the mismatching > of > command line options (between Xen and kernel) impossible (the kernel simply > wouldn''t have one anymore). > > As a nice side effect, using the original implementation of > swiotlb_dma_supported() > would require slightly less tweaking of lib/swiotlb.c, hence slightly raising > the > chances of the changes getting accepted into mainline. And clearly, if the > kernel > manages to allocate the swiotlb at an address with less than dma_bits bits, > there > seems to be no reason to refuse use of I/O devices that the actual buffer > fits, but > dma_bits doesn''t._______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Fri, Dec 22, 2006 at 02:49:53PM +0000, Jan Beulich wrote:> Patch update, fixing a bug on x86/PAE, and making > include/xen/swiotlb.h look a lot nicer (but still not really > nice). My plan is to submit the non-Xen ones to lkml right after New > Year, unless I hear negative feedback.> >Patch order is > >swiotlb-bugs.patch > >swiotlb-bus.patch > >swiotlb-cleanup.patch > >swiotlb-split.patch > >xen-swiotlb.patchComments inline. swiotlb-bugs.patch: [snip]> /* > @@ -758,8 +739,10 @@ swiotlb_sync_sg(struct device *hwdev, st > > for (i = 0; i < nelems; i++, sg++) > if (sg->dma_address != SG_ENT_PHYS_ADDRESS(sg)) > - sync_single(hwdev, (void *) sg->dma_address, > + sync_single(hwdev, phys_to_virt(sg->dma_address), > sg->dma_length, dir, target);Fix looks correct and bug looks painful. I think you should send this one to mainline immediately. swiotlb-bus.patch:> Convert all phys_to_virt/virt_to_phys uses to > bus_to_virt/virt_to_bus."... because Xen needs it", otherwise someone is bound to ask why, as all other archs define _bus as _phys.> Signed-off-by: Jan Beulich <jbeulich@novell.com>Acked-by: Muli Ben-Yehuda <muli@il.ibm.com> swiotlb-cleanup:> This patch > - adds proper __init decoration to swiotlb''s init code (and the code calling > it, where not already the case) > - replaces uses of ''unsigned long'' with dma_addr_t where appropriate > - does miscellaneous simplicfication and cleanup > > Signed-off-by: Jan Beulich <jbeulich@novell.com>Looks good in general, not acking yet because I want to give it a spin first. swiotlb-split.patch:> This patch adds abstraction so that the file can be used by environments other > than IA64 and EM64T, namely for Xen.I''m sorry, but this patch is horrible. swiotlb.c is now pretty much unreadable. I''d be surprised if mainline accepted it - I would certainly NAK it with my mainline hat on, especially for an unmerged architecture. If Xen needs so many "abstractions", I have to ask whether it isn''t better off just using its own swiotlb.c as we are now. I''ll take another look at this later and try to come up with a different way of merging them that isn''t quite this horrible. Maybe using function pointers for the "low level" operations? Cheers, Muli _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 25/12/06 4:50 am, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote:> I''m sorry, but this patch is horrible. swiotlb.c is now pretty much > unreadable. I''d be surprised if mainline accepted it - I would > certainly NAK it with my mainline hat on, especially for an unmerged > architecture. > > If Xen needs so many "abstractions", I have to ask whether it isn''t > better off just using its own swiotlb.c as we are now. > > I''ll take another look at this later and try to come up with a > different way of merging them that isn''t quite this horrible. Maybe > using function pointers for the "low level" operations?A lot of this ugliness comes from swiotlb_[un]map_page, the introduction of a structural ''io_tlb_addr_t'', and a more complicated synchronisation function than memcpy (which uses copy_to_user and kmap). However, I don''t remember *why* we need these Xen-specific customisations (even though I was the one who originally made them, way back). That given, it would probably be sensible to make a more brutal merge that discards Xen differences which we cannot explain. For example: * Why can''t we turn dma_[un]map_page into dma_[un]map_single, as x86_64 does? This would avoid needing to expand the swiotlb api. * Why can''t we store virtual addresses in the io_tlb_orig_addr[] array just like lib/swiotlb.c, given that the native swiotlb is happy to use page_address() on any ''struct page'' that is passed to it? I can''t see why we actually need KM_SWIOTLB and to use kmap_atomic. * If we make the above changes, do we need special sync_single() any more? We''ll no longer need kmap_atomic, but we''ll still have copy_to_user_inatomic, due to abuses of the DMA API by the block-device subsystem. Maybe that has been fixed already? Or maybe, in merging this upstream, we should aim to fix the block-device drivers rather than work around? Even if we wait for some of the patches to be merged upstream before applying the swiotlb changes to our own Linux tree, I''d consider patches just to remove the above differences relative to lib/swiotlb.c. This might reduce the diff but would also let us get some testing of these swiotlb simplifications. The differences I was expecting to need to make were just the following: * Initialisation time -- we want to be able to automatically conditionally initialise the swiotlb, and allocate the aperture contents in a special way. * Usage -- ''force'' has difference semantics for Xen (means forcibly allocate a swiotlb, but not use it on every device access whether it is needed or not). We want to be able to use the swiotlb only when it is needed for a particular device access; this may in particular require special treatment of the unmap api calls to decide whether or not the given dma_address resides in the swiotlb aperture. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <Keir.Fraser@cl.cam.ac.uk> 12/25/06 11:20 AM >>> >On 25/12/06 4:50 am, "Muli Ben-Yehuda" <muli@il.ibm.com> wrote: > >> I''m sorry, but this patch is horrible. swiotlb.c is now pretty much >> unreadable. I''d be surprised if mainline accepted it - I would >> certainly NAK it with my mainline hat on, especially for an unmerged >> architecture. >> >> If Xen needs so many "abstractions", I have to ask whether it isn''t >> better off just using its own swiotlb.c as we are now. >> >> I''ll take another look at this later and try to come up with a >> different way of merging them that isn''t quite this horrible. Maybe >> using function pointers for the "low level" operations? > >A lot of this ugliness comes from swiotlb_[un]map_page, the introduction of<a structural ''io_tlb_addr_t'', and a more complicated synchronisation>function than memcpy (which uses copy_to_user and kmap). > >However, I don''t remember *why* we need these Xen-specific customisations >(even though I was the one who originally made them, way back). That given, >it would probably be sensible to make a more brutal merge that discards Xen >differences which we cannot explain. For example: > > * Why can''t we turn dma_[un]map_page into dma_[un]map_single, as x86_64 >does? This would avoid needing to expand the swiotlb api.Because we allow highmem pages in the I/O path, hence page_address() cannot be used. As you may have concluded from my sending of a second rev of the patches, I had a bug in exactly that path, so I know it is being exercised. Of course, all this exists for x86-32/PAE *only*, so it may be valid to raise the question if it''s worth it. But otoh with supporting (only) 32-bit PAE PV guests on x86-64 we are in the process of widening the use case here.> * Why can''t we store virtual addresses in the io_tlb_orig_addr[] array just >like lib/swiotlb.c, given that the native swiotlb is happy to use >page_address() on any ''struct page'' that is passed to it? I can''t see why we >actually need KM_SWIOTLB and to use kmap_atomic.Likewise.> * If we make the above changes, do we need special sync_single() any more? >We''ll no longer need kmap_atomic, but we''ll still have >copy_to_user_inatomic, due to abuses of the DMA API by the block-device >subsystem. Maybe that has been fixed already? Or maybe, in merging this >upstream, we should aim to fix the block-device drivers rather than work >around?Fixing the block drivers would certainly be nice (I''m not exactly clear where this dependency lives on their side), but due to the above we''ll be unable to completely switch over to memcpy() only. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 30/12/06 5:32 pm, "Jan Beulich" <jbeulich@novell.com> wrote:>> * Why can''t we turn dma_[un]map_page into dma_[un]map_single, as x86_64 >> does? This would avoid needing to expand the swiotlb api. > > Because we allow highmem pages in the I/O path, hence page_address() cannot > be used. As you may have concluded from my sending of a second rev of the > patches, I had a bug in exactly that path, so I know it is being exercised. > Of course, all this exists for x86-32/PAE *only*, so it may be valid to > raise the question if it''s worth it. But otoh with supporting (only) 32-bit > PAE PV guests on x86-64 we are in the process of widening the use case here.Ah of course, the generic swiotlb has not (yet) been used by an architecture with highmem requiring use of kmap(). I forgot about that. Unfortunately highmem does rather complicate things -- I guess it''s up to the lib/swiotlb maintainers whether they want to keep that complexity in an xen-i386-specific swiotlb.c or attempt a merge. Here''s a thought: if the highmem DMA requests come from *only* the blkdev subsystem, then perhaps we could use its highmem bounce buffer (I think that still exists?). We turn that off on Xen right now, but we could re-enable it, leading to a slightly odd ''double bounce buffer'': the first taking us from high pseudophysical memory to low pseudophysical memory, and the second taking us from high machine memory to low machine memory. If we can ensure only lowmem requests get to the swiotlb then a lot of the Xen diffs go away. I''m not sure whether we might get DMA requests from high memory from things other than block devices though, nor whether all block devices would actually pass through the blkdev bounce buffer code. Thanks, Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>Here''s a thought: if the highmem DMA requests come from *only* the blkdev >subsystem, then perhaps we could use its highmem bounce buffer (I think that >still exists?). We turn that off on Xen right now, but we could re-enable >it, leading to a slightly odd ''double bounce buffer'': the first taking us >from high pseudophysical memory to low pseudophysical memory, and the second >taking us from high machine memory to low machine memory. If we can ensure >only lowmem requests get to the swiotlb then a lot of the Xen diffs go away. >I''m not sure whether we might get DMA requests from high memory from things >other than block devices though, nor whether all block devices would >actually pass through the blkdev bounce buffer code.Proving that no driver ever passes highmem pages into any of the DMA ops should be at least difficult, but would be needed since native i386 has no requirement that dma_map_sg() or dma_map_page() be called only with non-highmem pages. I am therefore not thinking this is a realistic option. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>Here''s a thought: if the highmem DMA requests come from *only* the blkdev >subsystem, then perhaps we could use its highmem bounce buffer (I think that >still exists?).Another, only partially related thought: The addition of KM_SWIOTLB is the only difference to native kmap_types.h. It would seem to me that, at the price of disabling interrupts around the use, it should be possible to replace that with KM_BOUNCE_READ and let go of the Xen specific header... Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 3/1/07 07:10, "Jan Beulich" <jbeulich@novell.com> wrote:>> Here''s a thought: if the highmem DMA requests come from *only* the blkdev >> subsystem, then perhaps we could use its highmem bounce buffer (I think that >> still exists?). > > Another, only partially related thought: The addition of KM_SWIOTLB is the > only > difference to native kmap_types.h. It would seem to me that, at the price of > disabling interrupts around the use, it should be possible to replace that > with > KM_BOUNCE_READ and let go of the Xen specific header...Sounds reasonable, although KM_BOUNCE_READ is a misnomer (since we write to the kmap as well as read from it). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir@xensource.com> 03.01.07 10:32 >>> >On 3/1/07 07:10, "Jan Beulich" <jbeulich@novell.com> wrote: >> Another, only partially related thought: The addition of KM_SWIOTLB is the >> only >> difference to native kmap_types.h. It would seem to me that, at the price of >> disabling interrupts around the use, it should be possible to replace that >> with >> KM_BOUNCE_READ and let go of the Xen specific header... > >Sounds reasonable, although KM_BOUNCE_READ is a misnomer (since we write to >the kmap as well as read from it).... similar to the (mis-)use of it in the EDAC driver. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel