I've been slowly working on the DMA problem I ran into; I thought I was making progress, but I think I'm up against a wall, so more discussion and ideas might be helpful.

The problem is that on x86_32 PAE and x86_64, our physical address size is greater than 32 bits, yet many (most?) I/O devices can only address the first 32 bits of memory. So if/when we try to do DMA to an address that has bits above 32 set (call these high addresses), truncation causes the DMA to happen to the wrong address.

I saw this problem on x86_64 with 6GB of RAM: if I made dom0 too big, the allocator put it in high memory; the Linux kernel booted fine, but the partition scan failed and it couldn't mount root.

My original solution was to add another type to the Xen zoneinfo array to divide memory between high and low, and then to allocate low memory only when a domain needs to do DMA or when high memory is exhausted. This was an easy patch that worked fine. I can provide it if anyone wants it.

On the Linux side of things, my first approach was to try to use Linux zones to divide up memory. Currently under Xen, all memory is placed in the DMA zone. I was hoping I could loop over memory somewhere, check the machine address of each page, and place it in the proper zone. The first problem with this approach is that Linux zones are designed more for dealing with the even smaller ISA address space. That aside, the zone code seems to make large assumptions about memory being (mostly) contiguous, and most frequently deals with "start" and "size" rather than arrays of pages. I started looking at the code, thinking that I might change that, but at some point finally realized that on an abstract level, what I was fundamentally doing was the exact reason that the pfn/mfn mapping exists---teaching Linux about non-contiguous memory looks fairly non-trivial.

The next approach I started on was to have Xen reback memory with low pages when it went to do DMA.
dma_alloc_coherent() makes a call to xen_contig_memory(), which forces a range of memory to be backed by machine-contiguous pages by freeing the buffer to Xen and then asking for it back[1]. I tried adding another hypercall to request that DMA'able pages be returned. This worked great for the network cards, but disk was another story. First off, there were several code paths that do DMA that don't end up calling xen_contig_memory() (which right now is fine because it's only ever called on single pages). I started down the path of finding those, but in the meantime realized that for disk, we could be DMA'ing to any memory. Additionally, Michael Hohnbaum reminded me of page flipping. Between these two, it seems reasonable to think that the pool of free DMA memory could eventually become exhausted.

That is the wall.

Footnote: this will not be a problem on all machines. AMD x86_64 has an IOMMU, which should make this a non-problem (if the kernel chooses to use it). Unfortunately, from what I understand, EM64T is not so blessed.

sRp

1| Incidentally, it seems to me that optimally xen_contig_memory() should just return if order==0.

--
Scott Parish
Signed-off-by: srparish@us.ibm.com

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On Tuesday 12 July 2005 12:18 pm, Scott Parish wrote:
> I've been slowly working on the DMA problem I ran into; thought I was
> making progress, but I think I'm up against a wall, so more discussion
> and ideas might be helpful.
>
> The problem was that on x86_32 PAE and x86_64, our physical address size
> is greater than 32 bits, yet many (most?) I/O devices can only address
> the first 32 bits of memory. So if/when we try to do DMA to an address
> that has bits above 32 set (call these high addresses), due to
> truncation the DMA ends up happening to the wrong address.
>
> I saw this problem on x86_64 with 6GB of RAM: if I made dom0 too big, the
> allocator put it in high memory; the Linux kernel booted fine, but the
> partition scan failed, and it couldn't mount root.

Why not have the allocator force all driver domains to be in memory < 4GB?

> My original solution was to add another type to the Xen zoneinfo array
> to divide memory between high and low, and only allocate low memory
> when a domain needs to do DMA or when high memory is exhausted. This was
> an easy patch that worked fine. I can provide it if anyone wants it.
>
> On the Linux side of things, my first approach was to try to use Linux
> zones to divide up memory. Currently under Xen, all memory is placed in
> the DMA zone. I was hoping I could somewhere loop over memory, check
> the machine address of each page, and place it in the proper zones. The
> first problem with this approach is that Linux zones are designed more
> for dealing with the even smaller ISA address space. That aside, it
> seems to make large assumptions about memory being (mostly) contiguous
> and most frequently deals with "start" and "size" rather than arrays
> of pages.
> I started looking at code, thinking that I might change
> that, but at some point finally realized that on an abstract level,
> what I was fundamentally doing was the exact reason that the pfn/mfn
> mapping exists---teaching Linux about non-contiguous memory looks fairly
> non-trivial.
>
> The next approach I started on was to have Xen reback memory with
> low pages when it went to do DMA. dma_alloc_coherent() makes a call
> to xen_contig_memory(), which forces a range of memory to be backed
> by machine-contiguous pages by freeing the buffer to Xen, and then
> asking for it back[1]. I tried adding another hypercall to request that
> DMA'able pages be returned. This worked great for the network cards, but
> disk was another story. First off, there were several code paths that
> do DMA that don't end up calling xen_contig_memory (which right now is
> fine because it's only ever called on single pages). I started down the
> path of finding those, but in the meantime realized that for disk, we
> could be DMA'ing to any memory. Additionally, Michael Hohnbaum reminded
> me of page flipping. Between these two, it seems reasonable to think
> that the pool of free DMA memory could eventually become exhausted.

Running out of DMA'able memory happens. Perf sucks, but it shouldn't kill your system. What's the problem?

> That is the wall.
>
> Footnote: this will not be a problem on all machines. AMD x86_64 has
> an IOMMU which should make this a non-problem (if the kernel chooses to
> use it). Unfortunately, from what I understand, EM64T is not so blessed.

AMD64 has IOMMU HW acceleration. EM64T has software IOMMU. Whenever I get IOMMU working on x86-64, this should solve your problem.

> sRp
>
> 1| Incidentally, it seems to me that optimally xen_contig_memory()
> should just return if order==0.

Thanks,
Jon
Nakajima, Jun
2005-Jul-13 04:36 UTC
RE: [Xen-devel] high memory dma update: up against a wall
Scott Parish wrote:
> I've been slowly working on the DMA problem I ran into; thought I was
> making progress, but I think I'm up against a wall, so more discussion
> and ideas might be helpful.

I think porting swiotlb (arch/ia64/lib/swiotlb.c) is one of the other approaches for EM64T, as we are using it in native x86_64 Linux. We need at least 64MB of physically contiguous memory below 4GB for that. For dom0, I think we can find such an area at boot time. We have a plan to work on that, but it will be after OLS...

Basically, io_tlb_start is the starting address of the buffer. You need to ensure that the memory is physically contiguous in machine physical. I think it's easy to find such an area in dom0. alloc_bootmem_low_pages() may not work, so you may need to write a new (simple) function.

swiotlb_init_with_default_size(size_t default_size)
{
        unsigned long i;

        if (!io_tlb_nslabs) {
                io_tlb_nslabs = (default_size >> PAGE_SHIFT);
                io_tlb_nslabs = ALIGN(io_tlb_nslabs, IO_TLB_SEGSIZE);
        }

        /*
         * Get IO TLB memory from the low pages
         */
        io_tlb_start = alloc_bootmem_low_pages(io_tlb_nslabs *
                                               (1 << IO_TLB_SHIFT));

Another thing is to use virt_to_bus(), not virt_to_phys(). See below.

void *
swiotlb_alloc_coherent(struct device *hwdev, size_t size,
                       dma_addr_t *dma_handle, int flags)
{
        unsigned long dev_addr;
        void *ret;
        int order = get_order(size);

        /*
         * XXX fix me: the DMA API should pass us an explicit DMA mask
         * instead, or use ZONE_DMA32 (ia64 overloads ZONE_DMA to be a ~32
         * bit range instead of a 16MB one).
         */
        flags |= GFP_DMA;
        ret = (void *)__get_free_pages(flags, order);
        if (ret && address_needs_mapping(hwdev, virt_to_phys(ret))) {
                /*
                 * The allocated memory isn't reachable by the device.
                 * Fall back on swiotlb_map_single().
                 */
                free_pages((unsigned long) ret, order);
                ret = NULL;
        }
If not, allocate memory chunk from the buffer: if (!ret) { /* * We are either out of memory or the device can''t DMA * to GFP_DMA memory; fall back on * swiotlb_map_single(), which will grab memory from * the lowest available address range. */ dma_addr_t handle; handle = swiotlb_map_single(NULL, NULL, size, DMA_FROM_DEVICE); if (dma_mapping_error(handle)) return NULL; ret = phys_to_virt(handle); } Jun --- Intel Open Source Technology Center _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2005-Jul-13 08:48 UTC
Re: [Xen-devel] high memory dma update: up against a wall
On 13 Jul 2005, at 00:21, Jon Mason wrote:
> AMD64 has IOMMU HW acceleration. EM64T has software IOMMU. Whenever
> I get IOMMU working on x86-64, this should solve your problem.

If dom0 controls the IOMMU (instead of Xen) then that will be a good solution for dom0 drivers. For other domains, with no control over an IOMMU, we'll probably fall back to bounce buffers with an associated drop in performance. Most people run drivers in dom0 anyway.

 -- Keir
> > I saw this problem on x86_64 with 6gigs ram, if i made dom0 too big,
> > the allocator put it in high memory, the linux kernel booted fine,
> > but the partition scan failed, and it couldn't mount root.
>
> Why not have the allocator force all driver domains to be in
> memory < 4GB?

It's irrelevant whether the driver domains are in memory below 4GB -- they are passed pages by other domains which they want to DMA into.

It's clear that privileged domains need to support bounce buffers for hardware that can't DMA above 4GB. We could try and optimise the situation by giving each domain some memory below 4GB so that it can maintain a zone to use in preference for skb's etc. It can't help for most block I/O, since pretty much any of the domain's pages can be a target.

However, I'm not convinced that it's worth implementing such a solution. Keir and I just looked in Linux's driver directory and found that pretty much all the chips used in server hardware over the last few years are >4GB capable: tg3, e1000, mpt_fusion, aacraid, megaraid, aic7xxx etc. The only exception seems to be IDE/SATA controllers. For the latter, having separate memory zones won't help. We need to use the GART or other I/O MMU to translate the DMA in the driver domain.

I think we just go with bounce buffers for the moment, and add I/O MMU support once we've had a chance to discuss it further. I suspect that on most server hardware we won't need it anyway.

[Is there much extant hardware with >4GB of memory that doesn't have disk or network hardware capable of DMA above 4GB? My guess would be no, but can anyone put forward hard data?]

Ian