[.. and this is what I said in the v1 post]:

Way back in January this patchset:
http://lists.freedesktop.org/archives/dri-devel/2011-January/006905.html
was merged in, but pieces of it had to be reverted b/c they did not work
properly under PowerPC, ARM, and when swapping out pages to disk.

After a bit of discussion on the mailing list
http://marc.info/?i=4D769726.2030307@shipmail.org
I started working on it, but got waylaid by other things .. and finally I am
able to post the RFC patches. There was a lot of discussion about it and I am
not sure if I captured everybody's thoughts - if I did not, that is _not_
intentional - it has just been quite some time.

Anyhow .. the patches explore what "lib/dmapool.c" does - which is to have a
DMA pool associated with a device. I kind of married that code with
drivers/gpu/drm/ttm/ttm_page_alloc.c to create TTM DMA pool code. The end
result is a DMA pool with extra features: it can do write-combined, uncached
and write-back pages (and tracks them, setting them back to WB when freed);
it tracks "cached" pages that don't really need to be returned to a pool;
and it hooks up to the shrinker code so that the pools can be shrunk.

If you guys think this set of patches makes sense, my future plans are:
 1) get a large crowd of testing on this .. and if it works for a kernel
    release,
 2) move the bulk of this into lib/dmapool.c (I spoke with Matthew Wilcox
    about it and he is OK with it as long as I don't introduce performance
    regressions).

But before I do any of that, a second set of eyes taking a look at these
patches would be most welcome.

In regards to testing, I've been running them non-stop for the last month
(and found some issues which I've fixed up) and have been quite happy with
how they work. Michel (thanks!) took a spin of the patches on his PowerPC
and they did not cause any regressions (wheew).

The patches are also located in a git tree:

 git://oss.oracle.com/git/kwilk/xen.git devel/ttm.dma_pool.v2.1

Konrad Rzeszutek Wilk (11):
      swiotlb: Expose swiotlb_nr_tlb function to modules
      nouveau/radeon: Set coherent DMA mask
      ttm/radeon/nouveau: Check the DMA address from TTM against known value.
      ttm: Wrap ttm_[put|get]_pages and extract GFP_* and caching states from 'struct ttm_tt'
      ttm: Get rid of temporary scaffolding
      ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
      ttm: Do not set the ttm->be to NULL before calling the TTM page pool to free pages.
      ttm: Provide DMA aware TTM page pool code.
      ttm: Add 'no_dma' parameter to turn the TTM DMA pool off during runtime.
      nouveau/ttm/dma: Enable the TTM DMA pool if device can only do 32-bit DMA.
      radeon/ttm/dma: Enable the TTM DMA pool if the device can only do 32-bit.
 drivers/gpu/drm/nouveau/nouveau_debugfs.c |    1 +
 drivers/gpu/drm/nouveau/nouveau_mem.c     |    5 +
 drivers/gpu/drm/nouveau/nouveau_sgdma.c   |    8 +-
 drivers/gpu/drm/radeon/radeon_device.c    |    6 +
 drivers/gpu/drm/radeon/radeon_gart.c      |    4 +-
 drivers/gpu/drm/radeon/radeon_ttm.c       |   19 +-
 drivers/gpu/drm/ttm/Makefile              |    3 +
 drivers/gpu/drm/ttm/ttm_memory.c          |    5 +
 drivers/gpu/drm/ttm/ttm_page_alloc.c      |  108 ++-
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c  | 1446 +++++++++++++++++++++++++++++
 drivers/gpu/drm/ttm/ttm_tt.c              |   21 +-
 drivers/xen/swiotlb-xen.c                 |    2 +-
 include/drm/ttm/ttm_bo_driver.h           |   31 +
 include/drm/ttm/ttm_page_alloc.h          |   53 +-
 include/linux/swiotlb.h                   |    2 +-
 lib/swiotlb.c                             |    5 +-
 16 files changed, 1637 insertions(+), 82 deletions(-)
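
To make the idea concrete, here is a minimal sketch of the per-page
bookkeeping the DMA pool in this series keeps. The names below are made up
for illustration (the real code is 'struct dma_page' and friends in
ttm_page_alloc_dma.c, which additionally tracks caching state and pool
membership); the point is that each page handed out remembers its CPU
address and its bus address so it can later be freed against the same
'struct device':

#include <linux/dma-mapping.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* Illustrative only - a stripped down cousin of 'struct dma_page'. */
struct demo_dma_page {
	struct list_head page_list;
	void *vaddr;		/* CPU address */
	struct page *p;		/* the backing struct page */
	dma_addr_t dma;		/* bus address seen by the device */
};

static struct demo_dma_page *demo_dma_page_alloc(struct device *dev, gfp_t gfp)
{
	struct demo_dma_page *d_page;

	d_page = kmalloc(sizeof(*d_page), GFP_KERNEL);
	if (!d_page)
		return NULL;

	/* dma_alloc_coherent() honors dev->coherent_dma_mask - hence
	 * patch 02 setting the coherent DMA mask in nouveau/radeon. */
	d_page->vaddr = dma_alloc_coherent(dev, PAGE_SIZE, &d_page->dma, gfp);
	if (!d_page->vaddr) {
		kfree(d_page);
		return NULL;
	}
	d_page->p = virt_to_page(d_page->vaddr);
	return d_page;
}

static void demo_dma_page_free(struct device *dev, struct demo_dma_page *d_page)
{
	dma_free_coherent(dev, PAGE_SIZE, d_page->vaddr, d_page->dma);
	kfree(d_page);
}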
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 01/11] swiotlb: Expose swiotlb_nr_tlb function to modules
As a mechanism to detect whether SWIOTLB is enabled or not. We also fix
the spelling - it was swioltb instead of swiotlb.

CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
[v1: Ripped out swiotlb_enabled]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/swiotlb-xen.c |    2 +-
 include/linux/swiotlb.h   |    2 +-
 lib/swiotlb.c             |    5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6e8c15a..cbcd8cc 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -149,7 +149,7 @@ void __init xen_swiotlb_init(int verbose)
 	int rc;
 	unsigned long nr_tbl;
 
-	nr_tbl = swioltb_nr_tbl();
+	nr_tbl = swiotlb_nr_tbl();
 	if (nr_tbl)
 		xen_io_tlb_nslabs = nr_tbl;
 	else {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 445702c..e872526 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -24,7 +24,7 @@ extern int swiotlb_force;
 
 extern void swiotlb_init(int verbose);
 extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
-extern unsigned long swioltb_nr_tbl(void);
+extern unsigned long swiotlb_nr_tbl(void);
 
 /*
  * Enumeration for sync targets
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 99093b3..058935e 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -110,11 +110,11 @@ setup_io_tlb_npages(char *str)
 __setup("swiotlb=", setup_io_tlb_npages);
 /* make io_tlb_overflow tunable too? */
 
-unsigned long swioltb_nr_tbl(void)
+unsigned long swiotlb_nr_tbl(void)
 {
 	return io_tlb_nslabs;
 }
-
+EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 /* Note that this doesn't work with highmem page */
 static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
 				      volatile void *address)
@@ -321,6 +321,7 @@ void __init swiotlb_free(void)
 		free_bootmem_late(__pa(io_tlb_start),
 				  PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
 	}
+	io_tlb_nslabs = 0;
 }
 
 static int is_swiotlb_buffer(phys_addr_t paddr)
-- 
1.7.6.4
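
A minimal sketch of how a module can use the newly exported helper (the
function name demo_swiotlb_active is made up; the DMA pool patch later in
the series keys off the same check to decide whether to activate itself):

#include <linux/types.h>
#include <linux/swiotlb.h>

/* A non-zero slab count means the software IOTLB was set up and bounce
 * buffering can happen - which is exactly when a DMA-aware page pool is
 * worth enabling. */
static bool demo_swiotlb_active(void)
{
	return swiotlb_nr_tbl() != 0;
}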
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 02/11] nouveau/radeon: Set coherent DMA mask
All the storage devices that use the dmapool set the coherent DMA mask so
they can properly use the dmapool. Since the TTM DMA pool code is based on
that, and dma_alloc_coherent checks the 'coherent_dma_mask' and not
'dma_mask', we want to set it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/nouveau/nouveau_mem.c  |    5 +++++
 drivers/gpu/drm/radeon/radeon_device.c |    6 ++++++
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_mem.c b/drivers/gpu/drm/nouveau/nouveau_mem.c
index 5ee14d2..8b39520 100644
--- a/drivers/gpu/drm/nouveau/nouveau_mem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_mem.c
@@ -408,6 +408,11 @@ nouveau_mem_vram_init(struct drm_device *dev)
 	if (ret)
 		return ret;
 
+	ret = pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(dma_bits));
+	if (ret) {
+		/* Reset to default value. */
+		pci_set_consistent_dma_mask(dev->pdev, DMA_BIT_MASK(32));
+	}
 	dev_priv->fb_phys = pci_resource_start(dev->pdev, 1);
 
 	ret = nouveau_ttm_global_init(dev_priv);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 7cfaa7e..0c0a970 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -757,8 +757,14 @@ int radeon_device_init(struct radeon_device *rdev,
 	r = pci_set_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
 	if (r) {
 		rdev->need_dma32 = true;
+		dma_bits = 32;
 		printk(KERN_WARNING "radeon: No suitable DMA available.\n");
 	}
+	r = pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(dma_bits));
+	if (r) {
+		pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
+		printk(KERN_WARNING "radeon: No coherent DMA available.\n");
+	}
 
 	/* Registers mapping */
 	/* TODO: block userspace mapping of io register */
-- 
1.7.6.4
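
Shown in isolation, and with a made-up function name, this is the pattern
the patch applies to both drivers: set the streaming DMA mask first, then
the coherent mask that dma_alloc_coherent() actually consults, falling back
to 32 bits on failure.

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Sketch only; the real changes live in nouveau_mem_vram_init() and
 * radeon_device_init(). */
static void demo_set_dma_masks(struct pci_dev *pdev, unsigned dma_bits)
{
	if (pci_set_dma_mask(pdev, DMA_BIT_MASK(dma_bits)))
		dma_bits = 32;	/* no suitable streaming DMA */

	if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(dma_bits)))
		pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
}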
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 03/11] ttm/radeon/nouveau: Check the DMA address from TTM against known value.
. instead of checking against the DMA_ERROR_CODE value, which is
platform-specific. The zero value is a known invalid value that the TTM
layer sets on the dma_address array if it is not used
(ttm_tt_alloc_page_directory calls drm_calloc_large which creates the array
with GFP_ZERO).

We can't use pci_dma_mapping_error as that is IOMMU specific (some check
for a specific physical address, some for ranges, some just do a check
against zero).

Also update the comments in the header about the true state of that
parameter.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/gpu/drm/nouveau/nouveau_sgdma.c |    3 +--
 drivers/gpu/drm/radeon/radeon_gart.c    |    4 +---
 include/drm/ttm/ttm_page_alloc.h        |    4 ++--
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index 82fad91..9b570c3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -42,8 +42,7 @@ nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages,
 
 	nvbe->nr_pages = 0;
 	while (num_pages--) {
-		/* this code path isn't called and is incorrect anyways */
-		if (0) { /*dma_addrs[nvbe->nr_pages] != DMA_ERROR_CODE)*/
+		if (dma_addrs[nvbe->nr_pages] != 0) {
 			nvbe->pages[nvbe->nr_pages] =
 					dma_addrs[nvbe->nr_pages];
 			nvbe->ttm_alloced[nvbe->nr_pages] = true;
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index a533f52..068ba09 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -181,9 +181,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned offset,
 	p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE);
 
 	for (i = 0; i < pages; i++, p++) {
-		/* we reverted the patch using dma_addr in TTM for now but this
-		 * code stops building on alpha so just comment it out for now */
-		if (0) { /*dma_addr[i] != DMA_ERROR_CODE) */
+		if (dma_addr[i] != 0) {
 			rdev->gart.ttm_alloced[p] = true;
 			rdev->gart.pages_addr[p] = dma_addr[i];
 		} else {
diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h
index 8062890..0017b17 100644
--- a/include/drm/ttm/ttm_page_alloc.h
+++ b/include/drm/ttm/ttm_page_alloc.h
@@ -36,7 +36,7 @@
  * @flags: ttm flags for page allocation.
  * @cstate: ttm caching state for the page.
  * @count: number of pages to allocate.
- * @dma_address: The DMA (bus) address of pages (if TTM_PAGE_FLAG_DMA32 set).
+ * @dma_address: The DMA (bus) address of pages - (by default zero).
  */
 int ttm_get_pages(struct list_head *pages,
 		  int flags,
@@ -51,7 +51,7 @@ int ttm_get_pages(struct list_head *pages,
  *       count.
  * @flags: ttm flags for page allocation.
  * @cstate: ttm caching state.
- * @dma_address: The DMA (bus) address of pages (if TTM_PAGE_FLAG_DMA32 set).
+ * @dma_address: The DMA (bus) address of pages (by default zero).
  */
 void ttm_put_pages(struct list_head *pages,
 		   unsigned page_count,
-- 
1.7.6.4
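
In other words the contract is: a zero entry in the dma_address array TTM
hands to the driver means "no bus address supplied, map the page yourself";
anything else is ready to program into the GART. A small sketch of that
convention (the helper name is invented and the fallback mapping call is
illustrative; each driver uses its own mapping path):

#include <linux/pci.h>

/* Honour a bus address already provided by TTM, otherwise create a
 * mapping ourselves. */
static dma_addr_t demo_page_bus_addr(struct pci_dev *pdev, struct page *page,
				     dma_addr_t ttm_dma_addr)
{
	if (ttm_dma_addr != 0)
		return ttm_dma_addr;	/* came from the TTM (DMA) pool */

	return pci_map_page(pdev, page, 0, PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
}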
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 04/11] ttm: Wrap ttm_[put|get]_pages and extract GFP_* and caching states from ''struct ttm_tt''
Instead of passing the ''int flags'' and ''enum caching_state caching_state'' as parameters, pass in the ''struct ttm_tt'' and let the ttm_[put|get]_pages extract those parameters. We also wrap the ttm_[put|get]_pages so that we can extract those two parameters from the ''struct ttm_tt''. The reason for wrapping instead of just changing the two functions declerations outright is to support the next set of patches which will provide an override mechanism for ''ttm_[put|get]_pages'' functions. Temporarily we put in a function declerations for the __ttm_put_pages, which later on we will remove (by moving the __ttm_put_pages). Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 29 +++++++++++++++++++++++------ drivers/gpu/drm/ttm/ttm_tt.c | 6 ++---- include/drm/ttm/ttm_page_alloc.h | 16 ++++++---------- 3 files changed, 31 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index d948575..c30d62b 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -660,13 +660,16 @@ out: return count; } +static void __ttm_put_pages(struct list_head *pages, unsigned page_count, + int flags, enum ttm_caching_state cstate, + dma_addr_t *dma_address); /* * On success pages list will hold count number of correctly * cached pages. */ -int ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) +static int __ttm_get_pages(struct list_head *pages, int flags, + enum ttm_caching_state cstate, unsigned count, + dma_addr_t *dma_address) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; @@ -724,7 +727,7 @@ int ttm_get_pages(struct list_head *pages, int flags, printk(KERN_ERR TTM_PFX "Failed to allocate extra pages " "for large request."); - ttm_put_pages(pages, 0, flags, cstate, NULL); + __ttm_put_pages(pages, 0, flags, cstate, NULL); return r; } } @@ -734,8 +737,9 @@ int ttm_get_pages(struct list_head *pages, int flags, } /* Put all pages in pages list to correct pool to wait for reuse */ -void ttm_put_pages(struct list_head *pages, unsigned page_count, int flags, - enum ttm_caching_state cstate, dma_addr_t *dma_address) +static void __ttm_put_pages(struct list_head *pages, unsigned page_count, + int flags, enum ttm_caching_state cstate, + dma_addr_t *dma_address) { unsigned long irq_flags; struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); @@ -857,3 +861,16 @@ int ttm_page_alloc_debugfs(struct seq_file *m, void *data) return 0; } EXPORT_SYMBOL(ttm_page_alloc_debugfs); +int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, + unsigned count, dma_addr_t *dma_address) +{ + return __ttm_get_pages(pages, ttm->page_flags, ttm->caching_state, + count, dma_address); +} +{ +void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, + unsigned page_count, dma_addr_t *dma_address) +{ + __ttm_put_pages(pages, page_count, ttm->page_flags, ttm->caching_state, + dma_address); +} diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 58c271e..76c982f 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -110,8 +110,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) INIT_LIST_HEAD(&h); - ret = ttm_get_pages(&h, ttm->page_flags, ttm->caching_state, 1, - &ttm->dma_address[index]); + ret = ttm_get_pages(ttm, &h, 1, &ttm->dma_address[index]); if (ret != 0) return NULL; @@ -304,8 +303,7 
@@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) count++; } } - ttm_put_pages(&h, count, ttm->page_flags, ttm->caching_state, - ttm->dma_address); + ttm_put_pages(ttm, &h, count, ttm->dma_address); ttm->state = tt_unpopulated; ttm->first_himem_page = ttm->num_pages; ttm->last_lomem_page = -1; diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h index 0017b17..0aaac39 100644 --- a/include/drm/ttm/ttm_page_alloc.h +++ b/include/drm/ttm/ttm_page_alloc.h @@ -32,31 +32,27 @@ /** * Get count number of pages from pool to pages list. * + * @ttm: ttm which contains flags for page allocation and caching state. * @pages: heado of empty linked list where pages are filled. - * @flags: ttm flags for page allocation. - * @cstate: ttm caching state for the page. * @count: number of pages to allocate. * @dma_address: The DMA (bus) address of pages - (by default zero). */ -int ttm_get_pages(struct list_head *pages, - int flags, - enum ttm_caching_state cstate, +int ttm_get_pages(struct ttm_tt *ttm, + struct list_head *pages, unsigned count, dma_addr_t *dma_address); /** * Put linked list of pages to pool. * + * @ttm: ttm which contains flags for page allocation and caching state. * @pages: list of pages to free. * @page_count: number of pages in the list. Zero can be passed for unknown * count. - * @flags: ttm flags for page allocation. - * @cstate: ttm caching state. * @dma_address: The DMA (bus) address of pages (by default zero). */ -void ttm_put_pages(struct list_head *pages, +void ttm_put_pages(struct ttm_tt *ttm, + struct list_head *pages, unsigned page_count, - int flags, - enum ttm_caching_state cstate, dma_addr_t *dma_address); /** * Initialize pool allocator. -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
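
The net effect on callers: flags and caching state now travel inside the
'struct ttm_tt' itself. A hypothetical caller, mirroring what
__ttm_tt_get_page() in ttm_tt.c does after this patch, looks like this:

#include <linux/list.h>
#include "ttm/ttm_bo_driver.h"
#include "ttm/ttm_page_alloc.h"

/* Sketch of a single-page allocation through the wrapped API. */
static struct page *demo_get_one_page(struct ttm_tt *ttm, int index)
{
	struct list_head h;

	INIT_LIST_HEAD(&h);

	/* page_flags and caching_state are taken from 'ttm' internally. */
	if (ttm_get_pages(ttm, &h, 1, &ttm->dma_address[index]))
		return NULL;

	return list_first_entry(&h, struct page, lru);
}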
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 05/11] ttm: Get rid of temporary scaffolding
which was used in the "ttm: Wrap ttm_[put|get]_pages and extract GFP_* and caching states from ''struct ttm_tt" patch. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 83 ++++++++++++++++----------------- 1 files changed, 40 insertions(+), 43 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index c30d62b..24c0340 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -660,9 +660,48 @@ out: return count; } +/* Put all pages in pages list to correct pool to wait for reuse */ static void __ttm_put_pages(struct list_head *pages, unsigned page_count, int flags, enum ttm_caching_state cstate, - dma_addr_t *dma_address); + dma_addr_t *dma_address) +{ + unsigned long irq_flags; + struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); + struct page *p, *tmp; + + if (pool == NULL) { + /* No pool for this memory type so free the pages */ + + list_for_each_entry_safe(p, tmp, pages, lru) { + __free_page(p); + } + /* Make the pages list empty */ + INIT_LIST_HEAD(pages); + return; + } + if (page_count == 0) { + list_for_each_entry_safe(p, tmp, pages, lru) { + ++page_count; + } + } + + spin_lock_irqsave(&pool->lock, irq_flags); + list_splice_init(pages, &pool->list); + pool->npages += page_count; + /* Check that we don''t go over the pool limit */ + page_count = 0; + if (pool->npages > _manager->options.max_size) { + page_count = pool->npages - _manager->options.max_size; + /* free at least NUM_PAGES_TO_ALLOC number of pages + * to reduce calls to set_memory_wb */ + if (page_count < NUM_PAGES_TO_ALLOC) + page_count = NUM_PAGES_TO_ALLOC; + } + spin_unlock_irqrestore(&pool->lock, irq_flags); + if (page_count) + ttm_page_pool_free(pool, page_count); +} + /* * On success pages list will hold count number of correctly * cached pages. @@ -736,48 +775,6 @@ static int __ttm_get_pages(struct list_head *pages, int flags, return 0; } -/* Put all pages in pages list to correct pool to wait for reuse */ -static void __ttm_put_pages(struct list_head *pages, unsigned page_count, - int flags, enum ttm_caching_state cstate, - dma_addr_t *dma_address) -{ - unsigned long irq_flags; - struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); - struct page *p, *tmp; - - if (pool == NULL) { - /* No pool for this memory type so free the pages */ - - list_for_each_entry_safe(p, tmp, pages, lru) { - __free_page(p); - } - /* Make the pages list empty */ - INIT_LIST_HEAD(pages); - return; - } - if (page_count == 0) { - list_for_each_entry_safe(p, tmp, pages, lru) { - ++page_count; - } - } - - spin_lock_irqsave(&pool->lock, irq_flags); - list_splice_init(pages, &pool->list); - pool->npages += page_count; - /* Check that we don''t go over the pool limit */ - page_count = 0; - if (pool->npages > _manager->options.max_size) { - page_count = pool->npages - _manager->options.max_size; - /* free at least NUM_PAGES_TO_ALLOC number of pages - * to reduce calls to set_memory_wb */ - if (page_count < NUM_PAGES_TO_ALLOC) - page_count = NUM_PAGES_TO_ALLOC; - } - spin_unlock_irqrestore(&pool->lock, irq_flags); - if (page_count) - ttm_page_pool_free(pool, page_count); -} - static void ttm_page_pool_init_locked(struct ttm_page_pool *pool, int flags, char *name) { -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
The two overrides will be choosen by the backends whether they want to use a different TTM page pool than the default. If the backend does not choose a new override, the default one will be used. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 10 +++++++--- include/drm/ttm/ttm_bo_driver.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 24c0340..360afb3 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -861,13 +861,17 @@ EXPORT_SYMBOL(ttm_page_alloc_debugfs); int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, unsigned count, dma_addr_t *dma_address) { + if (ttm->be && ttm->be->func && ttm->be->func->get_pages) + return ttm->be->func->get_pages(ttm, pages, count, dma_address); return __ttm_get_pages(pages, ttm->page_flags, ttm->caching_state, count, dma_address); } -{ void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, unsigned page_count, dma_addr_t *dma_address) { - __ttm_put_pages(pages, page_count, ttm->page_flags, ttm->caching_state, - dma_address); + if (ttm->be && ttm->be->func && ttm->be->func->put_pages) + ttm->be->func->put_pages(ttm, pages, page_count, dma_address); + else + __ttm_put_pages(pages, page_count, ttm->page_flags, + ttm->caching_state, dma_address); } diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 09af2d7..1826c3b 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -100,6 +100,34 @@ struct ttm_backend_func { * Destroy the backend. */ void (*destroy) (struct ttm_backend *backend); + + /** + * ttm_get_pages override. The backend can override the default + * TTM page pool code with a different one. + * + * Get count number of pages from pool to pages list. + * + * @ttm: ttm which contains flags for page allocation and caching state. + * @pages: head of empty linked list where pages are filled. + * @dma_address: The DMA (bus) address of pages + */ + int (*get_pages) (struct ttm_tt *ttm, struct list_head *pages, + unsigned count, dma_addr_t *dma_address); + + /** + * ttm_put_pages override. The backend can override the default + * TTM page pool code with a different implementation. + * + * Put linked list of pages to pool. + * + * @ttm: ttm which contains flags for page allocation and caching state. + * @pages: list of pages to free. + * @page_count: number of pages in the list. Zero can be passed for + * unknown count. + * @dma_address: The DMA (bus) address of pages + */ + void (*put_pages) (struct ttm_tt *ttm, struct list_head *pages, + unsigned page_count, dma_addr_t *dma_address); }; /** @@ -109,6 +137,8 @@ struct ttm_backend_func { * @flags: For driver use. * @func: Pointer to a struct ttm_backend_func that describes * the backend methods. + * @dev: Pointer to a struct device which can be used by the TTM + * [get|put)_pages overrides in ''struct ttm_backend_func''. * */ @@ -116,6 +146,7 @@ struct ttm_backend { struct ttm_bo_device *bdev; uint32_t flags; struct ttm_backend_func *func; + struct device *dev; }; #define TTM_PAGE_FLAG_USER (1 << 1) -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
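
For instance, a backend that wants every allocation routed through the
DMA-aware pool could wire itself up as below. This is hypothetical: the
function table is made up, ttm_dma_put_pages is assumed to be the
counterpart of ttm_dma_get_pages from patch 08, and the nouveau/radeon
patches later in the series only do the equivalent when the device is
limited to 32-bit DMA.

#include "ttm/ttm_bo_driver.h"
#include "ttm/ttm_page_alloc.h"

static struct ttm_backend_func demo_backend_func = {
	/* .bind/.unbind/.clear/.destroy stay whatever the driver uses. */
	.get_pages	= ttm_dma_get_pages,	/* from patch 08 */
	.put_pages	= ttm_dma_put_pages,	/* assumed counterpart */
};

static void demo_backend_init(struct ttm_backend *be, struct device *dev)
{
	be->func = &demo_backend_func;
	be->dev = dev;		/* the overrides need a 'struct device' */
}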
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 07/11] ttm: Do not set the ttm->be to NULL before calling the TTM page pool to free pages.
. as the ttm->be->func->[get|put]_pages can be called and they would dereference on ttm->be which was set to NULL. Instead of clearing it there, pass in a flag to the ttm_tt_free_allocated_pages whether to clear the pages or not (you are not suppose to clear the pages when destroying them). Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/ttm/ttm_tt.c | 15 +++++++-------- 1 files changed, 7 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 76c982f..31ae359 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -276,7 +276,7 @@ int ttm_tt_set_placement_caching(struct ttm_tt *ttm, uint32_t placement) } EXPORT_SYMBOL(ttm_tt_set_placement_caching); -static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) +static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm, bool call_clear) { int i; unsigned count = 0; @@ -286,7 +286,7 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm) INIT_LIST_HEAD(&h); - if (be) + if (be && call_clear) be->func->clear(be); for (i = 0; i < ttm->num_pages; ++i) { @@ -317,16 +317,14 @@ void ttm_tt_destroy(struct ttm_tt *ttm) return; be = ttm->be; - if (likely(be != NULL)) { + if (likely(be != NULL)) be->func->destroy(be); - ttm->be = NULL; - } if (likely(ttm->pages != NULL)) { if (ttm->page_flags & TTM_PAGE_FLAG_USER) ttm_tt_free_user_pages(ttm); else - ttm_tt_free_alloced_pages(ttm); + ttm_tt_free_alloced_pages(ttm, false); ttm_tt_free_page_directory(ttm); } @@ -335,6 +333,7 @@ void ttm_tt_destroy(struct ttm_tt *ttm) ttm->swap_storage) fput(ttm->swap_storage); + ttm->be = NULL; kfree(ttm); } @@ -509,7 +508,7 @@ static int ttm_tt_swapin(struct ttm_tt *ttm) return 0; out_err: - ttm_tt_free_alloced_pages(ttm); + ttm_tt_free_alloced_pages(ttm, true); return ret; } @@ -573,7 +572,7 @@ int ttm_tt_swapout(struct ttm_tt *ttm, struct file *persistent_swap_storage) page_cache_release(to_page); } - ttm_tt_free_alloced_pages(ttm); + ttm_tt_free_alloced_pages(ttm, true); ttm->swap_storage = swap_storage; ttm->page_flags |= TTM_PAGE_FLAG_SWAPPED; if (persistent_swap_storage) -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 08/11] ttm: Provide DMA aware TTM page pool code.
In TTM world the pages for the graphic drivers are kept in three different pools: write combined, uncached, and cached (write-back). When the pages are used by the graphic driver the graphic adapter via its built in MMU (or AGP) programs these pages in. The programming requires the virtual address (from the graphic adapter perspective) and the physical address (either System RAM or the memory on the card) which is obtained using the pci_map_* calls (which does the virtual to physical - or bus address translation). During the graphic application''s "life" those pages can be shuffled around, swapped out to disk, moved from the VRAM to System RAM or vice-versa. This all works with the existing TTM pool code - except when we want to use the software IOTLB (SWIOTLB) code to "map" the physical addresses to the graphic adapter MMU. We end up programming the bounce buffer''s physical address instead of the TTM pool memory''s and get a non-worky driver. There are two solutions: 1) using the DMA API to allocate pages that are screened by the DMA API, or 2) using the pci_sync_* calls to copy the pages from the bounce-buffer and back. This patch fixes the issue by allocating pages using the DMA API. The second is a viable option - but it has performance drawbacks and potential correctness issues - think of the write cache page being bounced (SWIOTLB->TTM), the WC is set on the TTM page and the copy from SWIOTLB not making it to the TTM page until the page has been recycled in the pool (and used by another application). The bounce buffer does not get activated often - only in cases where we have a 32-bit capable card and we want to use a page that is allocated above the 4GB limit. The bounce buffer offers the solution of copying the contents of that 4GB page to an location below 4GB and then back when the operation has been completed (or vice-versa). This is done by using the ''pci_sync_*'' calls. Note: If you look carefully enough in the existing TTM page pool code you will notice the GFP_DMA32 flag is used - which should guarantee that the provided page is under 4GB. It certainly is the case, except this gets ignored in two cases: - If user specifies ''swiotlb=force'' which bounces _every_ page. - If user is using a Xen''s PV Linux guest (which uses the SWIOTLB and the underlaying PFN''s aren''t necessarily under 4GB). To not have this extra copying done the other option is to allocate the pages using the DMA API so that there is not need to map the page and perform the expensive ''pci_sync_*'' calls. This DMA API capable TTM pool requires for this the ''struct device'' to properly call the DMA API. It also has to track the virtual and bus address of the page being handed out in case it ends up being swapped out or de-allocated - to make sure it is de-allocated using the proper''s ''struct device''. Implementation wise the code keeps two lists: one that is attached to the ''struct device'' (via the dev->dma_pools list) and a global one to be used when the ''struct device'' is unavailable (think shrinker code). The global list can iterate over all of the ''struct device'' and its associated dma_pool. The list in dev->dma_pools can only iterate the device''s dma_pool. /[struct device_pool]\ /---------------------------------------------------| dev | / +-------| dma_pool | /-----+------\ / \--------------------/ |struct device| /-->[struct dma_pool for WC]</ /[struct device_pool]\ | dma_pools +----+ /-| dev | | ... 
| \--->[struct dma_pool for uncached]<-/--| dma_pool | \-----+------/ / \--------------------/ \----------------------------------------------/ [Two pools associated with the device (WC and UC), and the parallel list containing the ''struct dev'' and ''struct dma_pool'' entries] The maximum amount of dma pools a device can have is six: write-combined, uncached, and cached; then there are the DMA32 variants which are: write-combined dma32, uncached dma32, and cached dma32. Currently this code only gets activated when any variant of the SWIOTLB IOMMU code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV with PCI devices). Tested-by: Michel Dänzer <michel@daenzer.net> [v1: Using swiotlb_nr_tbl instead of swiotlb_enabled] [v2: Major overhaul - added ''inuse_list'' to seperate used from inuse and reorder the order of lists to get better performance.] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/ttm/Makefile | 3 + drivers/gpu/drm/ttm/ttm_memory.c | 2 + drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1394 ++++++++++++++++++++++++++++++ include/drm/ttm/ttm_page_alloc.h | 31 + 4 files changed, 1430 insertions(+), 0 deletions(-) create mode 100644 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile index f3cf6f0..8300bc0 100644 --- a/drivers/gpu/drm/ttm/Makefile +++ b/drivers/gpu/drm/ttm/Makefile @@ -7,4 +7,7 @@ ttm-y := ttm_agp_backend.o ttm_memory.o ttm_tt.o ttm_bo.o \ ttm_object.o ttm_lock.o ttm_execbuf_util.o ttm_page_alloc.o \ ttm_bo_manager.o +ifeq ($(CONFIG_SWIOTLB),y) +ttm-y += ttm_page_alloc_dma.o +endif obj-$(CONFIG_DRM_TTM) += ttm.o diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index e70ddd8..6d24fe2 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -395,6 +395,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob) zone->name, (unsigned long long) zone->max_mem >> 10); } ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE)); + ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE)); return 0; out_no_zone: ttm_mem_global_release(glob); @@ -410,6 +411,7 @@ void ttm_mem_global_release(struct ttm_mem_global *glob) /* let the page allocator first stop the shrink work. */ ttm_page_alloc_fini(); + ttm_dma_page_alloc_fini(); flush_workqueue(glob->swap_queue); destroy_workqueue(glob->swap_queue); glob->swap_queue = NULL; diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c new file mode 100644 index 0000000..d6d8240 --- /dev/null +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -0,0 +1,1394 @@ +/* + * Copyright 2011 (c) Oracle Corp. + + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sub license, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> + */ + +/* + * A simple DMA pool losely based on dmapool.c. It has certain advantages + * over the DMA pools: + * - Pool collects resently freed pages for reuse (and hooks up to + * the shrinker). + * - Tracks currently in use pages + * - Tracks whether the page is UC, WB or cached (and reverts to WB + * when freed). + */ + +#include <linux/dma-mapping.h> +#include <linux/list.h> +#include <linux/seq_file.h> /* for seq_printf */ +#include <linux/slab.h> +#include <linux/spinlock.h> +#include <linux/highmem.h> +#include <linux/mm_types.h> +#include <linux/module.h> +#include <linux/mm.h> +#include <linux/atomic.h> +#include <linux/device.h> +#include <linux/kthread.h> +#include "ttm/ttm_bo_driver.h" +#include "ttm/ttm_page_alloc.h" +#ifdef TTM_HAS_AGP +#include <asm/agp.h> +#endif + +#define NUM_PAGES_TO_ALLOC (PAGE_SIZE/sizeof(struct page *)) +#define SMALL_ALLOCATION 16 +#define FREE_ALL_PAGES (~0U) +/* times are in msecs */ +#define IS_UNDEFINED (0) +#define IS_WC (1<<1) +#define IS_UC (1<<2) +#define IS_CACHED (1<<3) +#define IS_DMA32 (1<<4) + +enum pool_type { + POOL_IS_UNDEFINED, + POOL_IS_WC = IS_WC, + POOL_IS_UC = IS_UC, + POOL_IS_CACHED = IS_CACHED, + POOL_IS_WC_DMA32 = IS_WC | IS_DMA32, + POOL_IS_UC_DMA32 = IS_UC | IS_DMA32, + POOL_IS_CACHED_DMA32 = IS_CACHED | IS_DMA32, +}; +/* + * The pool structure. There are usually six pools: + * - generic (not restricted to DMA32): + * - write combined, uncached, cached. + * - dma32 (up to 2^32 - so up 4GB): + * - write combined, uncached, cached. + * for each ''struct device''. The ''cached'' is for pages that are actively used. + * The other ones can be shrunk by the shrinker API if neccessary. + * @pools: The ''struct device->dma_pools'' link. + * @type: Type of the pool + * @lock: Protects the inuse_list and free_list from concurrnet access. Must be + * used with irqsave/irqrestore variants because pool allocator maybe called + * from delayed work. + * @inuse_list: Pool of pages that are in use. The order is very important and + * it is in the order that the TTM pages that are put back are in. + * @free_list: Pool of pages that are free to be used. No order requirements. + * @dev: The device that is associated with these pools. + * @size: Size used during DMA allocation. + * @npages_free: Count of available pages for re-use. + * @npages_in_use: Count of pages that are in use (each of them + * is marked in_use. + * @nfrees: Stats when pool is shrinking. + * @nrefills: Stats when the pool is grown. + * @gfp_flags: Flags to pass for alloc_page. + * @fill_lock: Allows only one pool fill operation at time. + * @name: Name of the pool. + * @dev_name: Name derieved from dev - similar to how dev_info works. + * Used during shutdown as the dev_info during release is unavailable. 
+ */ +struct dma_pool { + struct list_head pools; /* The ''struct device->dma_pools link */ + enum pool_type type; + spinlock_t lock; + struct list_head inuse_list; + struct list_head free_list; + struct device *dev; + unsigned size; + unsigned npages_free; + unsigned npages_in_use; + unsigned long nfrees; /* Stats when shrunk. */ + unsigned long nrefills; /* Stats when grown. */ + gfp_t gfp_flags; + bool fill_lock; + char name[13]; /* "cached dma32" */ + char dev_name[64]; /* Constructed from dev */ +}; + +/* + * The accounting page keeping track of the allocated page along with + * the DMA address. + * @page_list: The link to the ''page_list'' in ''struct dma_pool''. + * @vaddr: The virtual address of the page + * @dma: The bus address of the page. If the page is not allocated + * via the DMA API, it will be -1. + * @in_use: Set to true if in use. Should not be freed. + */ +struct dma_page { + struct list_head page_list; + void *vaddr; + struct page *p; + dma_addr_t dma; +}; + +/* + * Limits for the pool. They are handled without locks because only place where + * they may change is in sysfs store. They won''t have immediate effect anyway + * so forcing serialization to access them is pointless. + */ + +struct ttm_pool_opts { + unsigned alloc_size; + unsigned max_size; + unsigned small; +}; + +/* + * Contains the list of all of the ''struct device'' and their corresponding + * DMA pools. Guarded by _mutex->lock. + * @pools: The link to ''struct ttm_pool_manager->pools'' + * @dev: The ''struct device'' associated with the ''pool'' + * @pool: The ''struct dma_pool'' associated with the ''dev'' + */ +struct device_pools { + struct list_head pools; + struct device *dev; + struct dma_pool *pool; +}; + +/* + * struct ttm_pool_manager - Holds memory pools for fast allocation + * + * @lock: Lock used when adding/removing from pools + * @pools: List of ''struct device'' and ''struct dma_pool'' tuples. + * @options: Limits for the pool. + * @npools: Total amount of pools in existence. 
+ * @shrinker: The structure used by [un|]register_shrinker + */ +struct ttm_pool_manager { + struct mutex lock; + struct list_head pools; + struct ttm_pool_opts options; + unsigned npools; + struct shrinker mm_shrink; + struct kobject kobj; +}; + +static struct ttm_pool_manager *_manager; + +static struct attribute ttm_page_pool_max = { + .name = "pool_max_size", + .mode = S_IRUGO | S_IWUSR +}; +static struct attribute ttm_page_pool_small = { + .name = "pool_small_allocation", + .mode = S_IRUGO | S_IWUSR +}; +static struct attribute ttm_page_pool_alloc_size = { + .name = "pool_allocation_size", + .mode = S_IRUGO | S_IWUSR +}; + +static struct attribute *ttm_pool_attrs[] = { + &ttm_page_pool_max, + &ttm_page_pool_small, + &ttm_page_pool_alloc_size, + NULL +}; + +static void ttm_pool_kobj_release(struct kobject *kobj) +{ + struct ttm_pool_manager *m + container_of(kobj, struct ttm_pool_manager, kobj); + kfree(m); +} + +static ssize_t ttm_pool_store(struct kobject *kobj, struct attribute *attr, + const char *buffer, size_t size) +{ + struct ttm_pool_manager *m + container_of(kobj, struct ttm_pool_manager, kobj); + int chars; + unsigned val; + chars = sscanf(buffer, "%u", &val); + if (chars == 0) + return size; + + /* Convert kb to number of pages */ + val = val / (PAGE_SIZE >> 10); + + if (attr == &ttm_page_pool_max) + m->options.max_size = val; + else if (attr == &ttm_page_pool_small) + m->options.small = val; + else if (attr == &ttm_page_pool_alloc_size) { + if (val > NUM_PAGES_TO_ALLOC*8) { + printk(KERN_ERR TTM_PFX + "Setting allocation size to %lu " + "is not allowed. Recommended size is " + "%lu\n", + NUM_PAGES_TO_ALLOC*(PAGE_SIZE >> 7), + NUM_PAGES_TO_ALLOC*(PAGE_SIZE >> 10)); + return size; + } else if (val > NUM_PAGES_TO_ALLOC) { + printk(KERN_WARNING TTM_PFX + "Setting allocation size to " + "larger than %lu is not recommended.\n", + NUM_PAGES_TO_ALLOC*(PAGE_SIZE >> 10)); + } + m->options.alloc_size = val; + } + + return size; +} + +static ssize_t ttm_pool_show(struct kobject *kobj, struct attribute *attr, + char *buffer) +{ + struct ttm_pool_manager *m + container_of(kobj, struct ttm_pool_manager, kobj); + unsigned val = 0; + + if (attr == &ttm_page_pool_max) + val = m->options.max_size; + else if (attr == &ttm_page_pool_small) + val = m->options.small; + else if (attr == &ttm_page_pool_alloc_size) + val = m->options.alloc_size; + + val = val * (PAGE_SIZE >> 10); + + return snprintf(buffer, PAGE_SIZE, "%u\n", val); +} + +static const struct sysfs_ops ttm_pool_sysfs_ops = { + .show = &ttm_pool_show, + .store = &ttm_pool_store, +}; + +static struct kobj_type ttm_pool_kobj_type = { + .release = &ttm_pool_kobj_release, + .sysfs_ops = &ttm_pool_sysfs_ops, + .default_attrs = ttm_pool_attrs, +}; + +#ifndef CONFIG_X86 +static int set_pages_array_wb(struct page **pages, int addrinarray) +{ +#ifdef TTM_HAS_AGP + int i; + + for (i = 0; i < addrinarray; i++) + unmap_page_from_agp(pages[i]); +#endif + return 0; +} + +static int set_pages_array_wc(struct page **pages, int addrinarray) +{ +#ifdef TTM_HAS_AGP + int i; + + for (i = 0; i < addrinarray; i++) + map_page_into_agp(pages[i]); +#endif + return 0; +} + +static int set_pages_array_uc(struct page **pages, int addrinarray) +{ +#ifdef TTM_HAS_AGP + int i; + + for (i = 0; i < addrinarray; i++) + map_page_into_agp(pages[i]); +#endif + return 0; +} +#endif /* for !CONFIG_X86 */ + +static int ttm_set_pages_caching(struct dma_pool *pool, + struct page **pages, unsigned cpages) +{ + int r = 0; + /* Set page caching */ + if (pool->type & IS_UC) { + 
r = set_pages_array_uc(pages, cpages); + if (r) + pr_err(TTM_PFX + "%s: Failed to set %d pages to uc!\n", + pool->dev_name, cpages); + } + if (pool->type & IS_WC) { + r = set_pages_array_wc(pages, cpages); + if (r) + pr_err(TTM_PFX + "%s: Failed to set %d pages to wc!\n", + pool->dev_name, cpages); + } + return r; +} + +static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) +{ + dma_addr_t dma = d_page->dma; + dma_free_coherent(pool->dev, pool->size, d_page->vaddr, dma); + + kfree(d_page); + d_page = NULL; +} +static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) +{ + struct dma_page *d_page; + + d_page = kmalloc(sizeof(struct dma_page), GFP_KERNEL); + if (!d_page) + return NULL; + + d_page->vaddr = dma_alloc_coherent(pool->dev, pool->size, + &d_page->dma, + pool->gfp_flags); + d_page->p = virt_to_page(d_page->vaddr); + if (!d_page->vaddr) { + kfree(d_page); + d_page = NULL; + } + return d_page; +} +static enum pool_type ttm_to_type(int flags, enum ttm_caching_state cstate) +{ + enum pool_type type = IS_UNDEFINED; + + if (flags & TTM_PAGE_FLAG_DMA32) + type |= IS_DMA32; + if (cstate == tt_cached) + type |= IS_CACHED; + else if (cstate == tt_uncached) + type |= IS_UC; + else + type |= IS_WC; + + return type; +} +static void ttm_pool_update_free_locked(struct dma_pool *pool, + unsigned freed_pages) +{ + pool->npages_free -= freed_pages; + pool->nfrees += freed_pages; + +} +/* set memory back to wb and free the pages. */ +static void ttm_dma_pages_put(struct dma_pool *pool, struct list_head *d_pages, + struct page *pages[], unsigned npages) +{ + struct dma_page *d_page, *tmp; + + if (npages && set_pages_array_wb(pages, npages)) + pr_err(TTM_PFX "%s: Failed to set %d pages to wb!\n", + pool->dev_name, npages); + + if (npages > 1) { + pr_debug("%s: (%s:%d) Freeing %d pages at once (lockless).\n", + pool->dev_name, pool->name, current->pid, npages); + } + + list_for_each_entry_safe(d_page, tmp, d_pages, page_list) { + list_del(&d_page->page_list); + __ttm_dma_free_page(pool, d_page); + } +} +/* + * Free pages from pool. + * + * To prevent hogging the ttm_swap process we only free NUM_PAGES_TO_ALLOC + * number of pages in one go. + * + * @pool: to free the pages from + * @nr_free: If set to true will free all pages in pool + **/ +static unsigned ttm_dma_page_pool_free(struct dma_pool *pool, unsigned nr_free) +{ + unsigned long irq_flags; + struct dma_page *dma_p, *tmp; + struct page **pages_to_free; + struct list_head d_pages; + unsigned freed_pages = 0, + npages_to_free = nr_free; + + if (NUM_PAGES_TO_ALLOC < nr_free) + npages_to_free = NUM_PAGES_TO_ALLOC; +#if 0 + if (nr_free > 1) { + pr_debug("%s: (%s:%d) Attempting to free %d (%d) pages\n", + pool->dev_name, pool->name, current->pid, + npages_to_free, nr_free); + } +#endif + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), + GFP_KERNEL); + + if (!pages_to_free) { + pr_err(TTM_PFX + "%s: Failed to allocate memory for pool free operation.\n", + pool->dev_name); + return 0; + } + INIT_LIST_HEAD(&d_pages); +restart: + spin_lock_irqsave(&pool->lock, irq_flags); + + /* We picking the oldest ones off the list */ + list_for_each_entry_safe_reverse(dma_p, tmp, &pool->free_list, + page_list) { + if (freed_pages >= npages_to_free) + break; + + /* Move the dma_page from one list to another. */ + list_move(&dma_p->page_list, &d_pages); + + pages_to_free[freed_pages++] = dma_p->p; + /* We can only remove NUM_PAGES_TO_ALLOC at a time. 
*/ + if (freed_pages >= NUM_PAGES_TO_ALLOC) { + + ttm_pool_update_free_locked(pool, freed_pages); + /** + * Because changing page caching is costly + * we unlock the pool to prevent stalling. + */ + spin_unlock_irqrestore(&pool->lock, irq_flags); + + ttm_dma_pages_put(pool, &d_pages, pages_to_free, + freed_pages); + + INIT_LIST_HEAD(&d_pages); + + if (likely(nr_free != FREE_ALL_PAGES)) + nr_free -= freed_pages; + + if (NUM_PAGES_TO_ALLOC >= nr_free) + npages_to_free = nr_free; + else + npages_to_free = NUM_PAGES_TO_ALLOC; + + freed_pages = 0; + + /* free all so restart the processing */ + if (nr_free) + goto restart; + + /* Not allowed to fall through or break because + * following context is inside spinlock while we are + * outside here. + */ + goto out; + + } + } + + /* remove range of pages from the pool */ + if (freed_pages) { + ttm_pool_update_free_locked(pool, freed_pages); + nr_free -= freed_pages; + } + + spin_unlock_irqrestore(&pool->lock, irq_flags); + + if (freed_pages) + ttm_dma_pages_put(pool, &d_pages, pages_to_free, freed_pages); +out: + kfree(pages_to_free); + return nr_free; +} + +static void ttm_dma_free_pool(struct device *dev, enum pool_type type) +{ + struct device_pools *p; + struct dma_pool *pool; + struct dma_page *d_page, *d_tmp; + + if (!dev) + return; + + mutex_lock(&_manager->lock); + list_for_each_entry_reverse(p, &_manager->pools, pools) { + if (p->dev != dev) + continue; + pool = p->pool; + if (pool->type != type) + continue; + + list_del(&p->pools); + kfree(p); + _manager->npools--; + break; + } + list_for_each_entry_reverse(pool, &dev->dma_pools, pools) { + unsigned long irq_save; + if (pool->type != type) + continue; + /* Takes a spinlock.. */ + ttm_dma_page_pool_free(pool, FREE_ALL_PAGES); + /* .. but afterwards we can take it too */ + spin_lock_irqsave(&pool->lock, irq_save); + list_for_each_entry_safe(d_page, d_tmp, &pool->inuse_list, + page_list) { + pr_err("%s: (%s:%d) %p (%p DMA:0x%lx) busy!\n", + pool->dev_name, pool->name, + current->pid, d_page->vaddr, + virt_to_page(d_page->vaddr), + (unsigned long)d_page->dma); + list_del(&d_page->page_list); + kfree(d_page); + pool->npages_in_use--; + } + spin_unlock_irqrestore(&pool->lock, irq_save); + WARN_ON(((pool->npages_in_use + pool->npages_free) != 0)); + /* This code path is called after _all_ references to the + * struct device has been dropped - so nobody should be + * touching it. In case somebody is trying to _add_ we are + * guarded by the mutex. */ + list_del(&pool->pools); + kfree(pool); + break; + } + mutex_unlock(&_manager->lock); +} +/* + * On free-ing of the ''struct device'' this deconstructor is run. + * Albeit the pool might have already been freed earlier. 
+ */ +static void ttm_dma_pool_release(struct device *dev, void *res) +{ + struct dma_pool *pool = *(struct dma_pool **)res; + + if (pool) + ttm_dma_free_pool(dev, pool->type); +} + +static int ttm_dma_pool_match(struct device *dev, void *res, void *match_data) +{ + return *(struct dma_pool **)res == match_data; +} + +static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, + enum pool_type type) +{ + char *n[] = {"wc", "uc", "cached", " dma32", "unknown",}; + enum pool_type t[] = {IS_WC, IS_UC, IS_CACHED, IS_DMA32, IS_UNDEFINED}; + struct device_pools *sec_pool = NULL; + struct dma_pool *pool = NULL, **ptr; + unsigned i; + int ret = -ENODEV; + char *p; + + if (!dev) + return NULL; + + ptr = devres_alloc(ttm_dma_pool_release, sizeof(*ptr), GFP_KERNEL); + if (!ptr) + return NULL; + + ret = -ENOMEM; + + pool = kmalloc_node(sizeof(struct dma_pool), GFP_KERNEL, + dev_to_node(dev)); + if (!pool) + goto err_mem; + + sec_pool = kmalloc_node(sizeof(struct device_pools), GFP_KERNEL, + dev_to_node(dev)); + if (!sec_pool) + goto err_mem; + + INIT_LIST_HEAD(&sec_pool->pools); + sec_pool->dev = dev; + sec_pool->pool = pool; + + INIT_LIST_HEAD(&pool->free_list); + INIT_LIST_HEAD(&pool->inuse_list); + INIT_LIST_HEAD(&pool->pools); + spin_lock_init(&pool->lock); + pool->dev = dev; + pool->npages_free = pool->npages_in_use = 0; + pool->nfrees = 0; + pool->gfp_flags = flags; + pool->size = PAGE_SIZE; + pool->type = type; + pool->nrefills = 0; + pool->fill_lock = false; + p = pool->name; + for (i = 0; i < 5; i++) { + if (type & t[i]) { + p += snprintf(p, sizeof(pool->name) - (p - pool->name), + "%s", n[i]); + } + } + *p = 0; + /* We copy the name for pr_ calls b/c when dma_pool_destroy is called + * - the kobj->name has already been deallocated.*/ + snprintf(pool->dev_name, sizeof(pool->dev_name), "%s %s", + dev_driver_string(dev), dev_name(dev)); + mutex_lock(&_manager->lock); + /* You can get the dma_pool from either the global: */ + list_add(&sec_pool->pools, &_manager->pools); + _manager->npools++; + /* or from ''struct device'': */ + list_add(&pool->pools, &dev->dma_pools); + mutex_unlock(&_manager->lock); + + *ptr = pool; + devres_add(dev, ptr); + + return pool; +err_mem: + devres_free(ptr); + kfree(sec_pool); + kfree(pool); + return ERR_PTR(ret); +} +static struct dma_pool *ttm_dma_find_pool(struct device *dev, + enum pool_type type) +{ + struct dma_pool *pool, *tmp, *found = NULL; + + if (type == IS_UNDEFINED) + return found; + /* NB: We iterate on the ''struct dev'' which has no spinlock, but + * it does have a kref which we have taken. */ + list_for_each_entry_safe(pool, tmp, &dev->dma_pools, pools) { + if (pool->type != type) + continue; + found = pool; + break; + } + return found; +} + +/* + * Free pages the pages that failed to change the caching state. If there + * are pages that have changed their caching state already put them to the + * pool. + */ +static void ttm_dma_handle_caching_state_failure(struct dma_pool *pool, + struct list_head *d_pages, + struct page **failed_pages, + unsigned cpages) +{ + struct dma_page *d_page, *tmp; + struct page *p; + unsigned i = 0; + + p = failed_pages[0]; + if (!p) + return; + /* Find the failed page. */ + list_for_each_entry_safe(d_page, tmp, d_pages, page_list) { + if (d_page->p != p) + continue; + /* .. and then progress over the full list. 
*/ + list_del(&d_page->page_list); + __ttm_dma_free_page(pool, d_page); + if (++i < cpages) + p = failed_pages[i]; + else + break; + } + +} +/* + * Allocate ''count'' pages, and put ''need'' number of them on the + * ''pages'' and as well on the ''dma_address'' starting at ''dma_offset'' offset. + * The full list of pages should also be on ''d_pages''. + * We return zero for success, and negative numbers as errors. + */ +static int ttm_dma_pool_alloc_new_pages(struct dma_pool *pool, + struct list_head *d_pages, + unsigned count) +{ + struct page **caching_array; + struct dma_page *dma_p; + struct page *p; + int r = 0; + unsigned i, cpages; + unsigned max_cpages = min(count, + (unsigned)(PAGE_SIZE/sizeof(struct page *))); + + /* allocate array for page caching change */ + caching_array = kmalloc(max_cpages*sizeof(struct page *), GFP_KERNEL); + + if (!caching_array) { + pr_err(TTM_PFX + "%s: Unable to allocate table for new pages.", + pool->dev_name); + return -ENOMEM; + } + + if (count > 1) { + pr_debug("%s: (%s:%d) Getting %d pages\n", + pool->dev_name, pool->name, current->pid, + count); + } + + for (i = 0, cpages = 0; i < count; ++i) { + dma_p = __ttm_dma_alloc_page(pool); + if (!dma_p) { + pr_err(TTM_PFX "%s: Unable to get page %u.\n", + pool->dev_name, i); + + /* store already allocated pages in the pool after + * setting the caching state */ + if (cpages) { + r = ttm_set_pages_caching(pool, caching_array, + cpages); + if (r) + ttm_dma_handle_caching_state_failure( + pool, d_pages, caching_array, + cpages); + } + r = -ENOMEM; + goto out; + } + p = dma_p->p; +#ifdef CONFIG_HIGHMEM + /* gfp flags of highmem page should never be dma32 so we + * we should be fine in such case + */ + if (!PageHighMem(p)) +#endif + { + caching_array[cpages++] = p; + if (cpages == max_cpages) { + /* Note: Cannot hold the spinlock */ + r = ttm_set_pages_caching(pool, caching_array, + cpages); + if (r) { + ttm_dma_handle_caching_state_failure( + pool, d_pages, caching_array, + cpages); + goto out; + } + cpages = 0; + } + } + list_add(&dma_p->page_list, d_pages); + } + + if (cpages) { + r = ttm_set_pages_caching(pool, caching_array, cpages); + if (r) + ttm_dma_handle_caching_state_failure(pool, d_pages, + caching_array, cpages); + } +out: + kfree(caching_array); + return r; +} +static bool ttm_dma_iterate_reverse(struct dma_pool *pool, + struct dma_page *d_page, + struct page *p) +{ + + /* Note: When TTM layer gets pages - it gets them one page at a time + * and puts them on an array (so most recently allocated page is at + * at the back). The inuse_list is a copy of those pages, but in the + * exact opposite order. This is b/c when TTM puts pages back, it + * constructs a stack with the oldest element on the top. Hence the + * inuse_list is constructed with the same order so that it will + * efficiently be matched against the stack. + * But, just in case the pages are not in that order, we double check + * the ''pages'' against our inuse_list in case we have to go in reverse. + */ + struct page *p_next; + struct dma_page *tmp; + + tmp = list_entry(d_page->page_list.prev, struct dma_page, page_list); + if (&tmp->page_list != &pool->inuse_list) { + p_next = list_entry(p->lru.next, struct page, lru); + if (tmp->p == p_next) + return true; + } + return false; +} + +/* + * Iterate forward (or backwards if ''reverse'' is true) by one element + * in the pool->in_use list. We use ''d_page'' as the starting point. + * The ''d_page'' upon completion of the iteration, is moved to the + * ''d_pages'' list. 
+ */ +static struct dma_page *ttm_dma_iterate_next(struct dma_pool *pool, + struct dma_page *d_page, + struct list_head *d_pages, + bool reverse) +{ + struct dma_page *next = NULL; + + if (unlikely(reverse)) { + if (&d_page->page_list != &pool->inuse_list) + next = list_entry(d_page->page_list.prev, + struct dma_page, + page_list); + list_move(&d_page->page_list, d_pages); + } else { + if (&d_page->page_list != &pool->inuse_list) + next = list_entry(d_page->page_list.next, + struct dma_page, + page_list); + list_move_tail(&d_page->page_list, d_pages); + } + return next; +} +/* + * Iterate forward (or backwards if ''reverse'' is true), looking + * for page ''p'' in the pool->inuse_list, starting at ''start''. + */ +static struct dma_page *ttm_dma_iterate_forward(struct dma_pool *pool, + struct dma_page *start, + struct page *p, + bool reverse) +{ + struct dma_page *tmp = start; + + if (unlikely(reverse)) { + list_for_each_entry_continue_reverse(tmp, &pool->inuse_list, + page_list) { + if (p == tmp->p) + return tmp; + } + } else { + list_for_each_entry_continue(tmp, &pool->inuse_list, + page_list) { + if (p == tmp->p) + return tmp; + } + } + return NULL; +} +/* + * Recycle (or delete) the ''pages'' that are on the ''pool''. + * @pool: The pool that the pages are associated with. + * @pages: The list of pages we are done with. + * @page_count: Count of how many pages (or zero if all). + * @erase: Instead of recycling - just free them. + */ +static unsigned int ttm_dma_put_pages_in_pool(struct dma_pool *pool, + struct list_head *pages, + unsigned page_count, + bool erase) +{ + unsigned long uninitialized_var(irq_flags); + struct list_head uninitialized_var(d_pages); + struct page **uninitialized_var(array_pages); + unsigned uninitialized_var(freed_pages); + struct page *p, *tmp; + unsigned count = 0; + struct dma_page *d_tmp, *d_page = NULL; + bool rev = false; + if (unlikely(WARN_ON(list_empty(pages)))) + return 0; + + if (page_count == 0) { + list_for_each_entry(p, pages, lru) + ++page_count; + + } + if (page_count > 1) { + pr_debug("%s: (%s:%d) %s %d pages\n", + pool->dev_name, pool->name, current->pid, + erase ? "Destroying" : "Recycling", page_count); + } + + /* d_pages is the list of ''struct dma_page'' */ + INIT_LIST_HEAD(&d_pages); + + if (erase) { + /* and pages_to_free is used for cache reset */ + array_pages = kmalloc(page_count * sizeof(struct page *), + GFP_KERNEL); + if (!array_pages) { + dev_err(pool->dev, TTM_PFX + "Failed to allocate memory for pool free operation.\n"); + return 0; + } + freed_pages = 0; + } + + /* Find the first page of the "chunk" of pages. */ + p = list_first_entry(pages, struct page, lru); + spin_lock_irqsave(&pool->lock, irq_flags); +restart: + list_for_each_entry(d_tmp, &pool->inuse_list, page_list) { + if (p == d_tmp->p) { + d_page = d_tmp; + break; + } + } + /* The pages are _not_ in this pool. */ + if (!d_page) { + spin_unlock_irqrestore(&pool->lock, irq_flags); + return 0; + } + rev = ttm_dma_iterate_reverse(pool, d_page, p); + if (rev) + pr_debug("%s: (%s:%d) Traversing %d in reverse order\n", + pool->dev_name, pool->name, current->pid, page_count); + /* Continue iterating on both lists. */ + list_for_each_entry_safe(p, tmp, pages, lru) { + if (d_page->p != p && count != page_count) { + /* Yikes! The inuse stack is swiss cheese. Have to + start looking.*/ + d_page = ttm_dma_iterate_forward(pool, d_page, p, rev); + if (!d_page) + goto restart; + } + /* Do not advance past what we were asked to delete. 
*/ + if (d_page->p != p) + break; + list_del(&p->lru); + + if (erase) + array_pages[freed_pages++] = d_page->p; + d_page = ttm_dma_iterate_next(pool, d_page, &d_pages, rev); + if (!d_page) + break; + count++; + /* Check if we should iterate. */ + if (count == page_count) + break; + } + if (!erase) /* And stick ''em on the free pool. */ + list_splice(&d_pages, &pool->free_list); + + spin_unlock_irqrestore(&pool->lock, irq_flags); + + if (erase) { + /* Note: The caller of us updates the pool accounting. */ + ttm_dma_pages_put(pool, &d_pages, array_pages /* to set WB */, + freed_pages); + kfree(array_pages); + } + if (count > 1) { + pr_debug("%s: (%s:%d) %d/%d pages %s pool.\n", + pool->dev_name, pool->name, current->pid, + count, page_count, + erase ? "erased from inuse" : "put in free"); + } + return count; +} +/* + * @return count of pages still required to fulfill the request. +*/ +static int ttm_dma_page_pool_fill_locked(struct dma_pool *pool, + unsigned count, + unsigned long *irq_flags) +{ + int r = count; + + if (pool->fill_lock) + return r; + + pool->fill_lock = true; + if (count < _manager->options.small && + count > pool->npages_free) { + struct list_head d_pages; + unsigned alloc_size = _manager->options.alloc_size; + + INIT_LIST_HEAD(&d_pages); + + spin_unlock_irqrestore(&pool->lock, *irq_flags); + + /* Returns how many more are neccessary to fulfill the + * request. */ + r = ttm_dma_pool_alloc_new_pages(pool, &d_pages, alloc_size); + + spin_lock_irqsave(&pool->lock, *irq_flags); + if (!r) { + /* Add the fresh to the end.. */ + list_splice(&d_pages, &pool->free_list); + ++pool->nrefills; + pool->npages_free += alloc_size; + } else { + struct dma_page *d_page; + unsigned cpages = 0; + + pr_err(TTM_PFX "%s: Failed to fill %s pool (r:%d)!\n", + pool->dev_name, pool->name, r); + + list_for_each_entry(d_page, &d_pages, page_list) { + cpages++; + } + list_splice_tail(&d_pages, &pool->free_list); + pool->npages_free += cpages; + } + } + pool->fill_lock = false; + return r; + +} + +/* + * @return count of pages still required to fulfill the request. + * The populate list is actually a stack (not that is matters as TTM + * allocates one page at a time. + */ +static int ttm_dma_pool_get_pages(struct dma_pool *pool, + struct list_head *pages, + dma_addr_t *dma_address, unsigned count) +{ + unsigned long irq_flags; + int r; + unsigned i; + struct dma_page *d_page, *tmp; + struct list_head d_pages; + + spin_lock_irqsave(&pool->lock, irq_flags); + r = ttm_dma_page_pool_fill_locked(pool, count, &irq_flags); + if (r < 0) { + pr_debug("%s: (%s:%d) Asked for %d, got %d %s.\n", + pool->dev_name, pool->name, current->pid, count, r, + (r < 0) ? "err:" : "pages"); + goto out; + } + if (!pool->npages_free) + goto out; + if (count > 1) { + pr_debug("%s: (%s:%d) Looking in free list for %d pages. "\ + "(have %d pages free)\n", + pool->dev_name, pool->name, current->pid, count, + pool->npages_free); + } + i = 0; + /* We are holding the spinlock.. */ + INIT_LIST_HEAD(&d_pages); + /* Note: The the ''pages'' (and inuse_list) is expected to be a stack, + * so we put the entries in the right order (and on the inuse list + * in the reverse order to compenstate for freeing - which inverts the + * ''pages'' order). 
+ */ + list_for_each_entry_safe(d_page, tmp, &pool->free_list, page_list) { + list_add_tail(&d_page->p->lru, pages); + dma_address[i++] = d_page->dma; + list_move(&d_page->page_list, &d_pages); + if (i == count) + break; + } + /* Note: The ''inuse_list'' must have the same order as the ''pages'' + * to be effective when pages are put back. And since ''pages'' is + * as stack, ergo inuse_list is a stack too. */ + list_splice(&d_pages, &pool->inuse_list); + count -= i; + pool->npages_in_use += i; + pool->npages_free -= i; +out: + spin_unlock_irqrestore(&pool->lock, irq_flags); + if (count) + pr_debug("%s: (%s:%d) Need %d more.\n", + pool->dev_name, pool->name, current->pid, count); + return count; +} +/* + * On success pages list will hold count number of correctly + * cached pages. On failure will hold the negative return value (-ENOMEM, etc). + */ +int ttm_dma_get_pages(struct ttm_tt *ttm, struct list_head *pages, + unsigned count, dma_addr_t *dma_address) + +{ + int r = -ENOMEM; + struct dma_pool *pool; + gfp_t gfp_flags; + enum pool_type type; + struct device *dev = ttm->be->dev; + + type = ttm_to_type(ttm->page_flags, ttm->caching_state); + + if (ttm->page_flags & TTM_PAGE_FLAG_DMA32) + gfp_flags = GFP_USER | GFP_DMA32; + else + gfp_flags = GFP_HIGHUSER; + + if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC) + gfp_flags |= __GFP_ZERO; + + pool = ttm_dma_find_pool(dev, type); + if (!pool) { + pool = ttm_dma_pool_init(dev, gfp_flags, type); + if (IS_ERR_OR_NULL(pool)) + return -ENOMEM; + } +#if 0 + if (count > 1) { + pr_debug("%s (%s:%d) Attempting to get %d pages type %x\n", + pool->dev_name, pool->name, current->pid, count, + cstate); + } +#endif + /* Take pages out of a pool (if applicable) */ + r = ttm_dma_pool_get_pages(pool, pages, dma_address, count); + /* clear the pages coming from the pool if requested */ + if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC) { + struct page *p; + list_for_each_entry(p, pages, lru) { + clear_page(page_address(p)); + } + } + /* If pool didn''t have enough pages allocate new one. */ + if (r > 0) { + struct list_head d_pages; + unsigned pages_need = r; + unsigned long irq_flags; + + INIT_LIST_HEAD(&d_pages); + + /* Note, we are running without locking here.. + * and we have to manually add the stack to the inuse pool. */ + r = ttm_dma_pool_alloc_new_pages(pool, &d_pages, pages_need); + + if (r == 0) { + struct dma_page *d_page; + int i = count - 1; + + /* Since the pages are directly going to the inuse_list + * which is stack based, lets treat it as a stack. + */ + list_for_each_entry(d_page, &d_pages, page_list) { + list_add(&d_page->p->lru, pages); + BUG_ON(i < 0); + dma_address[i--] = d_page->dma; + } + spin_lock_irqsave(&pool->lock, irq_flags); + pool->npages_in_use += pages_need; + list_splice(&d_pages, &pool->inuse_list); + spin_unlock_irqrestore(&pool->lock, irq_flags); + } else { + /* If there is any pages in the list put them back to + * the pool. */ + pr_err(TTM_PFX + "%s: Failed to allocate extra pages " + "for large request.", + pool->dev_name); + spin_lock_irqsave(&pool->lock, irq_flags); + pool->npages_free += r; + /* We don''t care about ordering on the free_list. 
*/ + list_splice(&d_pages, &pool->free_list); + spin_unlock_irqrestore(&pool->lock, irq_flags); + return count; + } + } + return r; +} + +/* Get good estimation how many pages are free in pools */ +static int ttm_dma_pool_get_num_unused_pages(void) +{ + struct device_pools *p; + unsigned total = 0; + + mutex_lock(&_manager->lock); + list_for_each_entry(p, &_manager->pools, pools) { + if (p) + total += p->pool->npages_free; + } + mutex_unlock(&_manager->lock); + return total; +} + +/* Put all pages in pages list to correct pool to wait for reuse */ +void ttm_dma_put_pages(struct ttm_tt *ttm, struct list_head *pages, + unsigned page_count, dma_addr_t *dma_address) +{ + struct dma_pool *pool; + enum pool_type type; + bool is_cached = false; + unsigned count = 0, i; + unsigned long irq_flags; + struct device *dev = ttm->be->dev; + + if (list_empty(pages)) + return; + + type = ttm_to_type(ttm->page_flags, ttm->caching_state); + pool = ttm_dma_find_pool(dev, type); + if (!pool) { + WARN_ON(!pool); + return; + } + is_cached = (ttm_dma_find_pool(pool->dev, + ttm_to_type(ttm->page_flags, tt_cached)) == pool); + + if (page_count > 1) { + dev_dbg(pool->dev, "(%s:%d) Attempting to %s %d pages.\n", + pool->name, current->pid, + (is_cached) ? "destroy" : "recycle", page_count); + } + + count = ttm_dma_put_pages_in_pool(pool, pages, page_count, is_cached); + + for (i = 0; i < count; i++) + dma_address[i] = 0; + + spin_lock_irqsave(&pool->lock, irq_flags); + pool->npages_in_use -= count; + if (is_cached) + pool->nfrees += count; + else + pool->npages_free += count; + spin_unlock_irqrestore(&pool->lock, irq_flags); + + page_count -= count; + WARN(page_count != 0, + "Only freed %d page(s) in %s. Could not free the other %d!\n", + count, pool->name, page_count); + + page_count = 0; + if (pool->npages_free > _manager->options.max_size) { + page_count = pool->npages_free - _manager->options.max_size; + if (page_count < NUM_PAGES_TO_ALLOC) + page_count = NUM_PAGES_TO_ALLOC; + } + if (page_count) + ttm_dma_page_pool_free(pool, page_count); +} + +/** + * Callback for mm to request pool to reduce number of page held. + */ +static int ttm_dma_pool_mm_shrink(struct shrinker *shrink, + struct shrink_control *sc) +{ + static atomic_t start_pool = ATOMIC_INIT(0); + unsigned idx = 0; + unsigned pool_offset = atomic_add_return(1, &start_pool); + unsigned shrink_pages = sc->nr_to_scan; + struct device_pools *p; + + if (list_empty(&_manager->pools)) + return 0; + + mutex_lock(&_manager->lock); + pool_offset = pool_offset % _manager->npools; + list_for_each_entry(p, &_manager->pools, pools) { + unsigned nr_free; + + if (!p && !p->dev) + continue; + if (shrink_pages == 0) + break; + /* Do it in round-robin fashion. 
*/ + if (++idx < pool_offset) + continue; + nr_free = shrink_pages; + shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free); + pr_debug("%s: (%s:%d) Asked to shrink %d, have %d more to go\n", + p->pool->dev_name, p->pool->name, current->pid, nr_free, + shrink_pages); + } + mutex_unlock(&_manager->lock); + /* return estimated number of unused pages in pool */ + return ttm_dma_pool_get_num_unused_pages(); +} + +static void ttm_dma_pool_mm_shrink_init(struct ttm_pool_manager *manager) +{ + manager->mm_shrink.shrink = &ttm_dma_pool_mm_shrink; + manager->mm_shrink.seeks = 1; + register_shrinker(&manager->mm_shrink); +} +static void ttm_dma_pool_mm_shrink_fini(struct ttm_pool_manager *manager) +{ + unregister_shrinker(&manager->mm_shrink); +} +int ttm_dma_page_alloc_init(struct ttm_mem_global *glob, + unsigned max_pages) +{ + int ret = -ENOMEM; + + WARN_ON(_manager); + + printk(KERN_INFO TTM_PFX "Initializing DMA pool allocator.\n"); + + _manager = kzalloc(sizeof(*_manager), GFP_KERNEL); + if (!_manager) + goto err_manager; + + mutex_init(&_manager->lock); + INIT_LIST_HEAD(&_manager->pools); + + _manager->options.max_size = max_pages; + _manager->options.small = SMALL_ALLOCATION; + _manager->options.alloc_size = NUM_PAGES_TO_ALLOC; + + /* This takes care of auto-freeing the _manager */ + ret = kobject_init_and_add(&_manager->kobj, &ttm_pool_kobj_type, + &glob->kobj, "dma_pool"); + if (unlikely(ret != 0)) { + kobject_put(&_manager->kobj); + goto err; + } + ttm_dma_pool_mm_shrink_init(_manager); + return 0; +err_manager: + kfree(_manager); + _manager = NULL; +err: + return ret; +} +void ttm_dma_page_alloc_fini(void) +{ + struct device_pools *p, *t; + + printk(KERN_INFO TTM_PFX "Finalizing DMA pool allocator.\n"); + ttm_dma_pool_mm_shrink_fini(_manager); + + list_for_each_entry_safe_reverse(p, t, &_manager->pools, pools) { + dev_dbg(p->dev, "(%s:%d) Freeing.\n", p->pool->name, + current->pid); + WARN_ON(devres_destroy(p->dev, ttm_dma_pool_release, + ttm_dma_pool_match, p->pool)); + ttm_dma_free_pool(p->dev, p->pool->type); + } + kobject_put(&_manager->kobj); + _manager = NULL; +} + +int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) +{ + struct device_pools *p; + struct dma_pool *pool = NULL; + char *h[] = {"pool", "refills", "pages freed", "inuse", "available", + "name", "virt", "busaddr"}; + + if (!_manager) { + seq_printf(m, "No pool allocator running.\n"); + return 0; + } + seq_printf(m, "%13s %12s %13s %8s %8s %8s\n", + h[0], h[1], h[2], h[3], h[4], h[5]); + mutex_lock(&_manager->lock); + list_for_each_entry(p, &_manager->pools, pools) { + struct device *dev = p->dev; + if (!dev) + continue; + pool = p->pool; + seq_printf(m, "%13s %12ld %13ld %8d %8d %8s\n", + pool->name, pool->nrefills, + pool->nfrees, pool->npages_in_use, + pool->npages_free, + pool->dev_name); + } + mutex_unlock(&_manager->lock); + return 0; +} +EXPORT_SYMBOL_GPL(ttm_dma_page_alloc_debugfs); +bool ttm_dma_override(struct ttm_backend_func *be) +{ + if (swiotlb_nr_tbl() && be) { + be->get_pages = &ttm_dma_get_pages; + be->put_pages = &ttm_dma_put_pages; + return true; + } + return false; +} +EXPORT_SYMBOL_GPL(ttm_dma_override); diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h index 0aaac39..9c52fb7 100644 --- a/include/drm/ttm/ttm_page_alloc.h +++ b/include/drm/ttm/ttm_page_alloc.h @@ -29,6 +29,37 @@ #include "ttm_bo_driver.h" #include "ttm_memory.h" +#ifdef CONFIG_SWIOTLB +extern bool ttm_dma_override(struct ttm_backend_func *be); + +/** + * Initialize pool allocator. 
+ */ +int ttm_dma_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages); +/** + * Free pool allocator. + */ +void ttm_dma_page_alloc_fini(void); +/** + * Output the state of pools to debugfs file + */ +extern int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data); +#else +static inline bool ttm_dma_override(struct ttm_backend_func *be) +{ + return false; +} +static inline int ttm_dma_page_alloc_init(struct ttm_mem_global *glob, + unsigned max_pages) +{ + return -ENODEV; +} +static inline void ttm_dma_page_alloc_fini(void) { return; } +static inline int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) +{ + return 0; +} +#endif /** * Get count number of pages from pool to pages list. * -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
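As a rough illustration of what the pool code above is doing - handing out (virtual address, bus address) pairs, parking them on an inuse list while the driver holds them, and recycling recently freed pages from a free list - here is a deliberately simplified userspace sketch. It is not the kernel code: there is no locking, no caching-state handling, no shrinker and no per-device list, and plain malloc() stands in for dma_alloc_coherent().

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct dma_page {
	void *vaddr;              /* CPU address (d_page->p in the patch)  */
	uintptr_t dma;            /* device/bus address (d_page->dma)      */
	struct dma_page *next;
};

struct dma_pool {
	struct dma_page *free_list;   /* recently freed, kept for reuse    */
	struct dma_page *inuse_list;  /* currently handed out              */
};

static struct dma_page *pool_get(struct dma_pool *pool)
{
	struct dma_page *d = pool->free_list;

	if (d) {
		pool->free_list = d->next;    /* reuse most recently freed page */
	} else {
		d = malloc(sizeof(*d));
		d->vaddr = malloc(4096);      /* stand-in for dma_alloc_coherent() */
		d->dma = (uintptr_t)d->vaddr; /* pretend bus address == CPU address */
	}
	d->next = pool->inuse_list;           /* inuse_list behaves like a stack */
	pool->inuse_list = d;
	return d;
}

static void pool_put(struct dma_pool *pool, struct dma_page *d)
{
	struct dma_page **pp;

	for (pp = &pool->inuse_list; *pp; pp = &(*pp)->next) {
		if (*pp == d) {
			*pp = d->next;              /* unlink from inuse_list */
			d->next = pool->free_list;  /* park on free_list      */
			pool->free_list = d;
			return;
		}
	}
}

int main(void)
{
	struct dma_pool pool = { NULL, NULL };
	struct dma_page *a = pool_get(&pool);
	struct dma_page *b = pool_get(&pool);

	printf("a: cpu %p, \"bus\" %#lx\n", a->vaddr, (unsigned long)a->dma);
	pool_put(&pool, b);
	pool_put(&pool, a);
	/* The next request reuses 'a' (most recently freed) instead of allocating. */
	printf("reused a: %s\n", pool_get(&pool) == a ? "yes" : "no");
	return 0;
}

The real pool additionally remembers whether a page was made write-combined or uncached, so it can be reset to write-back before it is ever returned to the kernel.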
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 09/11] ttm: Add 'no_dma' parameter to turn the TTM DMA pool off during runtime.
The TTM DMA only gets turned on when the SWIOTLB is enabled - but we might also want to turn it off when SWIOTLB is on to use the non-DMA TTM pool code. In the future this parameter can be removed. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/ttm/ttm_memory.c | 7 +++++-- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 6 +++++- include/drm/ttm/ttm_page_alloc.h | 2 ++ 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c index 6d24fe2..f883a28 100644 --- a/drivers/gpu/drm/ttm/ttm_memory.c +++ b/drivers/gpu/drm/ttm/ttm_memory.c @@ -395,7 +395,9 @@ int ttm_mem_global_init(struct ttm_mem_global *glob) zone->name, (unsigned long long) zone->max_mem >> 10); } ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE)); - ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE)); + if (!ttm_dma_disable) + ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem / + (2*PAGE_SIZE)); return 0; out_no_zone: ttm_mem_global_release(glob); @@ -411,7 +413,8 @@ void ttm_mem_global_release(struct ttm_mem_global *glob) /* let the page allocator first stop the shrink work. */ ttm_page_alloc_fini(); - ttm_dma_page_alloc_fini(); + if (!ttm_dma_disable) + ttm_dma_page_alloc_fini(); flush_workqueue(glob->swap_queue); destroy_workqueue(glob->swap_queue); glob->swap_queue = NULL; diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index d6d8240..a5be62e 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -51,6 +51,10 @@ #include <asm/agp.h> #endif +int __read_mostly ttm_dma_disable; +MODULE_PARM_DESC(no_dma, "Disable TTM DMA pool"); +module_param_named(no_dma, ttm_dma_disable, bool, S_IRUGO); + #define NUM_PAGES_TO_ALLOC (PAGE_SIZE/sizeof(struct page *)) #define SMALL_ALLOCATION 16 #define FREE_ALL_PAGES (~0U) @@ -1384,7 +1388,7 @@ int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) EXPORT_SYMBOL_GPL(ttm_dma_page_alloc_debugfs); bool ttm_dma_override(struct ttm_backend_func *be) { - if (swiotlb_nr_tbl() && be) { + if (swiotlb_nr_tbl() && be && !ttm_dma_disable) { be->get_pages = &ttm_dma_get_pages; be->put_pages = &ttm_dma_put_pages; return true; diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h index 9c52fb7..daf5db6 100644 --- a/include/drm/ttm/ttm_page_alloc.h +++ b/include/drm/ttm/ttm_page_alloc.h @@ -32,6 +32,7 @@ #ifdef CONFIG_SWIOTLB extern bool ttm_dma_override(struct ttm_backend_func *be); +extern int ttm_dma_disable; /** * Initialize pool allocator. */ @@ -45,6 +46,7 @@ void ttm_dma_page_alloc_fini(void); */ extern int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data); #else +#define ttm_dma_disable (1) static inline bool ttm_dma_override(struct ttm_backend_func *be) { return false; -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
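In practice the new parameter would be given as ttm.no_dma=1 on the kernel command line when TTM is built in, or as "modprobe ttm no_dma=1" when it is built as a module; with S_IRUGO it is also visible read-only under /sys/module/ttm/parameters/no_dma. That is just the standard module_param_named() behaviour rather than anything spelled out in the patch itself.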
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 10/11] nouveau/ttm/dma: Enable the TTM DMA pool if device can only do 32-bit DMA.
If the card is capable of more than 32-bit, then use the default TTM page pool code which allocates from anywhere in the memory. Note: If the ''ttm.no_dma'' parameter is set, the override is ignored and the default TTM pool is used. CC: Ben Skeggs <bskeggs@redhat.com> CC: Francisco Jerez <currojerez@riseup.net> CC: Dave Airlie <airlied@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/nouveau/nouveau_debugfs.c | 1 + drivers/gpu/drm/nouveau/nouveau_sgdma.c | 5 +++++ 2 files changed, 6 insertions(+), 0 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_debugfs.c b/drivers/gpu/drm/nouveau/nouveau_debugfs.c index 8e15923..f52c2db 100644 --- a/drivers/gpu/drm/nouveau/nouveau_debugfs.c +++ b/drivers/gpu/drm/nouveau/nouveau_debugfs.c @@ -178,6 +178,7 @@ static struct drm_info_list nouveau_debugfs_list[] = { { "memory", nouveau_debugfs_memory_info, 0, NULL }, { "vbios.rom", nouveau_debugfs_vbios_image, 0, NULL }, { "ttm_page_pool", ttm_page_alloc_debugfs, 0, NULL }, + { "ttm_dma_page_pool", ttm_dma_page_alloc_debugfs, 0, NULL }, }; #define NOUVEAU_DEBUGFS_ENTRIES ARRAY_SIZE(nouveau_debugfs_list) diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index 9b570c3..e0d4474 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -2,6 +2,7 @@ #include "nouveau_drv.h" #include <linux/pagemap.h> #include <linux/slab.h> +#include <ttm/ttm_page_alloc.h> #define NV_CTXDMA_PAGE_SHIFT 12 #define NV_CTXDMA_PAGE_SIZE (1 << NV_CTXDMA_PAGE_SHIFT) @@ -417,6 +418,10 @@ nouveau_sgdma_init_ttm(struct drm_device *dev) nvbe->dev = dev; nvbe->backend.func = dev_priv->gart_info.func; + if ((dev->dev) && (dma_get_mask(dev->dev) <= DMA_BIT_MASK(32))) { + if (ttm_dma_override(nvbe->backend.func)) + nvbe->backend.dev = dev->dev; + } return &nvbe->backend; } -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-19 22:19 UTC
[Xen-devel] [PATCH 11/11] radeon/ttm/dma: Enable the TTM DMA pool if the device can only do 32-bit.
. with the exception that we do not handle the AGP case. We only deal with PCIe cards such as ATI ES1000 or HD3200 that have been detected to only do DMA up to 32-bits. Note: If the ttm.no_dma is set, this operation will not override the TTM page pool to use the DMA one. CC: Dave Airlie <airlied@redhat.com> CC: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> --- drivers/gpu/drm/radeon/radeon_ttm.c | 19 +++++++++++++++---- 1 files changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 60125dd..2e7419f 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -761,6 +761,10 @@ struct ttm_backend *radeon_ttm_backend_create(struct radeon_device *rdev) gtt->backend.bdev = &rdev->mman.bdev; gtt->backend.flags = 0; gtt->backend.func = &radeon_backend_func; + if (rdev->need_dma32) { + if (ttm_dma_override(gtt->backend.func)) + gtt->backend.dev = rdev->dev; + } gtt->rdev = rdev; gtt->pages = NULL; gtt->num_pages = 0; @@ -792,8 +796,8 @@ static int radeon_mm_dump_table(struct seq_file *m, void *data) static int radeon_ttm_debugfs_init(struct radeon_device *rdev) { #if defined(CONFIG_DEBUG_FS) - static struct drm_info_list radeon_mem_types_list[RADEON_DEBUGFS_MEM_TYPES+1]; - static char radeon_mem_types_names[RADEON_DEBUGFS_MEM_TYPES+1][32]; + static struct drm_info_list radeon_mem_types_list[RADEON_DEBUGFS_MEM_TYPES+2]; + static char radeon_mem_types_names[RADEON_DEBUGFS_MEM_TYPES+2][32]; unsigned i; for (i = 0; i < RADEON_DEBUGFS_MEM_TYPES; i++) { @@ -815,8 +819,15 @@ static int radeon_ttm_debugfs_init(struct radeon_device *rdev) radeon_mem_types_list[i].name = radeon_mem_types_names[i]; radeon_mem_types_list[i].show = &ttm_page_alloc_debugfs; radeon_mem_types_list[i].driver_features = 0; - radeon_mem_types_list[i].data = NULL; - return radeon_debugfs_add_files(rdev, radeon_mem_types_list, RADEON_DEBUGFS_MEM_TYPES+1); + radeon_mem_types_list[i++].data = NULL; + if (rdev->need_dma32) { + sprintf(radeon_mem_types_names[i], "ttm_dma_page_pool"); + radeon_mem_types_list[i].name = radeon_mem_types_names[i]; + radeon_mem_types_list[i].show = &ttm_dma_page_alloc_debugfs; + radeon_mem_types_list[i].driver_features = 0; + radeon_mem_types_list[i++].data = NULL; + } + return radeon_debugfs_add_files(rdev, radeon_mem_types_list, i); #endif return 0; -- 1.7.6.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Wed, Oct 19, 2011 at 06:19:21PM -0400, Konrad Rzeszutek Wilk wrote: Hmm, seems a part of this got eaten by the Internet monsters. Since v2.0: [not posted] - Redid the registration/override to be tightly integrated with the ''struct ttm_backend_func'' per Thomas''s suggestion. Since v1.9: [not posted] - Performance improvements - it was doing O(n^2) instead of O(n) on certain workloads. Since v1.8: [lwn.net/Articles/458724/] - Removed swiotlb_enabled and used swiotlb_nr_tbl. - Added callback for changing cache types. Since v1.7: [https://lkml.org/lkml/2011/8/30/460] - Fixed checking the DMA address in radeon/nouveau code. Since v1: [http://lwn.net/Articles/456246/] - Ran it through the gauntlet of SubmitChecklist and fixed issues - Made radeon/nouveau driver set coherent_dma (which is required for dmapool)> [.. and this is what I said in v1 post]: > > Way back in January this patchset: > http://lists.freedesktop.org/archives/dri-devel/2011-January/006905.html > was merged in, but pieces of it had to be reverted b/c they did not > work properly under PowerPC, ARM, and when swapping out pages to disk. > > After a bit of discussion on the mailing list > http://marc.info/?i=4D769726.2030307@shipmail.org I started working on it, but > got waylaid by other things .. and finally I am able to post the RFC patches. > > There was a lot of discussion about it and I am not sure if I captured > everybody''s thoughts - if I did not - that is _not_ intentional - it has just > been quite some time.. > > Anyhow .. the patches explore what the "lib/dmapool.c" does - which is to have a > DMA pool that the device has associated with. I kind of married that code > along with drivers/gpu/drm/ttm/ttm_page_alloc.c to create a TTM DMA pool code. > The end result is DMA pool with extra features: can do write-combine, uncached, > writeback (and tracks them and sets back to WB when freed); tracks "cached" > pages that don''t really need to be returned to a pool; and hooks up to > the shrinker code so that the pools can be shrunk. > > If you guys think this set of patches make sense - my future plans were > 1) Get this in large crowd of testing .. and if it works for a kernel release > 2) to move a bulk of this in the lib/dmapool.c (I spoke with Matthew Wilcox > about it and he is OK as long as I don''t introduce performance regressions). > > But before I do any of that a second set of eyes taking a look at these > patches would be most welcome. > > In regards to testing, I''ve been running them non-stop for the last month. > (and found some issues which I''ve fixed up) - and been quite happy with how > they work. > > Michel (thanks!) took a spin of the patches on his PowerPC and they did not > cause any regressions (wheew). > > The patches are also located in a git tree: > > git://oss.oracle.com/git/kwilk/xen.git devel/ttm.dma_pool.v2.1 > > > Konrad Rzeszutek Wilk (11): > swiotlb: Expose swiotlb_nr_tlb function to modules > nouveau/radeon: Set coherent DMA mask > ttm/radeon/nouveau: Check the DMA address from TTM against known value. > ttm: Wrap ttm_[put|get]_pages and extract GFP_* and caching states from ''struct ttm_tt'' > ttm: Get rid of temporary scaffolding > ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool. > ttm: Do not set the ttm->be to NULL before calling the TTM page pool to free pages. > ttm: Provide DMA aware TTM page pool code. > ttm: Add ''no_dma'' parameter to turn the TTM DMA pool off during runtime. 
> nouveau/ttm/dma: Enable the TTM DMA pool if device can only do 32-bit DMA. > radeon/ttm/dma: Enable the TTM DMA pool if the device can only do 32-bit. > > drivers/gpu/drm/nouveau/nouveau_debugfs.c | 1 + > drivers/gpu/drm/nouveau/nouveau_mem.c | 5 + > drivers/gpu/drm/nouveau/nouveau_sgdma.c | 8 +- > drivers/gpu/drm/radeon/radeon_device.c | 6 + > drivers/gpu/drm/radeon/radeon_gart.c | 4 +- > drivers/gpu/drm/radeon/radeon_ttm.c | 19 +- > drivers/gpu/drm/ttm/Makefile | 3 + > drivers/gpu/drm/ttm/ttm_memory.c | 5 + > drivers/gpu/drm/ttm/ttm_page_alloc.c | 108 ++- > drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1446 +++++++++++++++++++++++++++++ > drivers/gpu/drm/ttm/ttm_tt.c | 21 +- > drivers/xen/swiotlb-xen.c | 2 +- > include/drm/ttm/ttm_bo_driver.h | 31 + > include/drm/ttm/ttm_page_alloc.h | 53 +- > include/linux/swiotlb.h | 2 +- > lib/swiotlb.c | 5 +- > 16 files changed, 1637 insertions(+), 82 deletions(-) > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
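Patch 02 ("nouveau/radeon: Set coherent DMA mask") is not quoted in this part of the thread, but for context the changelog item above usually amounts to declaring the coherent mask next to the streaming one - something along these lines, illustrative only and not the actual radeon/nouveau hunks:

	/* Tell the DMA API this device can only address 32 bits of
	 * coherent (consistent) memory; this must be set before
	 * dma_alloc_coherent()/dmapool can be used on its behalf. */
	r = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
	if (r == 0)
		r = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));

Without the consistent/coherent mask the DMA-aware pool would have no guarantee that its allocations land where the card can reach them.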
FUJITA Tomonori
2011-Oct-22 04:49 UTC
[Xen-devel] Re: [PATCH 01/11] swiotlb: Expose swiotlb_nr_tlb function to modules
On Wed, 19 Oct 2011 18:19:22 -0400 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:> As a mechanism to detect whether SWIOTLB is enabled or not. > We also fix the spelling - it was swioltb instead of > swiotlb. > > CC: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> > [v1: Ripped out swiotlb_enabled] > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > drivers/xen/swiotlb-xen.c | 2 +- > include/linux/swiotlb.h | 2 +- > lib/swiotlb.c | 5 +++-- > 3 files changed, 5 insertions(+), 4 deletions(-)Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Thomas Hellstrom
2011-Oct-22 09:40 UTC
[Xen-devel] Re: [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
Konrad, I was hoping that we could get rid of the dma_address shuffling into core TTM, like I mentioned in the review. From what I can tell it''s now only used in the backend and core ttm doesn''t care about it. Is there a particular reason we''re still passing it around? Thanks, /Thomas On 10/20/2011 12:19 AM, Konrad Rzeszutek Wilk wrote:> The two overrides will be choosen by the backends whether they > want to use a different TTM page pool than the default. > > If the backend does not choose a new override, the default one > will be used. > > Signed-off-by: Konrad Rzeszutek Wilk<konrad.wilk@oracle.com> > --- > drivers/gpu/drm/ttm/ttm_page_alloc.c | 10 +++++++--- > include/drm/ttm/ttm_bo_driver.h | 31 +++++++++++++++++++++++++++++++ > 2 files changed, 38 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c > index 24c0340..360afb3 100644 > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c > @@ -861,13 +861,17 @@ EXPORT_SYMBOL(ttm_page_alloc_debugfs); > int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, > unsigned count, dma_addr_t *dma_address) > { > + if (ttm->be&& ttm->be->func&& ttm->be->func->get_pages) > + return ttm->be->func->get_pages(ttm, pages, count, dma_address); > return __ttm_get_pages(pages, ttm->page_flags, ttm->caching_state, > count, dma_address); > } > -{ > void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, > unsigned page_count, dma_addr_t *dma_address) > { > - __ttm_put_pages(pages, page_count, ttm->page_flags, ttm->caching_state, > - dma_address); > + if (ttm->be&& ttm->be->func&& ttm->be->func->put_pages) > + ttm->be->func->put_pages(ttm, pages, page_count, dma_address); > + else > + __ttm_put_pages(pages, page_count, ttm->page_flags, > + ttm->caching_state, dma_address); > } > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h > index 09af2d7..1826c3b 100644 > --- a/include/drm/ttm/ttm_bo_driver.h > +++ b/include/drm/ttm/ttm_bo_driver.h > @@ -100,6 +100,34 @@ struct ttm_backend_func { > * Destroy the backend. > */ > void (*destroy) (struct ttm_backend *backend); > + > + /** > + * ttm_get_pages override. The backend can override the default > + * TTM page pool code with a different one. > + * > + * Get count number of pages from pool to pages list. > + * > + * @ttm: ttm which contains flags for page allocation and caching state. > + * @pages: head of empty linked list where pages are filled. > + * @dma_address: The DMA (bus) address of pages > + */ > + int (*get_pages) (struct ttm_tt *ttm, struct list_head *pages, > + unsigned count, dma_addr_t *dma_address); > + > + /** > + * ttm_put_pages override. The backend can override the default > + * TTM page pool code with a different implementation. > + * > + * Put linked list of pages to pool. > + * > + * @ttm: ttm which contains flags for page allocation and caching state. > + * @pages: list of pages to free. > + * @page_count: number of pages in the list. Zero can be passed for > + * unknown count. > + * @dma_address: The DMA (bus) address of pages > + */ > + void (*put_pages) (struct ttm_tt *ttm, struct list_head *pages, > + unsigned page_count, dma_addr_t *dma_address); > }; > > /** > @@ -109,6 +137,8 @@ struct ttm_backend_func { > * @flags: For driver use. > * @func: Pointer to a struct ttm_backend_func that describes > * the backend methods. 
> + * @dev: Pointer to a struct device which can be used by the TTM > + * [get|put)_pages overrides in ''struct ttm_backend_func''. > * > */ > > @@ -116,6 +146,7 @@ struct ttm_backend { > struct ttm_bo_device *bdev; > uint32_t flags; > struct ttm_backend_func *func; > + struct device *dev; > }; > > #define TTM_PAGE_FLAG_USER (1<< 1) >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Oct-24 17:27 UTC
[Xen-devel] Re: [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
On Sat, Oct 22, 2011 at 11:40:54AM +0200, Thomas Hellstrom wrote:> Konrad, > > I was hoping that we could get rid of the dma_address shuffling into > core TTM, > like I mentioned in the review. From what I can tell it''s now only > used in the backend and > core ttm doesn''t care about it. > > Is there a particular reason we''re still passing it around?Yes - and I should have addressed that in the writeup but forgot, sorry about that. So initially I thought you meant this: diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 360afb3..06ef048 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -662,8 +662,7 @@ out: /* Put all pages in pages list to correct pool to wait for reuse */ static void __ttm_put_pages(struct list_head *pages, unsigned page_count, - int flags, enum ttm_caching_state cstate, - dma_addr_t *dma_address) + int flags, enum ttm_caching_state cstate) { unsigned long irq_flags; struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); @@ -707,8 +706,7 @@ static void __ttm_put_pages(struct list_head *pages, unsigned page_count, * cached pages. */ static int __ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) + enum ttm_caching_state cstate, unsigned count) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; @@ -864,7 +862,7 @@ int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, if (ttm->be && ttm->be->func && ttm->be->func->get_pages) return ttm->be->func->get_pages(ttm, pages, count, dma_address); return __ttm_get_pages(pages, ttm->page_flags, ttm->caching_state, - count, dma_address); + count) } void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, unsigned page_count, dma_addr_t *dma_address) @@ -873,5 +871,5 @@ void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, ttm->be->func->put_pages(ttm, pages, page_count, dma_address); else __ttm_put_pages(pages, page_count, ttm->page_flags, - ttm->caching_state, dma_address); + ttm->caching_state) } which is trivial (thought I have not compile tested it), but it should do it. But I think you mean eliminate the dma_address handling completly in ttm_page_alloc.c and ttm_tt.c. For that there are couple of architectural issues I am not sure how to solve. There has to be some form of TTM<->[Radeon|Nouveau] lookup mechanism to say: "here is a ''struct page *'', give me the bus address". Currently this is solved by keeping an array of DMA addresses along with the list of pages. And passing the list and DMA address up the stack (and down) from TTM up to the driver (when ttm->be->func->populate is called and they are handed off) does it. It does not break any API layering .. and the internal TTM pool (non-DMA) can just ignore the dma_address altogether (see patch above). But if we wanted to rip all mention of dma_addr from TTM, one immediate way that comes to my mind is: 1). Provide a new function in the ttm->be->func that would be called ''get_dma'' of: (int)( *get_dma)(struct list_head *pages, unsigned page_count, dma_addr_t *dma_address) which would call the TTM DMA to search the internal list and find ''pages*'' (which were just a microsecond ago allocated by calling ttm->be->func->get_pages) and stick the bus address on the ''dma_address'' array. 2). The radeon|nouveau driver would both call this if they decided to use the TTM DMA API. They would need to provide the newly allocated dma_address for this call. 3). 
Not sure how to wrap this in macros though - it looks as if both drivers will be riddled with ''if (ttm->be->func->get_pages) { private->dma_addr=kzalloc(...) } else {}''. But that is more an implemention problem.. .. While this idea looks correct, I am struck that it looks like it is breaking the layering of APIs, where the driver is reaching behind the TTM API and calling this extra function? Another idea is to transform the ''struct dma_addr *dma_addr'' to a ''void *override_p'' in the ''struct ttm_tt''. That means still keeping the TTM API layers seperate, and "passing" the array of DMA address through the ''override_p'' array (which would be allocated by TTM DMA code). Something along these lines (not tested): I like this more, but I haven''t actually tested it so not sure if it works right? diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c b/drivers/gpu/drm/nouveau/nouveau_sgdma.c index e0d4474..8760a04 100644 --- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c +++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c @@ -23,7 +23,7 @@ struct nouveau_sgdma_be { static int nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages, struct page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) + void *override_p) { struct nouveau_sgdma_be *nvbe = (struct nouveau_sgdma_be *)be; struct drm_device *dev = nvbe->dev; @@ -43,9 +43,9 @@ nouveau_sgdma_populate(struct ttm_backend *be, unsigned long num_pages, nvbe->nr_pages = 0; while (num_pages--) { - if (dma_addrs[nvbe->nr_pages] != 0) { - nvbe->pages[nvbe->nr_pages] - dma_addrs[nvbe->nr_pages]; + dma_addr_t *ttm_dma = (dma_addr_t *)override_p; + if (ttm_dma && ttm_dma[nvbe->nr_pages] != 0) { + nvbe->pages[nvbe->nr_pages] = ttm_dma[nvbe->nr_pages]; nvbe->ttm_alloced[nvbe->nr_pages] = true; } else { nvbe->pages[nvbe->nr_pages] diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c index 068ba09..dc700f4 100644 --- a/drivers/gpu/drm/radeon/radeon_gart.c +++ b/drivers/gpu/drm/radeon/radeon_gart.c @@ -181,7 +181,7 @@ int radeon_gart_bind(struct radeon_device *rdev, unsigned offset, p = t / (PAGE_SIZE / RADEON_GPU_PAGE_SIZE); for (i = 0; i < pages; i++, p++) { - if (dma_addr[i] != 0) { + if (dma_addr && dma_addr[i] != 0) { rdev->gart.ttm_alloced[p] = true; rdev->gart.pages_addr[p] = dma_addr[i]; } else { diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 2e7419f..690545d 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -661,7 +661,7 @@ struct radeon_ttm_backend { unsigned long num_pages; struct page **pages; struct page *dummy_read_page; - dma_addr_t *dma_addrs; + dma_addr_t *dma_addrs; /* Can be NULL */ bool populated; bool bound; unsigned offset; @@ -671,13 +671,13 @@ static int radeon_ttm_backend_populate(struct ttm_backend *backend, unsigned long num_pages, struct page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs) + void *override_p) { struct radeon_ttm_backend *gtt; gtt = container_of(backend, struct radeon_ttm_backend, backend); gtt->pages = pages; - gtt->dma_addrs = dma_addrs; + gtt->dma_addrs = (dma_addr_t *)override_p; gtt->num_pages = num_pages; gtt->dummy_read_page = dummy_read_page; gtt->populated = true; diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 360afb3..458727a 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -662,8 +662,7 @@ out: /* Put all pages in pages list to correct pool to wait for 
reuse */ static void __ttm_put_pages(struct list_head *pages, unsigned page_count, - int flags, enum ttm_caching_state cstate, - dma_addr_t *dma_address) + int flags, enum ttm_caching_state cstate) { unsigned long irq_flags; struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); @@ -707,8 +706,7 @@ static void __ttm_put_pages(struct list_head *pages, unsigned page_count, * cached pages. */ static int __ttm_get_pages(struct list_head *pages, int flags, - enum ttm_caching_state cstate, unsigned count, - dma_addr_t *dma_address) + enum ttm_caching_state cstate, unsigned count) { struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); struct page *p = NULL; @@ -766,7 +764,7 @@ static int __ttm_get_pages(struct list_head *pages, int flags, printk(KERN_ERR TTM_PFX "Failed to allocate extra pages " "for large request."); - __ttm_put_pages(pages, 0, flags, cstate, NULL); + __ttm_put_pages(pages, 0, flags, cstate); return r; } } @@ -859,19 +857,19 @@ int ttm_page_alloc_debugfs(struct seq_file *m, void *data) } EXPORT_SYMBOL(ttm_page_alloc_debugfs); int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, - unsigned count, dma_addr_t *dma_address) + unsigned count, int index) { if (ttm->be && ttm->be->func && ttm->be->func->get_pages) - return ttm->be->func->get_pages(ttm, pages, count, dma_address); + return ttm->be->func->get_pages(ttm, pages, count, index); return __ttm_get_pages(pages, ttm->page_flags, ttm->caching_state, - count, dma_address); + count); } void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, - unsigned page_count, dma_addr_t *dma_address) + unsigned page_count, int index) { if (ttm->be && ttm->be->func && ttm->be->func->put_pages) - ttm->be->func->put_pages(ttm, pages, page_count, dma_address); + ttm->be->func->put_pages(ttm, pages, page_count, index); else __ttm_put_pages(pages, page_count, ttm->page_flags, - ttm->caching_state, dma_address); + ttm->caching_state); } diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index a5be62e..08e182f 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -45,6 +45,7 @@ #include <linux/atomic.h> #include <linux/device.h> #include <linux/kthread.h> +#include <drm/drm_mem_util.h> #include "ttm/ttm_bo_driver.h" #include "ttm/ttm_page_alloc.h" #ifdef TTM_HAS_AGP @@ -1097,7 +1098,7 @@ out: * cached pages. On failure will hold the negative return value (-ENOMEM, etc). */ int ttm_dma_get_pages(struct ttm_tt *ttm, struct list_head *pages, - unsigned count, dma_addr_t *dma_address) + unsigned count, int idx) { int r = -ENOMEM; @@ -1105,6 +1106,11 @@ int ttm_dma_get_pages(struct ttm_tt *ttm, struct list_head *pages, gfp_t gfp_flags; enum pool_type type; struct device *dev = ttm->be->dev; + dma_addr_t *dma_address; + + /* We _MUST_ have a proper index value. 
*/ + if (WARN_ON(idx < 0)); + return -EINVAL; type = ttm_to_type(ttm->page_flags, ttm->caching_state); @@ -1129,6 +1135,7 @@ int ttm_dma_get_pages(struct ttm_tt *ttm, struct list_head *pages, cstate); } #endif + dma_address = &((dma_addr_t *)ttm->override_p)[idx]; /* Take pages out of a pool (if applicable) */ r = ttm_dma_pool_get_pages(pool, pages, dma_address, count); /* clear the pages coming from the pool if requested */ @@ -1201,7 +1208,7 @@ static int ttm_dma_pool_get_num_unused_pages(void) /* Put all pages in pages list to correct pool to wait for reuse */ void ttm_dma_put_pages(struct ttm_tt *ttm, struct list_head *pages, - unsigned page_count, dma_addr_t *dma_address) + unsigned page_count, int idx) { struct dma_pool *pool; enum pool_type type; @@ -1230,9 +1237,12 @@ void ttm_dma_put_pages(struct ttm_tt *ttm, struct list_head *pages, count = ttm_dma_put_pages_in_pool(pool, pages, page_count, is_cached); - for (i = 0; i < count; i++) - dma_address[i] = 0; - + /* Optional. */ + if (idx >= 0) { + dma_addr_t *dma_address = &((dma_addr_t *)ttm->override_p)[idx]; + for (i = 0; i < count; i++) + dma_address[i] = 0; + } spin_lock_irqsave(&pool->lock, irq_flags); pool->npages_in_use -= count; if (is_cached) @@ -1386,11 +1396,27 @@ int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) return 0; } EXPORT_SYMBOL_GPL(ttm_dma_page_alloc_debugfs); + +static int ttm_dma_alloc_priv(struct ttm_tt *ttm) +{ + ttm->override_p = drm_calloc_large(ttm->num_pages, sizeof(dma_addr_t *)); + if (WARN_ON(!ttm->override_p)) + return -ENOMEM; + return 0; + +} +static void ttm_dma_free_priv(struct ttm_tt *ttm) +{ + drm_free_large(ttm->override_p); + ttm->override_p = NULL; +} bool ttm_dma_override(struct ttm_backend_func *be) { if (swiotlb_nr_tbl() && be && !ttm_dma_disable) { be->get_pages = &ttm_dma_get_pages; be->put_pages = &ttm_dma_put_pages; + be->alloc_priv = &ttm_dma_alloc_priv; + be->free_priv = &ttm_dma_free_priv; return true; } return false; diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c index 31ae359..0f0d57f 100644 --- a/drivers/gpu/drm/ttm/ttm_tt.c +++ b/drivers/gpu/drm/ttm/ttm_tt.c @@ -50,16 +50,16 @@ static int ttm_tt_swapin(struct ttm_tt *ttm); static void ttm_tt_alloc_page_directory(struct ttm_tt *ttm) { ttm->pages = drm_calloc_large(ttm->num_pages, sizeof(*ttm->pages)); - ttm->dma_address = drm_calloc_large(ttm->num_pages, - sizeof(*ttm->dma_address)); + if (ttm->be && ttm->be->func && ttm->be->func->alloc_priv) + ttm->be->func->alloc_priv(ttm); } static void ttm_tt_free_page_directory(struct ttm_tt *ttm) { drm_free_large(ttm->pages); ttm->pages = NULL; - drm_free_large(ttm->dma_address); - ttm->dma_address = NULL; + if (ttm->be && ttm->be->func && ttm->be->func->free_priv) + ttm->be->func->free_priv(ttm); } static void ttm_tt_free_user_pages(struct ttm_tt *ttm) @@ -110,7 +110,7 @@ static struct page *__ttm_tt_get_page(struct ttm_tt *ttm, int index) INIT_LIST_HEAD(&h); - ret = ttm_get_pages(ttm, &h, 1, &ttm->dma_address[index]); + ret = ttm_get_pages(ttm, &h, 1, index); if (ret != 0) return NULL; @@ -169,7 +169,7 @@ int ttm_tt_populate(struct ttm_tt *ttm) } be->func->populate(be, ttm->num_pages, ttm->pages, - ttm->dummy_read_page, ttm->dma_address); + ttm->dummy_read_page, ttm->override_p); ttm->state = tt_unbound; return 0; } @@ -303,7 +303,7 @@ static void ttm_tt_free_alloced_pages(struct ttm_tt *ttm, bool call_clear) count++; } } - ttm_put_pages(ttm, &h, count, ttm->dma_address); + ttm_put_pages(ttm, &h, count, 0 /* start at zero and go up to count */); 
ttm->state = tt_unpopulated; ttm->first_himem_page = ttm->num_pages; ttm->last_lomem_page = -1; diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h index 1826c3b..771697a 100644 --- a/include/drm/ttm/ttm_bo_driver.h +++ b/include/drm/ttm/ttm_bo_driver.h @@ -58,7 +58,7 @@ struct ttm_backend_func { int (*populate) (struct ttm_backend *backend, unsigned long num_pages, struct page **pages, struct page *dummy_read_page, - dma_addr_t *dma_addrs); + void *override_p); /** * struct ttm_backend_func member clear * @@ -109,10 +109,11 @@ struct ttm_backend_func { * * @ttm: ttm which contains flags for page allocation and caching state. * @pages: head of empty linked list where pages are filled. - * @dma_address: The DMA (bus) address of pages + * @idx: The current index in ttm->pages[] array. Negative means + * don''t assume ttm->pages[idx] order matches the order in *pages. */ int (*get_pages) (struct ttm_tt *ttm, struct list_head *pages, - unsigned count, dma_addr_t *dma_address); + unsigned count, int idx); /** * ttm_put_pages override. The backend can override the default @@ -124,10 +125,17 @@ struct ttm_backend_func { * @pages: list of pages to free. * @page_count: number of pages in the list. Zero can be passed for * unknown count. - * @dma_address: The DMA (bus) address of pages + * @idx: The current index in the ttm->pages[] array. Negative means + * don''t assume ttm->pages[idx] order matches the order in *pages. */ void (*put_pages) (struct ttm_tt *ttm, struct list_head *pages, - unsigned page_count, dma_addr_t *dma_address); + unsigned page_count, int idx); + + /** + * TODO: Flesh this out. + */ + int (*alloc_priv) (struct ttm_tt *ttm); + void (*free_priv) (struct ttm_tt *ttm); }; /** @@ -207,7 +215,7 @@ struct ttm_tt { tt_unbound, tt_unpopulated, } state; - dma_addr_t *dma_address; + void *override_p; }; #define TTM_MEMTYPE_FLAG_FIXED (1 << 0) /* Fixed (on-card) PCI memory */ diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h index daf5db6..31e6079 100644 --- a/include/drm/ttm/ttm_page_alloc.h +++ b/include/drm/ttm/ttm_page_alloc.h @@ -68,12 +68,12 @@ static inline int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) * @ttm: ttm which contains flags for page allocation and caching state. * @pages: heado of empty linked list where pages are filled. * @count: number of pages to allocate. - * @dma_address: The DMA (bus) address of pages - (by default zero). + * @idx: The current index in ttm->pages[idx]. Negative means ignore. */ int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, unsigned count, - dma_addr_t *dma_address); + int idx); /** * Put linked list of pages to pool. * @@ -81,12 +81,12 @@ int ttm_get_pages(struct ttm_tt *ttm, * @pages: list of pages to free. * @page_count: number of pages in the list. Zero can be passed for unknown * count. - * @dma_address: The DMA (bus) address of pages (by default zero). + * @idx: The current index in ttm->pages[idx]. Negative means ignore. */ void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, unsigned page_count, - dma_addr_t *dma_address); + int idx); /** * Initialize pool allocator. */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Thomas Hellstrom
2011-Oct-24 17:42 UTC
[Xen-devel] Re: [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
On 10/24/2011 07:27 PM, Konrad Rzeszutek Wilk wrote:> On Sat, Oct 22, 2011 at 11:40:54AM +0200, Thomas Hellstrom wrote: > >> Konrad, >> >> I was hoping that we could get rid of the dma_address shuffling into >> core TTM, >> like I mentioned in the review. From what I can tell it''s now only >> used in the backend and >> core ttm doesn''t care about it. >> >> Is there a particular reason we''re still passing it around? >> > Yes - and I should have addressed that in the writeup but forgot, sorry about that. > > So initially I thought you meant this: > > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c > index 360afb3..06ef048 100644 > --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c > @@ -662,8 +662,7 @@ out: > > /* Put all pages in pages list to correct pool to wait for reuse */ > static void __ttm_put_pages(struct list_head *pages, unsigned page_count, > - int flags, enum ttm_caching_state cstate, > - dma_addr_t *dma_address) > + int flags, enum ttm_caching_state cstate) > { > unsigned long irq_flags; > struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); > @@ -707,8 +706,7 @@ static void __ttm_put_pages(struct list_head *pages, unsigned page_count, > * cached pages. > */ > static int __ttm_get_pages(struct list_head *pages, int flags, > - enum ttm_caching_state cstate, unsigned count, > - dma_addr_t *dma_address) > + enum ttm_caching_state cstate, unsigned count) > { > struct ttm_page_pool *pool = ttm_get_pool(flags, cstate); > struct page *p = NULL; > @@ -864,7 +862,7 @@ int ttm_get_pages(struct ttm_tt *ttm, struct list_head *pages, > if (ttm->be&& ttm->be->func&& ttm->be->func->get_pages) > return ttm->be->func->get_pages(ttm, pages, count, dma_address); > return __ttm_get_pages(pages, ttm->page_flags, ttm->caching_state, > - count, dma_address); > + count) > } > void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, > unsigned page_count, dma_addr_t *dma_address) > @@ -873,5 +871,5 @@ void ttm_put_pages(struct ttm_tt *ttm, struct list_head *pages, > ttm->be->func->put_pages(ttm, pages, page_count, dma_address); > else > __ttm_put_pages(pages, page_count, ttm->page_flags, > - ttm->caching_state, dma_address); > + ttm->caching_state) > } > which is trivial (thought I have not compile tested it), but it should do it. > > But I think you mean eliminate the dma_address handling completly in > ttm_page_alloc.c and ttm_tt.c. > > For that there are couple of architectural issues I am not sure how to solve. > > There has to be some form of TTM<->[Radeon|Nouveau] lookup mechanism > to say: "here is a ''struct page *'', give me the bus address". Currently > this is solved by keeping an array of DMA addresses along with the list > of pages. And passing the list and DMA address up the stack (and down) > from TTM up to the driver (when ttm->be->func->populate is called and they > are handed off) does it. It does not break any API layering .. and the internal > TTM pool (non-DMA) can just ignore the dma_address altogether (see patch above). > >I actually had something more simple in mind, but when tinking a bit deeper into it, it seems more complicated than I initially thought. Namely that when we allocate pages from the ttm_backend, we actually populated it at the same time. be::populate would then not take a page array as an argument, and would actually be a no-op on many drivers. 
This makes us move towards struct ttm_tt consisting almost only of its backend, so that whole API should perhaps be looked at with new eyes. So anyway, I'm fine with the high-level things as they are now, and the dma_addr issue can be looked at at a later time. If we could get a couple of extra eyes to review the code for style etc., that would be great, because I have very little time the next couple of weeks. /Thomas _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
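To make the suggestion above concrete: nothing like this exists in the posted series and the names are invented here, but a backend hook that allocates and populates in one step might be declared roughly as follows, with all dma_addr_t bookkeeping kept private to the backend so core TTM never carries it around:

	/* Hypothetical additions to struct ttm_backend_func - not part
	 * of the series. The backend allocates num_pages pages itself
	 * (from whichever pool it prefers) and makes them visible to
	 * the device in the same call. */
	int  (*alloc_populate)(struct ttm_backend *backend,
			       unsigned long num_pages,
			       struct page *dummy_read_page);

	/* Counterpart: unmap from the device and return the pages to
	 * the backend's own pool. */
	void (*free_unpopulate)(struct ttm_backend *backend);

For drivers that do not need the DMA-aware pool, alloc_populate would reduce to a plain page allocation and the populate step would effectively be a no-op, which is what the remark about "a no-op on many drivers" points at.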
Konrad Rzeszutek Wilk
2011-Oct-24 18:18 UTC
[Xen-devel] Re: [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
> >For that there are couple of architectural issues I am not sure how to solve. > > > >There has to be some form of TTM<->[Radeon|Nouveau] lookup mechanism > >to say: "here is a ''struct page *'', give me the bus address". Currently > >this is solved by keeping an array of DMA addresses along with the list > >of pages. And passing the list and DMA address up the stack (and down) > >from TTM up to the driver (when ttm->be->func->populate is called and they > >are handed off) does it. It does not break any API layering .. and the internal > >TTM pool (non-DMA) can just ignore the dma_address altogether (see patch above). > > > > I actually had something more simple in mind, but when tinking a bit > deeper into it, it seems more complicated than I initially thought. > > Namely that when we allocate pages from the ttm_backend, we actually > populated it at the same time. be::populate would then not take a > page array as an argument, and would actually be a no-op on many > drivers.The programming of the gfx''s MMU.. would be done via a new API call? I think this needs a bit of whiteboarding for me to be sure I understand you.> > This makes us move towards struct ttm_tt consisting almost only of > its backend, so that whole API should perhaps be looked at with new > eyes. > > So anyway, I''m fine with high level things as they are now, and theGreat!> dma_addr issue can be looked at at a later time. If we could get a > couple of extra eyes to review the code for style etc. would beAnybody in particular you can recommend that I can pester^H^H^H^H politely ask :-)> great, because I have very little time the next couple of weeks.<nods> Understood. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jerome Glisse
2011-Oct-31 19:37 UTC
[Xen-devel] Re: [PATCH 08/11] ttm: Provide DMA aware TTM page pool code.
On Wed, Oct 19, 2011 at 06:19:29PM -0400, Konrad Rzeszutek Wilk wrote:> In TTM world the pages for the graphic drivers are kept in three different > pools: write combined, uncached, and cached (write-back). When the pages > are used by the graphic driver the graphic adapter via its built in MMU > (or AGP) programs these pages in. The programming requires the virtual address > (from the graphic adapter perspective) and the physical address (either System RAM > or the memory on the card) which is obtained using the pci_map_* calls (which does the > virtual to physical - or bus address translation). During the graphic application''s > "life" those pages can be shuffled around, swapped out to disk, moved from the > VRAM to System RAM or vice-versa. This all works with the existing TTM pool code > - except when we want to use the software IOTLB (SWIOTLB) code to "map" the physical > addresses to the graphic adapter MMU. We end up programming the bounce buffer''s > physical address instead of the TTM pool memory''s and get a non-worky driver. > There are two solutions: > 1) using the DMA API to allocate pages that are screened by the DMA API, or > 2) using the pci_sync_* calls to copy the pages from the bounce-buffer and back. > > This patch fixes the issue by allocating pages using the DMA API. The second > is a viable option - but it has performance drawbacks and potential correctness > issues - think of the write cache page being bounced (SWIOTLB->TTM), the > WC is set on the TTM page and the copy from SWIOTLB not making it to the TTM > page until the page has been recycled in the pool (and used by another application). > > The bounce buffer does not get activated often - only in cases where we have > a 32-bit capable card and we want to use a page that is allocated above the > 4GB limit. The bounce buffer offers the solution of copying the contents > of that 4GB page to an location below 4GB and then back when the operation has been > completed (or vice-versa). This is done by using the ''pci_sync_*'' calls. > Note: If you look carefully enough in the existing TTM page pool code you will > notice the GFP_DMA32 flag is used - which should guarantee that the provided page > is under 4GB. It certainly is the case, except this gets ignored in two cases: > - If user specifies ''swiotlb=force'' which bounces _every_ page. > - If user is using a Xen''s PV Linux guest (which uses the SWIOTLB and the > underlaying PFN''s aren''t necessarily under 4GB). > > To not have this extra copying done the other option is to allocate the pages > using the DMA API so that there is not need to map the page and perform the > expensive ''pci_sync_*'' calls. > > This DMA API capable TTM pool requires for this the ''struct device'' to > properly call the DMA API. It also has to track the virtual and bus address of > the page being handed out in case it ends up being swapped out or de-allocated - > to make sure it is de-allocated using the proper''s ''struct device''. > > Implementation wise the code keeps two lists: one that is attached to the > ''struct device'' (via the dev->dma_pools list) and a global one to be used when > the ''struct device'' is unavailable (think shrinker code). The global list can > iterate over all of the ''struct device'' and its associated dma_pool. The list > in dev->dma_pools can only iterate the device''s dma_pool. 
> /[struct device_pool]\ > /---------------------------------------------------| dev | > / +-------| dma_pool | > /-----+------\ / \--------------------/ > |struct device| /-->[struct dma_pool for WC]</ /[struct device_pool]\ > | dma_pools +----+ /-| dev | > | ... | \--->[struct dma_pool for uncached]<-/--| dma_pool | > \-----+------/ / \--------------------/ > \----------------------------------------------/ > [Two pools associated with the device (WC and UC), and the parallel list > containing the ''struct dev'' and ''struct dma_pool'' entries] > > The maximum amount of dma pools a device can have is six: write-combined, > uncached, and cached; then there are the DMA32 variants which are: > write-combined dma32, uncached dma32, and cached dma32. > > Currently this code only gets activated when any variant of the SWIOTLB IOMMU > code is running (Intel without VT-d, AMD without GART, IBM Calgary and Xen PV > with PCI devices). > > Tested-by: Michel Dänzer <michel@daenzer.net> > [v1: Using swiotlb_nr_tbl instead of swiotlb_enabled] > [v2: Major overhaul - added ''inuse_list'' to seperate used from inuse and reorder > the order of lists to get better performance.] > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > --- > drivers/gpu/drm/ttm/Makefile | 3 + > drivers/gpu/drm/ttm/ttm_memory.c | 2 + > drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 1394 ++++++++++++++++++++++++++++++ > include/drm/ttm/ttm_page_alloc.h | 31 + > 4 files changed, 1430 insertions(+), 0 deletions(-) > create mode 100644 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > > diff --git a/drivers/gpu/drm/ttm/Makefile b/drivers/gpu/drm/ttm/Makefile > index f3cf6f0..8300bc0 100644 > --- a/drivers/gpu/drm/ttm/Makefile > +++ b/drivers/gpu/drm/ttm/Makefile > @@ -7,4 +7,7 @@ ttm-y := ttm_agp_backend.o ttm_memory.o ttm_tt.o ttm_bo.o \ > ttm_object.o ttm_lock.o ttm_execbuf_util.o ttm_page_alloc.o \ > ttm_bo_manager.o > > +ifeq ($(CONFIG_SWIOTLB),y) > +ttm-y += ttm_page_alloc_dma.o > +endif > obj-$(CONFIG_DRM_TTM) += ttm.o > diff --git a/drivers/gpu/drm/ttm/ttm_memory.c b/drivers/gpu/drm/ttm/ttm_memory.c > index e70ddd8..6d24fe2 100644 > --- a/drivers/gpu/drm/ttm/ttm_memory.c > +++ b/drivers/gpu/drm/ttm/ttm_memory.c > @@ -395,6 +395,7 @@ int ttm_mem_global_init(struct ttm_mem_global *glob) > zone->name, (unsigned long long) zone->max_mem >> 10); > } > ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE)); > + ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE)); > return 0; > out_no_zone: > ttm_mem_global_release(glob); > @@ -410,6 +411,7 @@ void ttm_mem_global_release(struct ttm_mem_global *glob) > /* let the page allocator first stop the shrink work. */ > ttm_page_alloc_fini(); > > + ttm_dma_page_alloc_fini(); > flush_workqueue(glob->swap_queue); > destroy_workqueue(glob->swap_queue); > glob->swap_queue = NULL; > diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > new file mode 100644 > index 0000000..d6d8240 > --- /dev/null > +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c > @@ -0,0 +1,1394 @@ > +/* > + * Copyright 2011 (c) Oracle Corp. 
> + > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sub license, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the > + * next paragraph) shall be included in all copies or substantial portions > + * of the Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > + * DEALINGS IN THE SOFTWARE. > + * > + * Author: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > + */ > + > +/* > + * A simple DMA pool losely based on dmapool.c. It has certain advantages > + * over the DMA pools: > + * - Pool collects resently freed pages for reuse (and hooks up to > + * the shrinker). > + * - Tracks currently in use pages > + * - Tracks whether the page is UC, WB or cached (and reverts to WB > + * when freed). > + */ > + > +#include <linux/dma-mapping.h> > +#include <linux/list.h> > +#include <linux/seq_file.h> /* for seq_printf */ > +#include <linux/slab.h> > +#include <linux/spinlock.h> > +#include <linux/highmem.h> > +#include <linux/mm_types.h> > +#include <linux/module.h> > +#include <linux/mm.h> > +#include <linux/atomic.h> > +#include <linux/device.h> > +#include <linux/kthread.h> > +#include "ttm/ttm_bo_driver.h" > +#include "ttm/ttm_page_alloc.h" > +#ifdef TTM_HAS_AGP > +#include <asm/agp.h> > +#endif > + > +#define NUM_PAGES_TO_ALLOC (PAGE_SIZE/sizeof(struct page *)) > +#define SMALL_ALLOCATION 16 > +#define FREE_ALL_PAGES (~0U) > +/* times are in msecs */ > +#define IS_UNDEFINED (0) > +#define IS_WC (1<<1) > +#define IS_UC (1<<2) > +#define IS_CACHED (1<<3) > +#define IS_DMA32 (1<<4) > + > +enum pool_type { > + POOL_IS_UNDEFINED, > + POOL_IS_WC = IS_WC, > + POOL_IS_UC = IS_UC, > + POOL_IS_CACHED = IS_CACHED, > + POOL_IS_WC_DMA32 = IS_WC | IS_DMA32, > + POOL_IS_UC_DMA32 = IS_UC | IS_DMA32, > + POOL_IS_CACHED_DMA32 = IS_CACHED | IS_DMA32, > +}; > +/* > + * The pool structure. There are usually six pools: > + * - generic (not restricted to DMA32): > + * - write combined, uncached, cached. > + * - dma32 (up to 2^32 - so up 4GB): > + * - write combined, uncached, cached. > + * for each ''struct device''. The ''cached'' is for pages that are actively used. > + * The other ones can be shrunk by the shrinker API if neccessary. > + * @pools: The ''struct device->dma_pools'' link. > + * @type: Type of the pool > + * @lock: Protects the inuse_list and free_list from concurrnet access. Must be > + * used with irqsave/irqrestore variants because pool allocator maybe called > + * from delayed work. > + * @inuse_list: Pool of pages that are in use. The order is very important and > + * it is in the order that the TTM pages that are put back are in. > + * @free_list: Pool of pages that are free to be used. No order requirements. 
> + * @dev: The device that is associated with these pools. > + * @size: Size used during DMA allocation. > + * @npages_free: Count of available pages for re-use. > + * @npages_in_use: Count of pages that are in use (each of them > + * is marked in_use. > + * @nfrees: Stats when pool is shrinking. > + * @nrefills: Stats when the pool is grown. > + * @gfp_flags: Flags to pass for alloc_page. > + * @fill_lock: Allows only one pool fill operation at time. > + * @name: Name of the pool. > + * @dev_name: Name derieved from dev - similar to how dev_info works. > + * Used during shutdown as the dev_info during release is unavailable. > + */ > +struct dma_pool { > + struct list_head pools; /* The ''struct device->dma_pools link */ > + enum pool_type type; > + spinlock_t lock; > + struct list_head inuse_list; > + struct list_head free_list; > + struct device *dev; > + unsigned size; > + unsigned npages_free; > + unsigned npages_in_use; > + unsigned long nfrees; /* Stats when shrunk. */ > + unsigned long nrefills; /* Stats when grown. */ > + gfp_t gfp_flags; > + bool fill_lock; > + char name[13]; /* "cached dma32" */ > + char dev_name[64]; /* Constructed from dev */ > +}; > + > +/* > + * The accounting page keeping track of the allocated page along with > + * the DMA address. > + * @page_list: The link to the ''page_list'' in ''struct dma_pool''. > + * @vaddr: The virtual address of the page > + * @dma: The bus address of the page. If the page is not allocated > + * via the DMA API, it will be -1. > + * @in_use: Set to true if in use. Should not be freed. > + */ > +struct dma_page { > + struct list_head page_list; > + void *vaddr; > + struct page *p; > + dma_addr_t dma; > +}; > + > +/* > + * Limits for the pool. They are handled without locks because only place where > + * they may change is in sysfs store. They won''t have immediate effect anyway > + * so forcing serialization to access them is pointless. > + */ > + > +struct ttm_pool_opts { > + unsigned alloc_size; > + unsigned max_size; > + unsigned small; > +}; > + > +/* > + * Contains the list of all of the ''struct device'' and their corresponding > + * DMA pools. Guarded by _mutex->lock. > + * @pools: The link to ''struct ttm_pool_manager->pools'' > + * @dev: The ''struct device'' associated with the ''pool'' > + * @pool: The ''struct dma_pool'' associated with the ''dev'' > + */ > +struct device_pools { > + struct list_head pools; > + struct device *dev; > + struct dma_pool *pool; > +}; > + > +/* > + * struct ttm_pool_manager - Holds memory pools for fast allocation > + * > + * @lock: Lock used when adding/removing from pools > + * @pools: List of ''struct device'' and ''struct dma_pool'' tuples. > + * @options: Limits for the pool. > + * @npools: Total amount of pools in existence. 
> + * @shrinker: The structure used by [un|]register_shrinker > + */ > +struct ttm_pool_manager { > + struct mutex lock; > + struct list_head pools; > + struct ttm_pool_opts options; > + unsigned npools; > + struct shrinker mm_shrink; > + struct kobject kobj; > +}; > + > +static struct ttm_pool_manager *_manager; > + > +static struct attribute ttm_page_pool_max = { > + .name = "pool_max_size", > + .mode = S_IRUGO | S_IWUSR > +}; > +static struct attribute ttm_page_pool_small = { > + .name = "pool_small_allocation", > + .mode = S_IRUGO | S_IWUSR > +}; > +static struct attribute ttm_page_pool_alloc_size = { > + .name = "pool_allocation_size", > + .mode = S_IRUGO | S_IWUSR > +}; > + > +static struct attribute *ttm_pool_attrs[] = { > + &ttm_page_pool_max, > + &ttm_page_pool_small, > + &ttm_page_pool_alloc_size, > + NULL > +}; > + > +static void ttm_pool_kobj_release(struct kobject *kobj) > +{ > + struct ttm_pool_manager *m > + container_of(kobj, struct ttm_pool_manager, kobj); > + kfree(m); > +} > + > +static ssize_t ttm_pool_store(struct kobject *kobj, struct attribute *attr, > + const char *buffer, size_t size) > +{ > + struct ttm_pool_manager *m > + container_of(kobj, struct ttm_pool_manager, kobj); > + int chars; > + unsigned val; > + chars = sscanf(buffer, "%u", &val); > + if (chars == 0) > + return size; > + > + /* Convert kb to number of pages */ > + val = val / (PAGE_SIZE >> 10); > + > + if (attr == &ttm_page_pool_max) > + m->options.max_size = val; > + else if (attr == &ttm_page_pool_small) > + m->options.small = val; > + else if (attr == &ttm_page_pool_alloc_size) { > + if (val > NUM_PAGES_TO_ALLOC*8) { > + printk(KERN_ERR TTM_PFX > + "Setting allocation size to %lu " > + "is not allowed. Recommended size is " > + "%lu\n", > + NUM_PAGES_TO_ALLOC*(PAGE_SIZE >> 7), > + NUM_PAGES_TO_ALLOC*(PAGE_SIZE >> 10)); > + return size; > + } else if (val > NUM_PAGES_TO_ALLOC) { > + printk(KERN_WARNING TTM_PFX > + "Setting allocation size to " > + "larger than %lu is not recommended.\n", > + NUM_PAGES_TO_ALLOC*(PAGE_SIZE >> 10)); > + } > + m->options.alloc_size = val; > + } > + > + return size; > +} > + > +static ssize_t ttm_pool_show(struct kobject *kobj, struct attribute *attr, > + char *buffer) > +{ > + struct ttm_pool_manager *m > + container_of(kobj, struct ttm_pool_manager, kobj); > + unsigned val = 0; > + > + if (attr == &ttm_page_pool_max) > + val = m->options.max_size; > + else if (attr == &ttm_page_pool_small) > + val = m->options.small; > + else if (attr == &ttm_page_pool_alloc_size) > + val = m->options.alloc_size; > + > + val = val * (PAGE_SIZE >> 10); > + > + return snprintf(buffer, PAGE_SIZE, "%u\n", val); > +} > + > +static const struct sysfs_ops ttm_pool_sysfs_ops = { > + .show = &ttm_pool_show, > + .store = &ttm_pool_store, > +}; > + > +static struct kobj_type ttm_pool_kobj_type = { > + .release = &ttm_pool_kobj_release, > + .sysfs_ops = &ttm_pool_sysfs_ops, > + .default_attrs = ttm_pool_attrs, > +}; > + > +#ifndef CONFIG_X86 > +static int set_pages_array_wb(struct page **pages, int addrinarray) > +{ > +#ifdef TTM_HAS_AGP > + int i; > + > + for (i = 0; i < addrinarray; i++) > + unmap_page_from_agp(pages[i]); > +#endif > + return 0; > +} > + > +static int set_pages_array_wc(struct page **pages, int addrinarray) > +{ > +#ifdef TTM_HAS_AGP > + int i; > + > + for (i = 0; i < addrinarray; i++) > + map_page_into_agp(pages[i]); > +#endif > + return 0; > +} > + > +static int set_pages_array_uc(struct page **pages, int addrinarray) > +{ > +#ifdef TTM_HAS_AGP > + int i; > + > + for (i = 0; 
i < addrinarray; i++) > + map_page_into_agp(pages[i]); > +#endif > + return 0; > +} > +#endif /* for !CONFIG_X86 */ > + > +static int ttm_set_pages_caching(struct dma_pool *pool, > + struct page **pages, unsigned cpages) > +{ > + int r = 0; > + /* Set page caching */ > + if (pool->type & IS_UC) { > + r = set_pages_array_uc(pages, cpages); > + if (r) > + pr_err(TTM_PFX > + "%s: Failed to set %d pages to uc!\n", > + pool->dev_name, cpages); > + } > + if (pool->type & IS_WC) { > + r = set_pages_array_wc(pages, cpages); > + if (r) > + pr_err(TTM_PFX > + "%s: Failed to set %d pages to wc!\n", > + pool->dev_name, cpages); > + } > + return r; > +} > + > +static void __ttm_dma_free_page(struct dma_pool *pool, struct dma_page *d_page) > +{ > + dma_addr_t dma = d_page->dma; > + dma_free_coherent(pool->dev, pool->size, d_page->vaddr, dma); > + > + kfree(d_page); > + d_page = NULL; > +} > +static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool) > +{ > + struct dma_page *d_page; > + > + d_page = kmalloc(sizeof(struct dma_page), GFP_KERNEL); > + if (!d_page) > + return NULL; > + > + d_page->vaddr = dma_alloc_coherent(pool->dev, pool->size, > + &d_page->dma, > + pool->gfp_flags); > + d_page->p = virt_to_page(d_page->vaddr); > + if (!d_page->vaddr) { > + kfree(d_page); > + d_page = NULL; > + }Move d_page->p = virt_to_page(d_page->vaddr); after if (!d_page->vaddr) block.> + return d_page; > +} > +static enum pool_type ttm_to_type(int flags, enum ttm_caching_state cstate) > +{ > + enum pool_type type = IS_UNDEFINED; > + > + if (flags & TTM_PAGE_FLAG_DMA32) > + type |= IS_DMA32; > + if (cstate == tt_cached) > + type |= IS_CACHED; > + else if (cstate == tt_uncached) > + type |= IS_UC; > + else > + type |= IS_WC; > + > + return type; > +} > +static void ttm_pool_update_free_locked(struct dma_pool *pool, > + unsigned freed_pages) > +{ > + pool->npages_free -= freed_pages; > + pool->nfrees += freed_pages; > + > +} > +/* set memory back to wb and free the pages. */ > +static void ttm_dma_pages_put(struct dma_pool *pool, struct list_head *d_pages, > + struct page *pages[], unsigned npages) > +{ > + struct dma_page *d_page, *tmp; > + > + if (npages && set_pages_array_wb(pages, npages)) > + pr_err(TTM_PFX "%s: Failed to set %d pages to wb!\n", > + pool->dev_name, npages); > + > + if (npages > 1) { > + pr_debug("%s: (%s:%d) Freeing %d pages at once (lockless).\n", > + pool->dev_name, pool->name, current->pid, npages); > + } > + > + list_for_each_entry_safe(d_page, tmp, d_pages, page_list) { > + list_del(&d_page->page_list); > + __ttm_dma_free_page(pool, d_page); > + } > +} > +/* > + * Free pages from pool. > + * > + * To prevent hogging the ttm_swap process we only free NUM_PAGES_TO_ALLOC > + * number of pages in one go. 
> + * > + * @pool: to free the pages from > + * @nr_free: If set to true will free all pages in pool > + **/ > +static unsigned ttm_dma_page_pool_free(struct dma_pool *pool, unsigned nr_free) > +{ > + unsigned long irq_flags; > + struct dma_page *dma_p, *tmp; > + struct page **pages_to_free; > + struct list_head d_pages; > + unsigned freed_pages = 0, > + npages_to_free = nr_free; > + > + if (NUM_PAGES_TO_ALLOC < nr_free) > + npages_to_free = NUM_PAGES_TO_ALLOC; > +#if 0 > + if (nr_free > 1) { > + pr_debug("%s: (%s:%d) Attempting to free %d (%d) pages\n", > + pool->dev_name, pool->name, current->pid, > + npages_to_free, nr_free); > + } > +#endif > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), > + GFP_KERNEL); > + > + if (!pages_to_free) { > + pr_err(TTM_PFX > + "%s: Failed to allocate memory for pool free operation.\n", > + pool->dev_name); > + return 0; > + } > + INIT_LIST_HEAD(&d_pages); > +restart: > + spin_lock_irqsave(&pool->lock, irq_flags); > + > + /* We picking the oldest ones off the list */ > + list_for_each_entry_safe_reverse(dma_p, tmp, &pool->free_list, > + page_list) { > + if (freed_pages >= npages_to_free) > + break; > + > + /* Move the dma_page from one list to another. */ > + list_move(&dma_p->page_list, &d_pages); > + > + pages_to_free[freed_pages++] = dma_p->p; > + /* We can only remove NUM_PAGES_TO_ALLOC at a time. */ > + if (freed_pages >= NUM_PAGES_TO_ALLOC) { > + > + ttm_pool_update_free_locked(pool, freed_pages); > + /** > + * Because changing page caching is costly > + * we unlock the pool to prevent stalling. > + */ > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + > + ttm_dma_pages_put(pool, &d_pages, pages_to_free, > + freed_pages); > + > + INIT_LIST_HEAD(&d_pages); > + > + if (likely(nr_free != FREE_ALL_PAGES)) > + nr_free -= freed_pages; > + > + if (NUM_PAGES_TO_ALLOC >= nr_free) > + npages_to_free = nr_free; > + else > + npages_to_free = NUM_PAGES_TO_ALLOC; > + > + freed_pages = 0; > + > + /* free all so restart the processing */ > + if (nr_free) > + goto restart; > + > + /* Not allowed to fall through or break because > + * following context is inside spinlock while we are > + * outside here. > + */ > + goto out; > + > + } > + } > + > + /* remove range of pages from the pool */ > + if (freed_pages) { > + ttm_pool_update_free_locked(pool, freed_pages); > + nr_free -= freed_pages; > + } > + > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + > + if (freed_pages) > + ttm_dma_pages_put(pool, &d_pages, pages_to_free, freed_pages); > +out: > + kfree(pages_to_free); > + return nr_free; > +} > + > +static void ttm_dma_free_pool(struct device *dev, enum pool_type type) > +{ > + struct device_pools *p; > + struct dma_pool *pool; > + struct dma_page *d_page, *d_tmp; > + > + if (!dev) > + return; > + > + mutex_lock(&_manager->lock); > + list_for_each_entry_reverse(p, &_manager->pools, pools) { > + if (p->dev != dev) > + continue; > + pool = p->pool; > + if (pool->type != type) > + continue; > + > + list_del(&p->pools); > + kfree(p); > + _manager->npools--; > + break; > + } > + list_for_each_entry_reverse(pool, &dev->dma_pools, pools) { > + unsigned long irq_save; > + if (pool->type != type) > + continue; > + /* Takes a spinlock.. */ > + ttm_dma_page_pool_free(pool, FREE_ALL_PAGES); > + /* .. 
but afterwards we can take it too */ > + spin_lock_irqsave(&pool->lock, irq_save); > + list_for_each_entry_safe(d_page, d_tmp, &pool->inuse_list, > + page_list) { > + pr_err("%s: (%s:%d) %p (%p DMA:0x%lx) busy!\n", > + pool->dev_name, pool->name, > + current->pid, d_page->vaddr, > + virt_to_page(d_page->vaddr), > + (unsigned long)d_page->dma); > + list_del(&d_page->page_list); > + kfree(d_page); > + pool->npages_in_use--; > + } > + spin_unlock_irqrestore(&pool->lock, irq_save); > + WARN_ON(((pool->npages_in_use + pool->npages_free) != 0)); > + /* This code path is called after _all_ references to the > + * struct device has been dropped - so nobody should be > + * touching it. In case somebody is trying to _add_ we are > + * guarded by the mutex. */ > + list_del(&pool->pools); > + kfree(pool); > + break; > + } > + mutex_unlock(&_manager->lock); > +} > +/* > + * On free-ing of the ''struct device'' this deconstructor is run. > + * Albeit the pool might have already been freed earlier. > + */ > +static void ttm_dma_pool_release(struct device *dev, void *res) > +{ > + struct dma_pool *pool = *(struct dma_pool **)res; > + > + if (pool) > + ttm_dma_free_pool(dev, pool->type); > +} > + > +static int ttm_dma_pool_match(struct device *dev, void *res, void *match_data) > +{ > + return *(struct dma_pool **)res == match_data; > +} > + > +static struct dma_pool *ttm_dma_pool_init(struct device *dev, gfp_t flags, > + enum pool_type type) > +{ > + char *n[] = {"wc", "uc", "cached", " dma32", "unknown",}; > + enum pool_type t[] = {IS_WC, IS_UC, IS_CACHED, IS_DMA32, IS_UNDEFINED}; > + struct device_pools *sec_pool = NULL; > + struct dma_pool *pool = NULL, **ptr; > + unsigned i; > + int ret = -ENODEV; > + char *p; > + > + if (!dev) > + return NULL; > + > + ptr = devres_alloc(ttm_dma_pool_release, sizeof(*ptr), GFP_KERNEL); > + if (!ptr) > + return NULL; > + > + ret = -ENOMEM; > + > + pool = kmalloc_node(sizeof(struct dma_pool), GFP_KERNEL, > + dev_to_node(dev)); > + if (!pool) > + goto err_mem; > + > + sec_pool = kmalloc_node(sizeof(struct device_pools), GFP_KERNEL, > + dev_to_node(dev)); > + if (!sec_pool) > + goto err_mem; > + > + INIT_LIST_HEAD(&sec_pool->pools); > + sec_pool->dev = dev; > + sec_pool->pool = pool; > + > + INIT_LIST_HEAD(&pool->free_list); > + INIT_LIST_HEAD(&pool->inuse_list); > + INIT_LIST_HEAD(&pool->pools); > + spin_lock_init(&pool->lock); > + pool->dev = dev; > + pool->npages_free = pool->npages_in_use = 0; > + pool->nfrees = 0; > + pool->gfp_flags = flags; > + pool->size = PAGE_SIZE; > + pool->type = type; > + pool->nrefills = 0; > + pool->fill_lock = false; > + p = pool->name; > + for (i = 0; i < 5; i++) { > + if (type & t[i]) { > + p += snprintf(p, sizeof(pool->name) - (p - pool->name), > + "%s", n[i]); > + } > + } > + *p = 0; > + /* We copy the name for pr_ calls b/c when dma_pool_destroy is called > + * - the kobj->name has already been deallocated.*/ > + snprintf(pool->dev_name, sizeof(pool->dev_name), "%s %s", > + dev_driver_string(dev), dev_name(dev)); > + mutex_lock(&_manager->lock); > + /* You can get the dma_pool from either the global: */ > + list_add(&sec_pool->pools, &_manager->pools); > + _manager->npools++; > + /* or from ''struct device'': */ > + list_add(&pool->pools, &dev->dma_pools); > + mutex_unlock(&_manager->lock); > + > + *ptr = pool; > + devres_add(dev, ptr); > + > + return pool; > +err_mem: > + devres_free(ptr); > + kfree(sec_pool); > + kfree(pool); > + return ERR_PTR(ret); > +} > +static struct dma_pool *ttm_dma_find_pool(struct device *dev, > + enum 
pool_type type) > +{ > + struct dma_pool *pool, *tmp, *found = NULL; > + > + if (type == IS_UNDEFINED) > + return found; > + /* NB: We iterate on the ''struct dev'' which has no spinlock, but > + * it does have a kref which we have taken. */I fail to see where we kref dev.> + list_for_each_entry_safe(pool, tmp, &dev->dma_pools, pools) { > + if (pool->type != type) > + continue; > + found = pool; > + break; > + } > + return found; > +} > + > +/* > + * Free pages the pages that failed to change the caching state. If there > + * are pages that have changed their caching state already put them to the > + * pool. > + */ > +static void ttm_dma_handle_caching_state_failure(struct dma_pool *pool, > + struct list_head *d_pages, > + struct page **failed_pages, > + unsigned cpages) > +{ > + struct dma_page *d_page, *tmp; > + struct page *p; > + unsigned i = 0; > + > + p = failed_pages[0]; > + if (!p) > + return; > + /* Find the failed page. */ > + list_for_each_entry_safe(d_page, tmp, d_pages, page_list) { > + if (d_page->p != p) > + continue; > + /* .. and then progress over the full list. */ > + list_del(&d_page->page_list); > + __ttm_dma_free_page(pool, d_page); > + if (++i < cpages) > + p = failed_pages[i]; > + else > + break; > + } > + > +} > +/* > + * Allocate ''count'' pages, and put ''need'' number of them on the > + * ''pages'' and as well on the ''dma_address'' starting at ''dma_offset'' offset. > + * The full list of pages should also be on ''d_pages''. > + * We return zero for success, and negative numbers as errors. > + */ > +static int ttm_dma_pool_alloc_new_pages(struct dma_pool *pool, > + struct list_head *d_pages, > + unsigned count) > +{ > + struct page **caching_array; > + struct dma_page *dma_p; > + struct page *p; > + int r = 0; > + unsigned i, cpages; > + unsigned max_cpages = min(count, > + (unsigned)(PAGE_SIZE/sizeof(struct page *))); > + > + /* allocate array for page caching change */ > + caching_array = kmalloc(max_cpages*sizeof(struct page *), GFP_KERNEL); > + > + if (!caching_array) { > + pr_err(TTM_PFX > + "%s: Unable to allocate table for new pages.", > + pool->dev_name); > + return -ENOMEM; > + } > + > + if (count > 1) { > + pr_debug("%s: (%s:%d) Getting %d pages\n", > + pool->dev_name, pool->name, current->pid, > + count); > + } > + > + for (i = 0, cpages = 0; i < count; ++i) { > + dma_p = __ttm_dma_alloc_page(pool); > + if (!dma_p) { > + pr_err(TTM_PFX "%s: Unable to get page %u.\n", > + pool->dev_name, i); > + > + /* store already allocated pages in the pool after > + * setting the caching state */ > + if (cpages) { > + r = ttm_set_pages_caching(pool, caching_array, > + cpages); > + if (r) > + ttm_dma_handle_caching_state_failure( > + pool, d_pages, caching_array, > + cpages); > + } > + r = -ENOMEM; > + goto out; > + } > + p = dma_p->p; > +#ifdef CONFIG_HIGHMEM > + /* gfp flags of highmem page should never be dma32 so we > + * we should be fine in such case > + */ > + if (!PageHighMem(p)) > +#endif > + { > + caching_array[cpages++] = p; > + if (cpages == max_cpages) { > + /* Note: Cannot hold the spinlock */ > + r = ttm_set_pages_caching(pool, caching_array, > + cpages); > + if (r) { > + ttm_dma_handle_caching_state_failure( > + pool, d_pages, caching_array, > + cpages); > + goto out; > + } > + cpages = 0; > + } > + } > + list_add(&dma_p->page_list, d_pages); > + } > + > + if (cpages) { > + r = ttm_set_pages_caching(pool, caching_array, cpages); > + if (r) > + ttm_dma_handle_caching_state_failure(pool, d_pages, > + caching_array, cpages); > + } > +out: > + 
kfree(caching_array); > + return r; > +} > +static bool ttm_dma_iterate_reverse(struct dma_pool *pool, > + struct dma_page *d_page, > + struct page *p) > +{ > + > + /* Note: When TTM layer gets pages - it gets them one page at a time > + * and puts them on an array (so most recently allocated page is at > + * at the back). The inuse_list is a copy of those pages, but in the > + * exact opposite order. This is b/c when TTM puts pages back, it > + * constructs a stack with the oldest element on the top. Hence the > + * inuse_list is constructed with the same order so that it will > + * efficiently be matched against the stack. > + * But, just in case the pages are not in that order, we double check > + * the ''pages'' against our inuse_list in case we have to go in reverse. > + */ > + struct page *p_next; > + struct dma_page *tmp; > + > + tmp = list_entry(d_page->page_list.prev, struct dma_page, page_list); > + if (&tmp->page_list != &pool->inuse_list) { > + p_next = list_entry(p->lru.next, struct page, lru); > + if (tmp->p == p_next) > + return true; > + } > + return false; > +} > + > +/* > + * Iterate forward (or backwards if ''reverse'' is true) by one element > + * in the pool->in_use list. We use ''d_page'' as the starting point. > + * The ''d_page'' upon completion of the iteration, is moved to the > + * ''d_pages'' list. > + */ > +static struct dma_page *ttm_dma_iterate_next(struct dma_pool *pool, > + struct dma_page *d_page, > + struct list_head *d_pages, > + bool reverse) > +{ > + struct dma_page *next = NULL; > + > + if (unlikely(reverse)) { > + if (&d_page->page_list != &pool->inuse_list) > + next = list_entry(d_page->page_list.prev, > + struct dma_page, > + page_list); > + list_move(&d_page->page_list, d_pages); > + } else { > + if (&d_page->page_list != &pool->inuse_list) > + next = list_entry(d_page->page_list.next, > + struct dma_page, > + page_list); > + list_move_tail(&d_page->page_list, d_pages); > + } > + return next; > +} > +/* > + * Iterate forward (or backwards if ''reverse'' is true), looking > + * for page ''p'' in the pool->inuse_list, starting at ''start''. > + */ > +static struct dma_page *ttm_dma_iterate_forward(struct dma_pool *pool, > + struct dma_page *start, > + struct page *p, > + bool reverse) > +{ > + struct dma_page *tmp = start; > + > + if (unlikely(reverse)) { > + list_for_each_entry_continue_reverse(tmp, &pool->inuse_list, > + page_list) { > + if (p == tmp->p) > + return tmp; > + } > + } else { > + list_for_each_entry_continue(tmp, &pool->inuse_list, > + page_list) { > + if (p == tmp->p) > + return tmp; > + } > + } > + return NULL; > +} > +/* > + * Recycle (or delete) the ''pages'' that are on the ''pool''. > + * @pool: The pool that the pages are associated with. > + * @pages: The list of pages we are done with. > + * @page_count: Count of how many pages (or zero if all). > + * @erase: Instead of recycling - just free them. 
> + */ > +static unsigned int ttm_dma_put_pages_in_pool(struct dma_pool *pool, > + struct list_head *pages, > + unsigned page_count, > + bool erase) > +{ > + unsigned long uninitialized_var(irq_flags); > + struct list_head uninitialized_var(d_pages); > + struct page **uninitialized_var(array_pages); > + unsigned uninitialized_var(freed_pages); > + struct page *p, *tmp; > + unsigned count = 0; > + struct dma_page *d_tmp, *d_page = NULL; > + bool rev = false; > + if (unlikely(WARN_ON(list_empty(pages)))) > + return 0; > + > + if (page_count == 0) { > + list_for_each_entry(p, pages, lru) > + ++page_count; > + > + } > + if (page_count > 1) { > + pr_debug("%s: (%s:%d) %s %d pages\n", > + pool->dev_name, pool->name, current->pid, > + erase ? "Destroying" : "Recycling", page_count); > + } > + > + /* d_pages is the list of ''struct dma_page'' */ > + INIT_LIST_HEAD(&d_pages); > + > + if (erase) { > + /* and pages_to_free is used for cache reset */ > + array_pages = kmalloc(page_count * sizeof(struct page *), > + GFP_KERNEL); > + if (!array_pages) { > + dev_err(pool->dev, TTM_PFX > + "Failed to allocate memory for pool free operation.\n"); > + return 0; > + } > + freed_pages = 0; > + } > + > + /* Find the first page of the "chunk" of pages. */ > + p = list_first_entry(pages, struct page, lru); > + spin_lock_irqsave(&pool->lock, irq_flags); > +restart: > + list_for_each_entry(d_tmp, &pool->inuse_list, page_list) { > + if (p == d_tmp->p) { > + d_page = d_tmp; > + break; > + } > + } > + /* The pages are _not_ in this pool. */ > + if (!d_page) { > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + return 0; > + } > + rev = ttm_dma_iterate_reverse(pool, d_page, p); > + if (rev) > + pr_debug("%s: (%s:%d) Traversing %d in reverse order\n", > + pool->dev_name, pool->name, current->pid, page_count); > + /* Continue iterating on both lists. */ > + list_for_each_entry_safe(p, tmp, pages, lru) { > + if (d_page->p != p && count != page_count) { > + /* Yikes! The inuse stack is swiss cheese. Have to > + start looking.*/ > + d_page = ttm_dma_iterate_forward(pool, d_page, p, rev); > + if (!d_page) > + goto restart; > + } > + /* Do not advance past what we were asked to delete. */ > + if (d_page->p != p) > + break; > + list_del(&p->lru); > + > + if (erase) > + array_pages[freed_pages++] = d_page->p; > + d_page = ttm_dma_iterate_next(pool, d_page, &d_pages, rev); > + if (!d_page) > + break; > + count++; > + /* Check if we should iterate. */ > + if (count == page_count) > + break; > + } > + if (!erase) /* And stick ''em on the free pool. */ > + list_splice(&d_pages, &pool->free_list); > + > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + > + if (erase) { > + /* Note: The caller of us updates the pool accounting. */ > + ttm_dma_pages_put(pool, &d_pages, array_pages /* to set WB */, > + freed_pages); > + kfree(array_pages); > + } > + if (count > 1) { > + pr_debug("%s: (%s:%d) %d/%d pages %s pool.\n", > + pool->dev_name, pool->name, current->pid, > + count, page_count, > + erase ? "erased from inuse" : "put in free"); > + } > + return count; > +} > +/* > + * @return count of pages still required to fulfill the request. 
> +*/ > +static int ttm_dma_page_pool_fill_locked(struct dma_pool *pool, > + unsigned count, > + unsigned long *irq_flags) > +{ > + int r = count; > + > + if (pool->fill_lock) > + return r; > + > + pool->fill_lock = true; > + if (count < _manager->options.small && > + count > pool->npages_free) { > + struct list_head d_pages; > + unsigned alloc_size = _manager->options.alloc_size; > + > + INIT_LIST_HEAD(&d_pages); > + > + spin_unlock_irqrestore(&pool->lock, *irq_flags); > + > + /* Returns how many more are neccessary to fulfill the > + * request. */ > + r = ttm_dma_pool_alloc_new_pages(pool, &d_pages, alloc_size); > + > + spin_lock_irqsave(&pool->lock, *irq_flags); > + if (!r) { > + /* Add the fresh to the end.. */ > + list_splice(&d_pages, &pool->free_list); > + ++pool->nrefills; > + pool->npages_free += alloc_size; > + } else { > + struct dma_page *d_page; > + unsigned cpages = 0; > + > + pr_err(TTM_PFX "%s: Failed to fill %s pool (r:%d)!\n", > + pool->dev_name, pool->name, r); > + > + list_for_each_entry(d_page, &d_pages, page_list) { > + cpages++; > + } > + list_splice_tail(&d_pages, &pool->free_list); > + pool->npages_free += cpages; > + } > + } > + pool->fill_lock = false; > + return r; > + > +} > + > +/* > + * @return count of pages still required to fulfill the request. > + * The populate list is actually a stack (not that is matters as TTM > + * allocates one page at a time. > + */ > +static int ttm_dma_pool_get_pages(struct dma_pool *pool, > + struct list_head *pages, > + dma_addr_t *dma_address, unsigned count) > +{ > + unsigned long irq_flags; > + int r; > + unsigned i; > + struct dma_page *d_page, *tmp; > + struct list_head d_pages; > + > + spin_lock_irqsave(&pool->lock, irq_flags); > + r = ttm_dma_page_pool_fill_locked(pool, count, &irq_flags); > + if (r < 0) { > + pr_debug("%s: (%s:%d) Asked for %d, got %d %s.\n", > + pool->dev_name, pool->name, current->pid, count, r, > + (r < 0) ? "err:" : "pages"); > + goto out; > + } > + if (!pool->npages_free) > + goto out; > + if (count > 1) { > + pr_debug("%s: (%s:%d) Looking in free list for %d pages. "\ > + "(have %d pages free)\n", > + pool->dev_name, pool->name, current->pid, count, > + pool->npages_free); > + } > + i = 0; > + /* We are holding the spinlock.. */ > + INIT_LIST_HEAD(&d_pages); > + /* Note: The the ''pages'' (and inuse_list) is expected to be a stack, > + * so we put the entries in the right order (and on the inuse list > + * in the reverse order to compenstate for freeing - which inverts the > + * ''pages'' order). > + */ > + list_for_each_entry_safe(d_page, tmp, &pool->free_list, page_list) { > + list_add_tail(&d_page->p->lru, pages); > + dma_address[i++] = d_page->dma; > + list_move(&d_page->page_list, &d_pages); > + if (i == count) > + break; > + } > + /* Note: The ''inuse_list'' must have the same order as the ''pages'' > + * to be effective when pages are put back. And since ''pages'' is > + * as stack, ergo inuse_list is a stack too. */ > + list_splice(&d_pages, &pool->inuse_list); > + count -= i; > + pool->npages_in_use += i; > + pool->npages_free -= i; > +out: > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + if (count) > + pr_debug("%s: (%s:%d) Need %d more.\n", > + pool->dev_name, pool->name, current->pid, count); > + return count; > +} > +/* > + * On success pages list will hold count number of correctly > + * cached pages. On failure will hold the negative return value (-ENOMEM, etc). 
> + */ > +int ttm_dma_get_pages(struct ttm_tt *ttm, struct list_head *pages, > + unsigned count, dma_addr_t *dma_address) > + > +{ > + int r = -ENOMEM; > + struct dma_pool *pool; > + gfp_t gfp_flags; > + enum pool_type type; > + struct device *dev = ttm->be->dev; > + > + type = ttm_to_type(ttm->page_flags, ttm->caching_state); > + > + if (ttm->page_flags & TTM_PAGE_FLAG_DMA32) > + gfp_flags = GFP_USER | GFP_DMA32; > + else > + gfp_flags = GFP_HIGHUSER; > + > + if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC) > + gfp_flags |= __GFP_ZERO; > + > + pool = ttm_dma_find_pool(dev, type); > + if (!pool) { > + pool = ttm_dma_pool_init(dev, gfp_flags, type); > + if (IS_ERR_OR_NULL(pool)) > + return -ENOMEM; > + } > +#if 0 > + if (count > 1) { > + pr_debug("%s (%s:%d) Attempting to get %d pages type %x\n", > + pool->dev_name, pool->name, current->pid, count, > + cstate); > + } > +#endif > + /* Take pages out of a pool (if applicable) */ > + r = ttm_dma_pool_get_pages(pool, pages, dma_address, count); > + /* clear the pages coming from the pool if requested */ > + if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC) { > + struct page *p; > + list_for_each_entry(p, pages, lru) { > + clear_page(page_address(p)); > + } > + } > + /* If pool didn''t have enough pages allocate new one. */ > + if (r > 0) { > + struct list_head d_pages; > + unsigned pages_need = r; > + unsigned long irq_flags; > + > + INIT_LIST_HEAD(&d_pages); > + > + /* Note, we are running without locking here.. > + * and we have to manually add the stack to the inuse pool. */ > + r = ttm_dma_pool_alloc_new_pages(pool, &d_pages, pages_need); > + > + if (r == 0) { > + struct dma_page *d_page; > + int i = count - 1; > + > + /* Since the pages are directly going to the inuse_list > + * which is stack based, lets treat it as a stack. > + */ > + list_for_each_entry(d_page, &d_pages, page_list) { > + list_add(&d_page->p->lru, pages); > + BUG_ON(i < 0); > + dma_address[i--] = d_page->dma; > + } > + spin_lock_irqsave(&pool->lock, irq_flags); > + pool->npages_in_use += pages_need; > + list_splice(&d_pages, &pool->inuse_list); > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + } else { > + /* If there is any pages in the list put them back to > + * the pool. */ > + pr_err(TTM_PFX > + "%s: Failed to allocate extra pages " > + "for large request.", > + pool->dev_name); > + spin_lock_irqsave(&pool->lock, irq_flags); > + pool->npages_free += r; > + /* We don''t care about ordering on the free_list. 
*/ > + list_splice(&d_pages, &pool->free_list); > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + return count; > + } > + } > + return r; > +} > + > +/* Get good estimation how many pages are free in pools */ > +static int ttm_dma_pool_get_num_unused_pages(void) > +{ > + struct device_pools *p; > + unsigned total = 0; > + > + mutex_lock(&_manager->lock); > + list_for_each_entry(p, &_manager->pools, pools) { > + if (p) > + total += p->pool->npages_free; > + } > + mutex_unlock(&_manager->lock); > + return total; > +} > + > +/* Put all pages in pages list to correct pool to wait for reuse */ > +void ttm_dma_put_pages(struct ttm_tt *ttm, struct list_head *pages, > + unsigned page_count, dma_addr_t *dma_address) > +{ > + struct dma_pool *pool; > + enum pool_type type; > + bool is_cached = false; > + unsigned count = 0, i; > + unsigned long irq_flags; > + struct device *dev = ttm->be->dev; > + > + if (list_empty(pages)) > + return; > + > + type = ttm_to_type(ttm->page_flags, ttm->caching_state); > + pool = ttm_dma_find_pool(dev, type); > + if (!pool) { > + WARN_ON(!pool); > + return; > + } > + is_cached = (ttm_dma_find_pool(pool->dev, > + ttm_to_type(ttm->page_flags, tt_cached)) == pool); > + > + if (page_count > 1) { > + dev_dbg(pool->dev, "(%s:%d) Attempting to %s %d pages.\n", > + pool->name, current->pid, > + (is_cached) ? "destroy" : "recycle", page_count); > + } > + > + count = ttm_dma_put_pages_in_pool(pool, pages, page_count, is_cached); > + > + for (i = 0; i < count; i++) > + dma_address[i] = 0; > + > + spin_lock_irqsave(&pool->lock, irq_flags); > + pool->npages_in_use -= count; > + if (is_cached) > + pool->nfrees += count; > + else > + pool->npages_free += count; > + spin_unlock_irqrestore(&pool->lock, irq_flags); > + > + page_count -= count; > + WARN(page_count != 0, > + "Only freed %d page(s) in %s. Could not free the other %d!\n", > + count, pool->name, page_count); > + > + page_count = 0; > + if (pool->npages_free > _manager->options.max_size) { > + page_count = pool->npages_free - _manager->options.max_size; > + if (page_count < NUM_PAGES_TO_ALLOC) > + page_count = NUM_PAGES_TO_ALLOC; > + } > + if (page_count) > + ttm_dma_page_pool_free(pool, page_count); > +} > + > +/** > + * Callback for mm to request pool to reduce number of page held. > + */ > +static int ttm_dma_pool_mm_shrink(struct shrinker *shrink, > + struct shrink_control *sc) > +{ > + static atomic_t start_pool = ATOMIC_INIT(0); > + unsigned idx = 0; > + unsigned pool_offset = atomic_add_return(1, &start_pool); > + unsigned shrink_pages = sc->nr_to_scan; > + struct device_pools *p; > + > + if (list_empty(&_manager->pools)) > + return 0; > + > + mutex_lock(&_manager->lock); > + pool_offset = pool_offset % _manager->npools; > + list_for_each_entry(p, &_manager->pools, pools) { > + unsigned nr_free; > + > + if (!p && !p->dev) > + continue; > + if (shrink_pages == 0) > + break; > + /* Do it in round-robin fashion. 
*/ > + if (++idx < pool_offset) > + continue; > + nr_free = shrink_pages; > + shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free); > + pr_debug("%s: (%s:%d) Asked to shrink %d, have %d more to go\n", > + p->pool->dev_name, p->pool->name, current->pid, nr_free, > + shrink_pages); > + } > + mutex_unlock(&_manager->lock); > + /* return estimated number of unused pages in pool */ > + return ttm_dma_pool_get_num_unused_pages(); > +} > + > +static void ttm_dma_pool_mm_shrink_init(struct ttm_pool_manager *manager) > +{ > + manager->mm_shrink.shrink = &ttm_dma_pool_mm_shrink; > + manager->mm_shrink.seeks = 1; > + register_shrinker(&manager->mm_shrink); > +} > +static void ttm_dma_pool_mm_shrink_fini(struct ttm_pool_manager *manager) > +{ > + unregister_shrinker(&manager->mm_shrink); > +} > +int ttm_dma_page_alloc_init(struct ttm_mem_global *glob, > + unsigned max_pages) > +{ > + int ret = -ENOMEM; > + > + WARN_ON(_manager); > + > + printk(KERN_INFO TTM_PFX "Initializing DMA pool allocator.\n"); > + > + _manager = kzalloc(sizeof(*_manager), GFP_KERNEL); > + if (!_manager) > + goto err_manager; > + > + mutex_init(&_manager->lock); > + INIT_LIST_HEAD(&_manager->pools); > + > + _manager->options.max_size = max_pages; > + _manager->options.small = SMALL_ALLOCATION; > + _manager->options.alloc_size = NUM_PAGES_TO_ALLOC; > + > + /* This takes care of auto-freeing the _manager */ > + ret = kobject_init_and_add(&_manager->kobj, &ttm_pool_kobj_type, > + &glob->kobj, "dma_pool"); > + if (unlikely(ret != 0)) { > + kobject_put(&_manager->kobj); > + goto err; > + } > + ttm_dma_pool_mm_shrink_init(_manager); > + return 0; > +err_manager: > + kfree(_manager); > + _manager = NULL; > +err: > + return ret; > +} > +void ttm_dma_page_alloc_fini(void) > +{ > + struct device_pools *p, *t; > + > + printk(KERN_INFO TTM_PFX "Finalizing DMA pool allocator.\n"); > + ttm_dma_pool_mm_shrink_fini(_manager); > + > + list_for_each_entry_safe_reverse(p, t, &_manager->pools, pools) { > + dev_dbg(p->dev, "(%s:%d) Freeing.\n", p->pool->name, > + current->pid); > + WARN_ON(devres_destroy(p->dev, ttm_dma_pool_release, > + ttm_dma_pool_match, p->pool)); > + ttm_dma_free_pool(p->dev, p->pool->type); > + } > + kobject_put(&_manager->kobj); > + _manager = NULL; > +} > + > +int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) > +{ > + struct device_pools *p; > + struct dma_pool *pool = NULL; > + char *h[] = {"pool", "refills", "pages freed", "inuse", "available", > + "name", "virt", "busaddr"}; > + > + if (!_manager) { > + seq_printf(m, "No pool allocator running.\n"); > + return 0; > + } > + seq_printf(m, "%13s %12s %13s %8s %8s %8s\n", > + h[0], h[1], h[2], h[3], h[4], h[5]); > + mutex_lock(&_manager->lock); > + list_for_each_entry(p, &_manager->pools, pools) { > + struct device *dev = p->dev; > + if (!dev) > + continue; > + pool = p->pool; > + seq_printf(m, "%13s %12ld %13ld %8d %8d %8s\n", > + pool->name, pool->nrefills, > + pool->nfrees, pool->npages_in_use, > + pool->npages_free, > + pool->dev_name); > + } > + mutex_unlock(&_manager->lock); > + return 0; > +} > +EXPORT_SYMBOL_GPL(ttm_dma_page_alloc_debugfs); > +bool ttm_dma_override(struct ttm_backend_func *be) > +{ > + if (swiotlb_nr_tbl() && be) { > + be->get_pages = &ttm_dma_get_pages; > + be->put_pages = &ttm_dma_put_pages; > + return true; > + } > + return false; > +} > +EXPORT_SYMBOL_GPL(ttm_dma_override); > diff --git a/include/drm/ttm/ttm_page_alloc.h b/include/drm/ttm/ttm_page_alloc.h > index 0aaac39..9c52fb7 100644 > --- a/include/drm/ttm/ttm_page_alloc.h > +++ 
b/include/drm/ttm/ttm_page_alloc.h > @@ -29,6 +29,37 @@ > #include "ttm_bo_driver.h" > #include "ttm_memory.h" > > +#ifdef CONFIG_SWIOTLB > +extern bool ttm_dma_override(struct ttm_backend_func *be); > + > +/** > + * Initialize pool allocator. > + */ > +int ttm_dma_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages); > +/** > + * Free pool allocator. > + */ > +void ttm_dma_page_alloc_fini(void); > +/** > + * Output the state of pools to debugfs file > + */ > +extern int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data); > +#else > +static inline bool ttm_dma_override(struct ttm_backend_func *be) > +{ > + return false; > +} > +static inline int ttm_dma_page_alloc_init(struct ttm_mem_global *glob, > + unsigned max_pages) > +{ > + return -ENODEV; > +} > +static inline void ttm_dma_page_alloc_fini(void) { return; } > +static inline int ttm_dma_page_alloc_debugfs(struct seq_file *m, void *data) > +{ > + return 0; > +} > +#endif > /** > * Get count number of pages from pool to pages list. > * > -- > 1.7.6.4 >See comment above, otherwise: Reviewed-by: Jerome Glisse <jglisse@redhat.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
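For readers not steeped in the DMA API this pool builds on, the key point of the commit message is that dma_alloc_coherent() hands back both a kernel virtual address and a bus address the device can use directly, so no SWIOTLB bouncing can occur later. A minimal sketch of that route (not code from the patch; the helper names are made up):

#include <linux/dma-mapping.h>
#include <linux/mm.h>

/* Allocate one page the device can DMA to/from directly.  The returned
 * *bus_addr is what the GPU MMU (or GART) gets programmed with, so there
 * is no bounce buffer to keep in sync afterwards. */
static void *example_alloc_gart_page(struct device *dev, dma_addr_t *bus_addr)
{
        return dma_alloc_coherent(dev, PAGE_SIZE, bus_addr, GFP_KERNEL);
}

static void example_free_gart_page(struct device *dev, void *vaddr,
                                   dma_addr_t bus_addr)
{
        dma_free_coherent(dev, PAGE_SIZE, vaddr, bus_addr);
}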
On Wed, Oct 19, 2011 at 06:19:21PM -0400, Konrad Rzeszutek Wilk wrote:
> [.. the 00/11 cover letter quoted in full - snipped ..]

Reviewed the patch series, looks good; I already sent a comment on one of
the patches.

I am on the same side as Thomas for the dma stuff. Lately I have been
working on GPU virtual memory address space, and I believe having the
driver allocate the ttm_tt and merging more stuff into the backend makes
sense - after all, the backend has better knowledge of both the cache
preference and the dma mask.

So far my idea is to merge ttm_tt & ttm_backend and simplify the backend
functions to bind/unbind/destroy, where bind is responsible for allocating
(or not) a ttm_tt and the pages that go along with it. I will try to sketch
up patches for all this in the next few days.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>

Cheers,
Jerome Glisse
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
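Purely as an illustration of the direction Jerome sketches above - none of these names come from an actual patch - the merged ttm_tt/ttm_backend interface could be imagined along these lines:

#include "ttm/ttm_bo_driver.h"

/* Hypothetical merged interface: the driver owns the object, and bind()
 * is free to allocate backing pages however it likes (cached/WC/UC, DMA
 * pool or plain pages) before programming the GPU MMU. */
struct example_tt_ops {
        /* Allocate backing pages if not already present and bind them
         * at the placement described by @mem. */
        int  (*bind)(struct ttm_tt *ttm, struct ttm_mem_reg *mem);
        /* Unbind from the GPU MMU; the pages may stay around for reuse. */
        int  (*unbind)(struct ttm_tt *ttm);
        /* Free the pages and the ttm_tt object itself. */
        void (*destroy)(struct ttm_tt *ttm);
};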
Konrad Rzeszutek Wilk
2011-Nov-01 13:51 UTC
[Xen-devel] Re: [PATCH 08/11] ttm: Provide DMA aware TTM page pool code.
> > +static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
> > +{
> > +        struct dma_page *d_page;
> > +
> > +        d_page = kmalloc(sizeof(struct dma_page), GFP_KERNEL);
> > +        if (!d_page)
> > +                return NULL;
> > +
> > +        d_page->vaddr = dma_alloc_coherent(pool->dev, pool->size,
> > +                                           &d_page->dma,
> > +                                           pool->gfp_flags);
> > +        d_page->p = virt_to_page(d_page->vaddr);
> > +        if (!d_page->vaddr) {
> > +                kfree(d_page);
> > +                d_page = NULL;
> > +        }
>
> Move d_page->p = virt_to_page(d_page->vaddr); after if (!d_page->vaddr)
> block.

Duh! Yes.

.. snip..

> > +#if 0
> > +        if (nr_free > 1) {
> > +                pr_debug("%s: (%s:%d) Attempting to free %d (%d) pages\n",
> > +                         pool->dev_name, pool->name, current->pid,
> > +                         npages_to_free, nr_free);
> > +        }
> > +#endif

What is your feeling on those #if 0? I was not sure whether to keep them -
they are useful when debugging, but not so much during run-time. Rip them out
and I can just keep them in my 'debug' patch queue in case things go wrong?
Or perhaps do it (rip 'em out) in a 3 month time-frame?

.. snip..

> > +static struct dma_pool *ttm_dma_find_pool(struct device *dev,
> > +                                          enum pool_type type)
> > +{
> > +        struct dma_pool *pool, *tmp, *found = NULL;
> > +
> > +        if (type == IS_UNDEFINED)
> > +                return found;
> > +        /* NB: We iterate on the 'struct dev' which has no spinlock, but
> > +         * it does have a kref which we have taken. */
>
> I fail to see where we kref dev.

Ah, I should document that more extensively. That is done way, way earlier -
in the path of the initialization of the driver:

  drm_pci_init
    - for non-KMS it calls pci_dev_get()
    - for KMS it calls pci_register_driver, which calls 'driver_register',
      which calls 'device_register'

And then during teardown (so unbind on sysfs), it ends up calling the devres
destructors which clean up the 'struct device' devres - in our case
ttm_dma_pool_release. The nice thing, however, is that at that point all of
the calls into TTM have quiesced, so nobody is calling TTM for this device
anymore.

Let me stick this in the comment section.

> See comment above, otherwise:
> Reviewed-by: Jerome Glisse <jglisse@redhat.com>

Great! Thank you!
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
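To put the lifetime argument above in code form, the devres pattern the pool relies on is roughly the following simplified sketch (names are made up; in the patch the real code lives in ttm_dma_pool_init()/ttm_dma_pool_release()). The release callback only runs once the driver core drops the last reference to the struct device, by which point, as Konrad notes, all TTM users of the device are gone:

#include <linux/device.h>
#include <linux/slab.h>

struct example_pool {
        unsigned npages_free;
        /* ... */
};

/* Invoked by the driver core after the last reference to the struct
 * device is dropped - nothing can still be using the pool by then. */
static void example_pool_release(struct device *dev, void *res)
{
        struct example_pool *pool = *(struct example_pool **)res;

        kfree(pool);
}

static struct example_pool *example_pool_create(struct device *dev)
{
        struct example_pool **ptr, *pool;

        ptr = devres_alloc(example_pool_release, sizeof(*ptr), GFP_KERNEL);
        if (!ptr)
                return NULL;

        pool = kzalloc(sizeof(*pool), GFP_KERNEL);
        if (!pool) {
                devres_free(ptr);
                return NULL;
        }

        *ptr = pool;
        devres_add(dev, ptr);   /* ties the pool to the device's lifetime */
        return pool;
}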
Jerome Glisse
2011-Nov-01 14:37 UTC
[Xen-devel] Re: [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
On Sat, Oct 22, 2011 at 11:40:54AM +0200, Thomas Hellstrom wrote:
> Konrad,
>
> I was hoping that we could get rid of the dma_address shuffling into
> core TTM, like I mentioned in the review. From what I can tell it's now
> only used in the backend and core ttm doesn't care about it.
>
> Is there a particular reason we're still passing it around?
>
> Thanks,
> /Thomas

I am working on a patchset on top of this that will move dma handling
back to the driver and mostly out of ttm (the page alloc helper will
still do dma stuff on behalf of the driver). So if my patchset is
acceptable, the dma situation is transitional.

Cheers,
Jerome
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Thomas Hellstrom
2011-Nov-01 14:48 UTC
[Xen-devel] Re: [PATCH 06/11] ttm/driver: Expand ttm_backend_func to include two overrides for TTM page pool.
On 11/01/2011 03:37 PM, Jerome Glisse wrote:
> On Sat, Oct 22, 2011 at 11:40:54AM +0200, Thomas Hellstrom wrote:
>> Konrad,
>>
>> I was hoping that we could get rid of the dma_address shuffling into
>> core TTM, like I mentioned in the review. From what I can tell it's now
>> only used in the backend and core ttm doesn't care about it.
>>
>> Is there a particular reason we're still passing it around?
>>
>> Thanks,
>> /Thomas
>>
> I am working on a patchset on top of this that will move dma handling
> back to the driver and mostly out of ttm (the page alloc helper will
> still do dma stuff on behalf of the driver). So if my patchset is
> acceptable, the dma situation is transitional.
>
> Cheers,
> Jerome

Cool. Thanks, Jerome.

/Thomas
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel