Jason Gunthorpe
2019-Jun-19 19:27 UTC
[Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken
On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote:> On 6/13/19 5:43 PM, Ira Weiny wrote: > > On Thu, Jun 13, 2019 at 07:58:29PM +0000, Jason Gunthorpe wrote: > >> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote: > >>> > ... > >> Hum, so the only thing this config does is short circuit here: > >> > >> static inline bool is_device_public_page(const struct page *page) > >> { > >> return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && > >> IS_ENABLED(CONFIG_DEVICE_PUBLIC) && > >> is_zone_device_page(page) && > >> page->pgmap->type == MEMORY_DEVICE_PUBLIC; > >> } > >> > >> Which is called all over the place.. > > > > <sigh> yes but the earlier patch: > > > > [PATCH 03/22] mm: remove hmm_devmem_add_resource > > > > Removes the only place type is set to MEMORY_DEVICE_PUBLIC. > > > > So I think it is ok. Frankly I was wondering if we should remove the public > > type altogether but conceptually it seems ok. But I don't see any users of it > > so... should we get rid of it in the code rather than turning the config off? > > > > Ira > > That seems reasonable. I recall that the hope was for those IBM Power 9 > systems to use _PUBLIC, as they have hardware-based coherent device (GPU) > memory, and so the memory really is visible to the CPU. And the IBM team > was thinking of taking advantage of it. But I haven't seen anything on > that front for a while.Does anyone know who those people are and can we encourage them to send some patches? :) Jason
Dan Williams
2019-Jun-19 19:46 UTC
[Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken
On Wed, Jun 19, 2019 at 12:42 PM Jason Gunthorpe <jgg at mellanox.com> wrote:> > On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote: > > On 6/13/19 5:43 PM, Ira Weiny wrote: > > > On Thu, Jun 13, 2019 at 07:58:29PM +0000, Jason Gunthorpe wrote: > > >> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote: > > >>> > > ... > > >> Hum, so the only thing this config does is short circuit here: > > >> > > >> static inline bool is_device_public_page(const struct page *page) > > >> { > > >> return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && > > >> IS_ENABLED(CONFIG_DEVICE_PUBLIC) && > > >> is_zone_device_page(page) && > > >> page->pgmap->type == MEMORY_DEVICE_PUBLIC; > > >> } > > >> > > >> Which is called all over the place.. > > > > > > <sigh> yes but the earlier patch: > > > > > > [PATCH 03/22] mm: remove hmm_devmem_add_resource > > > > > > Removes the only place type is set to MEMORY_DEVICE_PUBLIC. > > > > > > So I think it is ok. Frankly I was wondering if we should remove the public > > > type altogether but conceptually it seems ok. But I don't see any users of it > > > so... should we get rid of it in the code rather than turning the config off? > > > > > > Ira > > > > That seems reasonable. I recall that the hope was for those IBM Power 9 > > systems to use _PUBLIC, as they have hardware-based coherent device (GPU) > > memory, and so the memory really is visible to the CPU. And the IBM team > > was thinking of taking advantage of it. But I haven't seen anything on > > that front for a while. > > Does anyone know who those people are and can we encourage them to > send some patches? :)I expect marking it CONFIG_BROKEN with the threat of deleting it if no patches show up *is* the encouragement.
John Hubbard
2019-Jun-26 03:15 UTC
[Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken
On 6/19/19 12:27 PM, Jason Gunthorpe wrote:> On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote: >> On 6/13/19 5:43 PM, Ira Weiny wrote: >>> On Thu, Jun 13, 2019 at 07:58:29PM +0000, Jason Gunthorpe wrote: >>>> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote: >>>>> >> ... >>> So I think it is ok. Frankly I was wondering if we should remove the public >>> type altogether but conceptually it seems ok. But I don't see any users of it >>> so... should we get rid of it in the code rather than turning the config off? >>> >>> Ira >> >> That seems reasonable. I recall that the hope was for those IBM Power 9 >> systems to use _PUBLIC, as they have hardware-based coherent device (GPU) >> memory, and so the memory really is visible to the CPU. And the IBM team >> was thinking of taking advantage of it. But I haven't seen anything on >> that front for a while. > > Does anyone know who those people are and can we encourage them to > send some patches? :) >I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there in order to provide an alternative way to do things (such as migrate memory to and from a device), in case the combination of existing and near-future NUMA APIs was insufficient. This probably came as a follow-up to the early 2017-ish conversations about NUMA, in which the linux-mm recommendation was "try using HMM mechanisms, and if those are inadequate, then maybe we can look at enhancing NUMA so that it has better handling of advanced (GPU-like) devices". In the end, however, _PUBLIC was never used, nor does anyone in the local (NVIDIA + IBM) kernel vicinity seem to have plans to use it. So it really does seem safe to remove, although of course it's good to start with BROKEN and see if anyone pops up and complains. thanks, -- John Hubbard NVIDIA
Michal Hocko
2019-Jun-26 05:45 UTC
[Nouveau] [PATCH 18/22] mm: mark DEVICE_PUBLIC as broken
On Tue 25-06-19 20:15:28, John Hubbard wrote:> On 6/19/19 12:27 PM, Jason Gunthorpe wrote: > > On Thu, Jun 13, 2019 at 06:23:04PM -0700, John Hubbard wrote: > >> On 6/13/19 5:43 PM, Ira Weiny wrote: > >>> On Thu, Jun 13, 2019 at 07:58:29PM +0000, Jason Gunthorpe wrote: > >>>> On Thu, Jun 13, 2019 at 12:53:02PM -0700, Ralph Campbell wrote: > >>>>> > >> ... > >>> So I think it is ok. Frankly I was wondering if we should remove the public > >>> type altogether but conceptually it seems ok. But I don't see any users of it > >>> so... should we get rid of it in the code rather than turning the config off? > >>> > >>> Ira > >> > >> That seems reasonable. I recall that the hope was for those IBM Power 9 > >> systems to use _PUBLIC, as they have hardware-based coherent device (GPU) > >> memory, and so the memory really is visible to the CPU. And the IBM team > >> was thinking of taking advantage of it. But I haven't seen anything on > >> that front for a while. > > > > Does anyone know who those people are and can we encourage them to > > send some patches? :) > > > > I asked about this, and it seems that the idea was: DEVICE_PUBLIC was there > in order to provide an alternative way to do things (such as migrate memory > to and from a device), in case the combination of existing and near-future > NUMA APIs was insufficient. This probably came as a follow-up to the early > 2017-ish conversations about NUMA, in which the linux-mm recommendation was > "try using HMM mechanisms, and if those are inadequate, then maybe we can > look at enhancing NUMA so that it has better handling of advanced (GPU-like) > devices".Yes that was the original idea. It sounds so much better to use a common framework rather than awkward special cased cpuless NUMA nodes with a weird semantic. User of the neither of the two has shown up so I guess that the envisioned HW just didn't materialized. Or has there been a completely different approach chosen?> In the end, however, _PUBLIC was never used, nor does anyone in the local > (NVIDIA + IBM) kernel vicinity seem to have plans to use it. So it really > does seem safe to remove, although of course it's good to start with > BROKEN and see if anyone pops up and complains.Well, I do not really see much of a difference. Preserving an unused code which doesn't have any user in sight just adds a maintenance burden whether the code depends on BROKEN or not. We can always revert patches which remove the code once a real user shows up. -- Michal Hocko SUSE Labs