Dan Williams
2021-Dec-12 14:44 UTC
[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal at redhat.com> wrote:
>
> On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > These methods indirect the actual DAX read/write path. In the end pmem
> > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > while device mapper picks redirects to the underlying device.
> >
> > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > special variants, then use them everywhere as they fall back to the plain
> > ones on s390 anyway and remove an indirect call from the read/write path
> > as well as a lot of boilerplate code.
> >
> > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > ---
> >  drivers/dax/super.c           | 36 ++++++++++++++--
> >  drivers/md/dm-linear.c        | 20 ---------
> >  drivers/md/dm-log-writes.c    | 80 -----------------------------------
> >  drivers/md/dm-stripe.c        | 20 ---------
> >  drivers/md/dm.c               | 50 ----------------------
> >  drivers/nvdimm/pmem.c         | 20 ---------
> >  drivers/s390/block/dcssblk.c  | 14 ------
> >  fs/dax.c                      |  5 ---
> >  fs/fuse/virtio_fs.c           | 19 +--------
> >  include/linux/dax.h           |  9 ++--
> >  include/linux/device-mapper.h |  4 --
> >  11 files changed, 37 insertions(+), 240 deletions(-)
> >
>
> [..]
> > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > index 5c03a0364a9bb..754319ce2a29b 100644
> > --- a/fs/fuse/virtio_fs.c
> > +++ b/fs/fuse/virtio_fs.c
> > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> >  	return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> >  }
> >
> > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > -				       pgoff_t pgoff, void *addr,
> > -				       size_t bytes, struct iov_iter *i)
> > -{
> > -	return copy_from_iter(addr, bytes, i);
> > -}
> > -
> > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > -				      pgoff_t pgoff, void *addr,
> > -				      size_t bytes, struct iov_iter *i)
> > -{
> > -	return copy_to_iter(addr, bytes, i);
> > -}
> > -
> >  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> >  				     pgoff_t pgoff, size_t nr_pages)
> >  {
> > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> >
> >  static const struct dax_operations virtio_fs_dax_ops = {
> >  	.direct_access = virtio_fs_direct_access,
> > -	.copy_from_iter = virtio_fs_copy_from_iter,
> > -	.copy_to_iter = virtio_fs_copy_to_iter,
> >  	.zero_page_range = virtio_fs_zero_page_range,
> >  };
> >
> > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> >  	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> >  	if (IS_ERR(fs->dax_dev))
> >  		return PTR_ERR(fs->dax_dev);
> > -
> > +	set_dax_cached(fs->dax_dev);
>
> Looks good to me from virtiofs point of view.
>
> Reviewed-by: Vivek Goyal <vgoyal at redhat.com>
>
> Going forward, I am wondering should virtiofs use flushcache version as
> well. What if host filesystem is using DAX and mapping persistent memory
> pfn directly into qemu address space. I have never tested that.
>
> Right now we are relying on applications to do fsync/msync on virtiofs
> for data persistence.

This sounds like it would need coordination with a paravirtualized
driver that can indicate whether the host side is pmem or not, like
the virtio_pmem driver.
However, if the guest sends any fsync/msync you would still need to go
explicitly cache flush any dirty page because you can't necessarily
trust that the guest did that already.

> > +	set_dax_nomcsafe(fs->dax_dev);
> >  	return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax,
> >  					fs->dax_dev);
> >  }
>
> Thanks
> Vivek
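As a rough, non-authoritative sketch of where this copy logic ends up once
the per-driver methods are gone: the quoted patch centralizes the choice in
drivers/dax/super.c based on flags set at alloc_dax() time. The flag name
DAXDEV_CACHED below is only an assumption inferred from the set_dax_cached()
call in the hunk above, not copied from the patch:

size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
		void *addr, size_t bytes, struct iov_iter *i)
{
	if (!dax_alive(dax_dev))
		return 0;

	/*
	 * Windows backed by ordinary cached memory (e.g. virtiofs after
	 * set_dax_cached()) take the plain copy path.
	 */
	if (test_bit(DAXDEV_CACHED, &dax_dev->flags))
		return copy_from_iter(addr, bytes, i);

	/*
	 * Real pmem keeps the cache-flushing variant so stores reach the
	 * persistence domain.
	 */
	return _copy_from_iter_flushcache(addr, bytes, i);
}

With something along these lines, drivers only declare their behaviour once
instead of each supplying ->copy_from_iter()/->copy_to_iter() methods.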
Christoph Hellwig
2021-Dec-13 08:23 UTC
[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal at redhat.com> wrote:
> > Going forward, I am wondering should virtiofs use flushcache version as
> > well. What if host filesystem is using DAX and mapping persistent memory
> > pfn directly into qemu address space. I have never tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
>
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver. However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

Do we?  The application can't really know what backend it is on, so
it sounds like the current virtiofs implementation doesn't really,
does it?
Vivek Goyal
2021-Dec-13 16:17 UTC
[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal at redhat.com> wrote:
> >
> > On Thu, Dec 09, 2021 at 07:38:27AM +0100, Christoph Hellwig wrote:
> > > These methods indirect the actual DAX read/write path. In the end pmem
> > > uses magic flush and mc safe variants and fuse and dcssblk use plain ones
> > > while device mapper picks redirects to the underlying device.
> > >
> > > Add set_dax_virtual() and set_dax_nomcsafe() APIs for fuse to skip these
> > > special variants, then use them everywhere as they fall back to the plain
> > > ones on s390 anyway and remove an indirect call from the read/write path
> > > as well as a lot of boilerplate code.
> > >
> > > Signed-off-by: Christoph Hellwig <hch at lst.de>
> > > ---
> > >  drivers/dax/super.c           | 36 ++++++++++++++--
> > >  drivers/md/dm-linear.c        | 20 ---------
> > >  drivers/md/dm-log-writes.c    | 80 -----------------------------------
> > >  drivers/md/dm-stripe.c        | 20 ---------
> > >  drivers/md/dm.c               | 50 ----------------------
> > >  drivers/nvdimm/pmem.c         | 20 ---------
> > >  drivers/s390/block/dcssblk.c  | 14 ------
> > >  fs/dax.c                      |  5 ---
> > >  fs/fuse/virtio_fs.c           | 19 +--------
> > >  include/linux/dax.h           |  9 ++--
> > >  include/linux/device-mapper.h |  4 --
> > >  11 files changed, 37 insertions(+), 240 deletions(-)
> > >
> >
> > [..]
> > > diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > index 5c03a0364a9bb..754319ce2a29b 100644
> > > --- a/fs/fuse/virtio_fs.c
> > > +++ b/fs/fuse/virtio_fs.c
> > > @@ -753,20 +753,6 @@ static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> > >  	return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> > >  }
> > >
> > > -static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > > -				       pgoff_t pgoff, void *addr,
> > > -				       size_t bytes, struct iov_iter *i)
> > > -{
> > > -	return copy_from_iter(addr, bytes, i);
> > > -}
> > > -
> > > -static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > > -				      pgoff_t pgoff, void *addr,
> > > -				      size_t bytes, struct iov_iter *i)
> > > -{
> > > -	return copy_to_iter(addr, bytes, i);
> > > -}
> > > -
> > >  static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >  				     pgoff_t pgoff, size_t nr_pages)
> > >  {
> > > @@ -783,8 +769,6 @@ static int virtio_fs_zero_page_range(struct dax_device *dax_dev,
> > >
> > >  static const struct dax_operations virtio_fs_dax_ops = {
> > >  	.direct_access = virtio_fs_direct_access,
> > > -	.copy_from_iter = virtio_fs_copy_from_iter,
> > > -	.copy_to_iter = virtio_fs_copy_to_iter,
> > >  	.zero_page_range = virtio_fs_zero_page_range,
> > >  };
> > >
> > > @@ -853,7 +837,8 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > >  	fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops);
> > >  	if (IS_ERR(fs->dax_dev))
> > >  		return PTR_ERR(fs->dax_dev);
> > > -
> > > +	set_dax_cached(fs->dax_dev);
> >
> > Looks good to me from virtiofs point of view.
> >
> > Reviewed-by: Vivek Goyal <vgoyal at redhat.com>
> >
> > Going forward, I am wondering should virtiofs use flushcache version as
> > well. What if host filesystem is using DAX and mapping persistent memory
> > pfn directly into qemu address space. I have never tested that.
> >
> > Right now we are relying on applications to do fsync/msync on virtiofs
> > for data persistence.
>
> This sounds like it would need coordination with a paravirtualized
> driver that can indicate whether the host side is pmem or not, like
> the virtio_pmem driver.

Agreed.
Let me check the details of the virtio_pmem driver.

> However, if the guest sends any fsync/msync
> you would still need to go explicitly cache flush any dirty page
> because you can't necessarily trust that the guest did that already.

So host dax functionality will already take care of that, IIUC, right? I
see a dax_flush() call in dax_writeback_one(). I am assuming that will
take care of flushing dirty pages when the guest issues fsync()/msync().
So we probably don't have to do anything extra here.

I think qemu should map files using MAP_SYNC in this case, though. Any
read/writes to virtiofs files will turn into host file load/store
operations. So flushcache in the guest makes more sense with MAP_SYNC,
which should make sure any filesystem metadata has already persisted
after fault completion. And later the guest can do writes followed by a
flush and ensure data persists too.

IOW, I probably only need to do the following.

- In the virtiofs virtual device, add a notion of what kind of dax window
  or memory it supports. So maybe some kind of "writethrough" property of
  the virtiofs dax cache.

- Use this property in the virtiofs driver to decide whether to use plain
  copy_from_iter() or _copy_from_iter_flushcache().

- qemu should use mmap(MAP_SYNC) if the host filesystem is on persistent
  memory.

Thanks
Vivek
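On the last item, a minimal userspace sketch of how a VMM might request
synchronous page faults for the file backing the DAX window; the function
name, the fallback policy, and the assumption that fd refers to a file on a
DAX-capable (pmem) host filesystem are all illustrative, not taken from
qemu or from this thread:

#include <sys/mman.h>

#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE	0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC		0x080000
#endif

/*
 * Map the file backing the DAX window. MAP_SYNC is only accepted together
 * with MAP_SHARED_VALIDATE, and the kernel rejects it if the backing
 * filesystem cannot guarantee synchronous faults, so fall back to a plain
 * shared mapping for non-pmem backends.
 */
static void *map_dax_window(int fd, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
	if (p == MAP_FAILED)
		p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 fd, 0);
	return p;
}

The mmap() failing with MAP_SYNC on a non-DAX filesystem is effectively the
"is this really pmem" signal discussed above, which the device property
could then relay to the guest.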