Vivek Goyal
2021-Dec-14 14:22 UTC
[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal at redhat.com> wrote:
> > > Going forward, I am wondering should virtiofs use flushcache version as
> > > well. What if host filesystem is using DAX and mapping persistent memory
> > > pfn directly into qemu address space. I have never tested that.
> > >
> > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > for data persistence.
> >
> > This sounds like it would need coordination with a paravirtualized
> > driver that can indicate whether the host side is pmem or not, like
> > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > you would still need to go explicitly cache flush any dirty page
> > because you can't necessarily trust that the guest did that already.
>
> Do we? The application can't really know what backend it is on, so
> it sounds like the current virtiofs implementation doesn't really, does it?

Agreed that the application does not know what backend it is on. So
virtiofs just offers the regular POSIX API, where applications have to do
fsync/msync for data persistence. There is no support for mmap(MAP_SYNC).
We don't offer the persistent memory programming model on virtiofs. That's
not the expectation. DAX is used only to bypass the guest page cache.

With this assumption, I think we might not have to use the flushcache
version at all, even if the shared filesystem is on persistent memory on
the host.

- We mmap() host files into the qemu address space. So any DAX store in
  virtiofs should make the corresponding pages dirty in the page cache on
  the host, and when an fsync()/msync() comes later, it should flush all
  the data to PMEM.

- In the case of file-extending writes, virtiofs falls back to the regular
  FUSE_WRITE path (and does not use DAX), and in that case the host pmem
  driver should make sure writes are flushed to pmem immediately.

Are there any other paths I am missing? If not, it looks like we might not
have to use the flushcache version in virtiofs at all, as long as we are
not offering guest applications user space flushes and MAP_SYNC support.

We still might have to use the machine check safe variant, though, as
loads might generate a synchronous machine check. What's not clear to me
is whether this MC safe variant should be used only in the case of PMEM or
in the case of non-PMEM as well.

Vivek
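
[Illustration, not part of the original message.] The application-side contract described in this message (plain stores into a shared mapping, with persistence requested through msync() rather than user space cache flushes or MAP_SYNC) might look roughly like the userspace sketch below. The function name store_and_sync() and its arguments are hypothetical and chosen only for this example. Nothing here is virtiofs-specific, and len is assumed to be non-zero.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `len` bytes to `path` through a shared
 * mapping and request durability with msync(MS_SYNC).
 * Returns 0 on success, -1 on any failure. `len` must be non-zero.
 */
int store_and_sync(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Size the file first; stores beyond EOF in a mapping raise SIGBUS. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Plain stores: no cache flush instructions and no MAP_SYNC. */
    memcpy(map, data, len);

    /* The application asks for persistence explicitly. */
    int ret = msync(map, len, MS_SYNC);

    munmap(map, len);
    close(fd);
    return ret;
}
```

A caller would treat the data as durable only after store_and_sync() returns 0, which mirrors the expectation stated above that applications do fsync/msync for data persistence.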
Dan Williams
2021-Dec-14 16:41 UTC
[PATCH 4/5] dax: remove the copy_from_iter and copy_to_iter methods
On Tue, Dec 14, 2021 at 6:23 AM Vivek Goyal <vgoyal at redhat.com> wrote:
>
> On Mon, Dec 13, 2021 at 09:23:18AM +0100, Christoph Hellwig wrote:
> > On Sun, Dec 12, 2021 at 06:44:26AM -0800, Dan Williams wrote:
> > > On Fri, Dec 10, 2021 at 6:17 AM Vivek Goyal <vgoyal at redhat.com> wrote:
> > > > Going forward, I am wondering should virtiofs use flushcache version as
> > > > well. What if host filesystem is using DAX and mapping persistent memory
> > > > pfn directly into qemu address space. I have never tested that.
> > > >
> > > > Right now we are relying on applications to do fsync/msync on virtiofs
> > > > for data persistence.
> > >
> > > This sounds like it would need coordination with a paravirtualized
> > > driver that can indicate whether the host side is pmem or not, like
> > > the virtio_pmem driver. However, if the guest sends any fsync/msync
> > > you would still need to go explicitly cache flush any dirty page
> > > because you can't necessarily trust that the guest did that already.
> >
> > Do we? The application can't really know what backend it is on, so
> > it sounds like the current virtiofs implementation doesn't really, does it?
>
> Agreed that application does not know what backend it is on. So virtiofs
> just offers regular posix API where applications have to do fsync/msync
> for data persistence. No support for mmap(MAP_SYNC). We don't offer persistent
> memory programming model on virtiofs. That's not the expectation. DAX
> is used only to bypass guest page cache.
>
> With this assumption, I think we might not have to use flushcache version
> at all even if shared filesystem is on persistent memory on host.
>
> - We mmap() host files into qemu address space. So any dax store in virtiofs
>   should make corresponding pages dirty in page cache on host and when
>   and fsync()/msync() comes later, it should flush all the data to PMEM.
>
> - In case of file extending writes, virtiofs falls back to regular
>   FUSE_WRITE path (and not use DAX), and in that case host pmem driver
>   should make sure writes are flushed to pmem immediately.
>
> Are there any other path I am missing. If not, looks like we might not
> have to use flushcache version in virtiofs at all as long as we are not
> offering guest applications user space flushes and MAP_SYNC support.
>
> We still might have to use machine check safe variant though as loads
> might generate synchronous machine check. What's not clear to me is
> that if this MC safe variant should be used only in case of PMEM or
> should it be used in case of non-PMEM as well.

It should be used on any memory address that can throw an exception on
load (which is any physical address), in paths that can tolerate memcpy()
returning an error code (most I/O paths) and that can tolerate slower copy
performance on older platforms that do not support MC recovery with fast
string operations. To date, that's only PMEM users.
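
[Illustration, not part of the original message.] The point about "paths that can tolerate memcpy() returning an error code" can be sketched in userspace as follows. This is a mock, not kernel code: mock_mc_safe_copy(), read_from_media() and struct poison are hypothetical names, and the "poisoned" range merely simulates a load that would raise a machine check. The return convention (number of bytes left uncopied) is modelled loosely on kernel helpers such as copy_mc_to_kernel(). Mapping a zero-progress copy to -EIO is an assumption made for this example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical description of a source range whose loads would fault. */
struct poison {
    size_t off;   /* offset of the first bad byte within the source */
    size_t len;   /* number of bad bytes */
};

/*
 * Mock of a machine-check-safe copy. Copies byte by byte and stops at the
 * first "poisoned" source byte instead of crashing.
 * Returns the number of bytes NOT copied (0 means the copy completed).
 */
size_t mock_mc_safe_copy(void *dst, const void *src, size_t len,
                         const struct poison *bad)
{
    const unsigned char *s = src;
    unsigned char *d = dst;

    for (size_t i = 0; i < len; i++) {
        if (bad && i >= bad->off && i < bad->off + bad->len)
            return len - i;   /* simulated fault: report the remainder */
        d[i] = s[i];
    }
    return 0;
}

/*
 * An I/O-style caller that can tolerate a failed copy: it returns the
 * number of bytes read, or -EIO if no progress was possible.
 */
long read_from_media(void *dst, const void *src, size_t len,
                     const struct poison *bad)
{
    size_t left = mock_mc_safe_copy(dst, src, len, bad);

    if (left == len)
        return -EIO;
    return (long)(len - left);
}
```

The byte-at-a-time loop also hints at the performance concern raised in the reply: a copy that must be able to stop at an arbitrary faulting byte may be slower than an unconstrained memcpy() on platforms without recovery support for fast string operations.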