On 8/17/21 10:08 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 15:22, JeffleXu <jefflexu at linux.alibaba.com> wrote:
>>
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu at linux.alibaba.com> wrote:
>>>>>
>>>>> This patchset adds support for per-file DAX for virtiofs, which is
>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> Can you please explain the background of this change in detail?
>>>>
>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>> and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from the server, so they don't want to enable DAX on a
>>> file until the file is completely downloaded.
>>
>> Right, that was our initial requirement.
>>
>>
>>> But later I realized that they should be able to block in the
>>> FUSE_SETUPMAPPING call, make sure the associated file section has been
>>> downloaded before returning, and thereby solve the problem.
>>> So that can't be the primary reason.
>>
>> Say we want to access 4KB of a file inside the guest. If the access
>> goes through the FUSE request path, the fuse daemon only needs to
>> download those 4KB from the remote server. But if it goes through DAX,
>> the fuse daemon needs to download the whole DAX window (e.g., 2MB) from
>> the remote server, the so-called amplification. We could decrease the
>> DAX window size, but that is a trade-off.
>
> That could be achieved with a plain fuse filesystem on the host (which
> will get 4k READ requests for accesses to the mapped area inside the guest).
> Since this can be done selectively for files which are not yet
> downloaded, the extra layer wouldn't be a performance problem.
>
> Is there a reason why that wouldn't work?

I wasn't aware of this mechanism (working around the problem from user
space) before sending this patch set.
After looking into the virtualization and KVM side of things, I found
that, as Vivek Goyal replied in [1], virtiofsd/qemu would need to somehow
hook the user page fault and then download the remaining part.
IMHO, this mechanism (implementing a plain fuse filesystem on the host,
as you proposed) seems a bit complicated for now.
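To put rough numbers on the amplification discussed above, here is a
small Python sketch. It assumes 4 KiB guest pages and the 2 MiB DAX
window from the example; the helper names are hypothetical, and it only
counts the bytes that would have to be fetched from the remote server,
not how virtiofsd actually fetches them.

```python
# Assumptions (not taken from virtiofsd or the kernel): 4 KiB guest pages
# and a 2 MiB DAX window, matching the "e.g., 2MB" example in the thread.
PAGE_SIZE = 4 * 1024
DAX_WINDOW = 2 * 1024 * 1024


def aligned_span(offset: int, length: int, granule: int) -> int:
    """Bytes covered when [offset, offset + length) is rounded out to granule boundaries."""
    start = (offset // granule) * granule
    end = -(-(offset + length) // granule) * granule  # ceiling division
    return end - start


def amplification(offset: int, length: int) -> float:
    """Ratio of bytes fetched for a whole DAX window vs. page-sized FUSE READs."""
    fuse_bytes = aligned_span(offset, length, PAGE_SIZE)
    dax_bytes = aligned_span(offset, length, DAX_WINDOW)
    return dax_bytes / fuse_bytes


print(aligned_span(0, 4096, PAGE_SIZE))   # 4096
print(aligned_span(0, 4096, DAX_WINDOW))  # 2097152
print(amplification(0, 4096))             # 512.0
```

So a single 4KB access that forces a whole 2MB window to be downloaded
fetches 512 times more data than a page-granular READ would.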
[1] https://lore.kernel.org/linux-fsdevel/YR08KnP8cO8LjKY7 at redhat.com/
--
Thanks,
Jeffle