Hey, I was looking at the wiki ''unclaimed projects'' list and I thought I''d take a look at what might be needed to use the swap-over-n{fs,bd} functionality in order to implement swapfile support. It *looks* like it''s surprisingly uncomplicated - to the point where I suspect I''m missing something, and would like some sanity checking. My primary references for this are the patch[1] implementing swap-over-NFS and HEAD of cmason''s for-linus branch. I''m going to go over the sections of the commitdiff from top to bottom (and hunk-by-hunk where I feel it necessary), and I''d like it if people could correct any mistakes I make. Kconfig: Simple enough that I''m not worried. direct.c: The first hunk seems to serve to factor out the difference between rw={READ,WRITE} and the KERNEL_ equivalents into a ''uio'' parameter, so I''ll view it as being mostly a code organization thing. The second hunk just adds that uio param to read_schedule_segment(), which it is given secondhand via direct_read(). The third hunk seems to be where the actual reading occurs in read_schedule_segment(), and the differences between uio and !uio seem to boil down to that KERNEL_READ is only allowed to operate one page at a time under penalty of BUG, and that it uses get_kernel_page instead of get_user_pages. Since btrfs calls into __blockdev_direct_IO(), I suspect that fs/direct-io.c would (in this case) be the right place to implement both the {READ,WRITE} vs KERNEL_* logic and those checks in this case. Also, check_direct_IO() in fs/btrfs/inode.c may want to treat KERNEL_WRITE similarly to WRITE - I''m not completely sure of that, however. Hunks 4-8 are just dealing with passing the additional uio parameter around. Hunk 9 is the write-side equivalent of hunk 3, and seems to correspond pretty exactly. It''s looking like fs/direct_io.c is definitely the place to implement this. Hunks 10-17 are more uio parameter passing. file.c: Pretty simple - as far as I can tell, the activate() sets the span to what the swap header expects, the xs_swapper call basically sets the appropriate kmalloc flags for the network stuff (irrelevant here), and the deactivate() just resets those flags. All the smarts seem to be in the DIO code. The rest: The remainder of the patch seems to just be a.) handing that uio parameter around and b.) stuff specific to swapping over the network. Did I miss anything important? If not, I may well decide to tackle this as my first non-trivial kernel contribution. Does anyone have any suggestions on how I should stresstest this? I figure a good starting point would be running xfstests under memory pressure with such a swapfile. [1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;h=a564b8f -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Feb 26, 2013 at 01:45:12PM -0800, Alex Elsayed wrote:> Hey, I was looking at the wiki ''unclaimed projects'' list and I thought I''d > take a look at what might be needed to use the swap-over-n{fs,bd} > functionality in order to implement swapfile support. It *looks* like it''s > surprisingly uncomplicated - to the point where I suspect I''m missing > something, and would like some sanity checking. > > My primary references for this are the patch[1] implementing swap-over-NFS > and HEAD of cmason''s for-linus branch.The patch [1] is not enough to cook swap support for btrfs, have look at the rest of the series. There''s some explanation how the interfaces look like. The read/write mechanism is done via direct_IO, but the filesystem has to add the swap_activate and swap_deactivate operations and take care of preallocating the swap file extents. Details: Analogy of these patches has to be done in btrfs as well: d56b4ddf7781e nfs: teach the NFS client how to treat PG_swapcache pages 29418aa4bd487 nfs: disable data cache revalidation for swapfiles The patch you''ve analyzed in detail adds some nfs-specific bits, not the core work (and thus I don''t disagree that it looks surprisingly uncomplicated :) The generic swap support is demonstrated in a509bc1a9e48 mm: swap: implement generic handler for swap_activate and in case of btrfs it has to avoid using bmap. Thanks for tackling this project, I think it''s not an easy one. There are implications and API requirements that need to be taken into account wrt snapshots and stable block ranges during the file lifetime. I have a few commetns about that from Mel and can share them if you still want to proceed with this project. david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba wrote:> On Tue, Feb 26, 2013 at 01:45:12PM -0800, Alex Elsayed wrote: >> Hey, I was looking at the wiki ''unclaimed projects'' list and I thought >> I''d take a look at what might be needed to use the swap-over-n{fs,bd} >> functionality in order to implement swapfile support. It *looks* like >> it''s surprisingly uncomplicated - to the point where I suspect I''m >> missing something, and would like some sanity checking. >> >> My primary references for this are the patch[1] implementing >> swap-over-NFS and HEAD of cmason''s for-linus branch. > > The patch [1] is not enough to cook swap support for btrfs, have look at > the rest of the series. There''s some explanation how the interfaces look > like. The read/write mechanism is done via direct_IO, but the filesystem > has to add the swap_activate and swap_deactivate operations and take > care of preallocating the swap file extents. > > Details: > > Analogy of these patches has to be done in btrfs as well: > > d56b4ddf7781e nfs: teach the NFS client how to treat PG_swapcachepages> 29418aa4bd487 nfs: disable data cache revalidation for swapfiles > > The patch you''ve analyzed in detail adds some nfs-specific bits, not the > core work (and thus I don''t disagree that it looks surprisingly > uncomplicated :) > > The generic swap support is demonstrated in > > a509bc1a9e48 mm: swap: implement generic handler forswap_activate> > and in case of btrfs it has to avoid using bmap. > > Thanks for tackling this project, I think it''s not an easy one. There > are implications and API requirements that need to be taken into account > wrt snapshots and stable block ranges during the file lifetime. I have > a few commetns about that from Mel and can share them if you still want > to proceed with this project. > > davidThanks for the information! I probably will keep going on this, since at this point my disk basically boils down to an EFI system partition and a LUKS volume containing swap and btrfs over LVM. Being able to dwap the swap partition would let me drop LVM, and simplify my setup a little (along with letting me omit the LVM module from dracut). It will, however, likely take a while - I''m not particularly skilled with C, and the kernel is still slightly magic in places, but I''m stubborn and pick up on patterns reasonably well so hopefully it''ll get done eventually. Also, in talking on IRC with Hugo Mills he helped me realize another issue - specifically, that a.) DIO to compressed data falls back to buffered IO, and b.) (at least according to the wiki) the compress_force mount option takes precedence over the NOCOMPRESS file flag. Thinking about it further, I realized that individual extents could be compressed, at which point this could get rather onerous on swapon since we might well need to check each and every extent to make sure it isn''t compressed. Any suggestions? -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Feb 27, 2013 at 12:52:50PM -0800, Alex Elsayed wrote:> Also, in talking on IRC with Hugo Mills he helped me realize another issue - > specifically, that a.) DIO to compressed data falls back to buffered IO, and > b.) (at least according to the wiki) the compress_force mount option takes > precedence over the NOCOMPRESS file flag. > > Thinking about it further, I realized that individual extents could be > compressed, at which point this could get rather onerous on swapon since we > might well need to check each and every extent to make sure it isn''t > compressed.There are a few limitations I can think of: * the file has to be nocow (stable block ranges) * no compression for the swap file (due to nocow) * snapshot of the swapfile would work probably the same way as in any other nocow file, but may imply unexpected or unwanted locking and interfering with swap. * relocation must not move the file extents, which makes balance still possible, but it may not be able to satisfy the filter requirements (the 1G chunks are basically pinned until the swap file is active) - this affects ''device remove'', ''fi resize'' - depends on the raid profile used for the file, raid 0 is likely to pin all devices david -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html