Mark Fasheh
2011-May-13 23:18 UTC
[RFC][PATCH 0/2] btrfs/vfs: Return same device in stat(2) and /proc/pid/maps
I originally posted about this issue some weeks ago but unfortunately didn''t get a response: http://comments.gmane.org/gmane.comp.file-systems.btrfs/9682 To recap: btrfs_getattr() is filling stat->dev with an anonymous device (for per-snapshot root?): stat->dev = BTRFS_I(inode)->root->anon_super.s_dev; but /proc/pid/maps uses the "real" block device: dev = inode->i_sb->s_dev; This results in some unfortunate behavior for lsof as it reports some duplicate paths (except on different block devices). The easiest way to see this (if your root partition is btrfs): $ lsof | grep lsof <snip> lsof 9238 root txt REG 0,19 139736 14478 /usr/bin/lsof lsof 9238 root mem REG 0,17 14478 /usr/bin/lsof (path dev=0,19) As was noted in my original mail, this winds up breaking userspace software (zypper in my case). The following patch series fixes this issue by allowing a file system to supply a custom method for the device reported in /proc/pid/maps. It fixes the issue for me, but I''m not 100% sold on the approach taken - it''s an admittedly brute-force solution hence the RFC tag. Any comments are appreciated and welcome. Thanks, --Mark -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Fasheh
2011-May-13 23:18 UTC
[RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
This patch introduces a callback in the super_operations structure, ''get_maps_dev'' which is then used by procfs to query which device to return for reporting in /proc/[PID]/maps. btrfs wants this so that it can return the same device as it uses from stat(2) calls. Signed-off-by: Mark Fasheh <mfasheh@suse.com> --- fs/proc/task_mmu.c | 2 +- fs/proc/task_nommu.c | 2 +- include/linux/fs.h | 8 ++++++++ 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 2e7addf..c0acc19 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -220,7 +220,7 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma) if (file) { struct inode *inode = vma->vm_file->f_path.dentry->d_inode; - dev = inode->i_sb->s_dev; + dev = get_maps_dev(inode); ino = inode->i_ino; pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; } diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c index 980de54..6769efe 100644 --- a/fs/proc/task_nommu.c +++ b/fs/proc/task_nommu.c @@ -148,7 +148,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma) if (file) { struct inode *inode = vma->vm_file->f_path.dentry->d_inode; - dev = inode->i_sb->s_dev; + dev = get_maps_dev(inode); ino = inode->i_ino; pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT; } diff --git a/include/linux/fs.h b/include/linux/fs.h index dbd860a..ad683ad 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1637,8 +1637,16 @@ struct super_operations { ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); #endif int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); + dev_t (*get_maps_dev)(struct inode *); }; +static inline dev_t get_maps_dev(struct inode *inode) +{ + if (inode->i_sb->s_op->get_maps_dev) + return inode->i_sb->s_op->get_maps_dev(inode); + return inode->i_sb->s_dev; +} + /* * Inode state bits. Protected by inode->i_lock * -- 1.6.4.2 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric W. Biederman
2011-May-15 03:06 UTC
Re: [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
Mark Fasheh <mfasheh@suse.com> writes:> This patch introduces a callback in the super_operations structure, > ''get_maps_dev'' which is then used by procfs to query which device to return > for reporting in /proc/[PID]/maps.No. It may make sense to call the vfs stat method. But introducing an extra vfs operations for this seems like a maintenance nightmare. Eric -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Fasheh
2011-May-19 20:18 UTC
Re: [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
On Sat, May 14, 2011 at 08:06:04PM -0700, Eric W. Biederman wrote:> Mark Fasheh <mfasheh@suse.com> writes: > > > This patch introduces a callback in the super_operations structure, > > ''get_maps_dev'' which is then used by procfs to query which device to return > > for reporting in /proc/[PID]/maps. > > No. > > It may make sense to call the vfs stat method. But introducing an extra > vfs operations for this seems like a maintenance nightmare.Yeah I''m not thrilled with the extra method either. My concern with using ->getattr is whether it''s too heavy since that implies potential disk / network i/o. --Mark -- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mark Fasheh
2011-Oct-07 18:32 UTC
Re: [RFC][PATCH 1/2] vfs: allow /proc/pid/maps to return a custom device
Ressurecting an old thread, sorry. Here''s the conversation thus far: http://www.spinics.net/lists/linux-btrfs/msg10099.html This is still hitting folks wishing to use btrfs on suse based systems. Using getattr() (unconditionally I might add) is possible, but will affect performance to a far greater degree than just allowing an optional deref of whatever structure btrfs (and similar file systems) have to point to the right block device. Is this really the way we would like to proceed? Chris, maybe you can chime in here? --Mark On Thu, May 19, 2011 at 01:18:26PM -0700, Mark Fasheh wrote:> On Sat, May 14, 2011 at 08:06:04PM -0700, Eric W. Biederman wrote: > > Mark Fasheh <mfasheh@suse.com> writes: > > > > > This patch introduces a callback in the super_operations structure, > > > ''get_maps_dev'' which is then used by procfs to query which device to return > > > for reporting in /proc/[PID]/maps. > > > > No. > > > > It may make sense to call the vfs stat method. But introducing an extra > > vfs operations for this seems like a maintenance nightmare. > > Yeah I''m not thrilled with the extra method either. My concern with using > ->getattr is whether it''s too heavy since that implies potential disk / > network i/o. > --Mark-- Mark Fasheh -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html