Andrew Vagin
2013-Jul-04 09:51 UTC
btrfs: stat(2) and /proc/pid/maps returns different devices
Hi All,

I want to resurrect an old problem. Currently stat(2) returns a different
device than the other places where the device is printed (/proc/pid/maps,
/proc/pid/fdinfo/, unix-diag). stat(2) reports a device that is absent
from /proc/pid/mountinfo.

# cat /proc/self/mountinfo | grep mnt
40 32 0:32 / /mnt rw,relatime shared:27 - btrfs /dev/loop0 rw,noacl,space_cache

# cat /proc/2943/maps
00400000-00406000 r-xp 00000000 00:20 260 /mnt/xxx/sleep

# stat -L -c "%D" /proc/2943/map_files/400000-406000
23

We are not the first to suffer from this problem:
https://bugzilla.redhat.com/show_bug.cgi?id=711881
http://marc.info/?l=linux-btrfs&m=130074451403261
https://bugzilla.openvz.org/show_bug.cgi?id=2653

This bug looks like a KABI violation.

And about 2 years ago Mark Fasheh tried to fix this problem:
http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps

Eric Biederman suggested not creating a new method and using vfs_getattr
instead, but there are a few problems:
* fanotify doesn't have a dentry, but its fdinfo contains the device.
* vfs_getattr can fail, and which device should be shown in that case?
* vfs_getattr retrieves many more attributes than just the device, so
  there is a question about performance degradation.

So I have a question: Can two inodes from different subvolumes have
equal inode numbers?

If someone has any suggestions on how to fix this problem, or any
explanation of why this is not a problem at all, please write here.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
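A note for readers comparing the three outputs in the message above: /proc/pid/maps prints the device as major:minor in hex, mountinfo prints major:minor in decimal, and stat's %D prints the raw device number in hex. So 00:20 in maps and 0:32 in mountinfo are the same device (0x20 = 32), while stat's 23 (0x23 = 35) is a different one. The comparison could be sketched in C roughly as follows. parse_maps_line() and maps_dev_matches() are hypothetical helper names, not part of any kernel or libc API, and glibc's makedev() is assumed to be available.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysmacros.h>	/* makedev(), major(), minor() (glibc) */

/*
 * Hypothetical helper: pull the device and inode out of one
 * /proc/<pid>/maps line. maps prints major:minor in hex and the
 * inode in decimal. Returns 0 on success, -1 on a malformed line.
 */
static int parse_maps_line(const char *line, dev_t *dev, unsigned long *ino)
{
	unsigned long start, end;
	unsigned long long pgoff;
	unsigned int maj, mnr;
	char perms[5];

	assert(line && dev && ino);
	if (sscanf(line, "%lx-%lx %4s %llx %x:%x %lu",
		   &start, &end, perms, &pgoff, &maj, &mnr, ino) != 7)
		return -1;
	*dev = makedev(maj, mnr);
	return 0;
}

/*
 * Hypothetical helper: 1 if the device in the maps line equals st_dev
 * as returned by stat(2), 0 if it differs, -1 if the line is malformed.
 */
static int maps_dev_matches(const char *line, dev_t st_dev)
{
	dev_t dev;
	unsigned long ino;

	if (parse_maps_line(line, &dev, &ino) != 0)
		return -1;
	return dev == st_dev;
}
```

Fed the maps line from the report and the st_dev value 0x23 that stat printed, maps_dev_matches() reports a mismatch, which is the inconsistency this thread is about.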
Andrey Wagin
2013-Jul-05 08:06 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
2013/7/4 Andrew Vagin <avagin@gmail.com>:
> Hi All,
>
> I want to resurrect an old problem. Currently stat(2) returns a different
> device than the other places where the device is printed (/proc/pid/maps,
> /proc/pid/fdinfo/, unix-diag). stat(2) reports a device that is absent
> from /proc/pid/mountinfo.

Hi Chris,

I found that this issue was introduced by your change:

commit 3394e1607eaf870ebba37d303fbd590a4c569908
Author: Chris Mason <chris.mason@oracle.com>
Date:   Mon Nov 17 20:42:26 2008 -0500

    Btrfs: Give each subvol and snapshot their own anonymous devid

Now we need to find a way to fix the other places (not only stat(2)).
Could you look at Mark Fasheh's fix and my points about why we can't use
vfs_getattr in this case? Maybe you will have some ideas or comments
about this issue.

Thanks

>
> # cat /proc/self/mountinfo | grep mnt
> 40 32 0:32 / /mnt rw,relatime shared:27 - btrfs /dev/loop0 rw,noacl,space_cache
>
> # cat /proc/2943/maps
> 00400000-00406000 r-xp 00000000 00:20 260 /mnt/xxx/sleep
>
> # stat -L -c "%D" /proc/2943/map_files/400000-406000
> 23
>
> We are not the first to suffer from this problem:
> https://bugzilla.redhat.com/show_bug.cgi?id=711881
> http://marc.info/?l=linux-btrfs&m=130074451403261
> https://bugzilla.openvz.org/show_bug.cgi?id=2653
>
> This bug looks like a KABI violation.
>
> And about 2 years ago Mark Fasheh tried to fix this problem:
> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
>
> Eric Biederman suggested not creating a new method and using vfs_getattr
> instead, but there are a few problems:
> * fanotify doesn't have a dentry, but its fdinfo contains the device.
> * vfs_getattr can fail, and which device should be shown in that case?
> * vfs_getattr retrieves many more attributes than just the device, so
>   there is a question about performance degradation.
>
> So I have a question: Can two inodes from different subvolumes have
> equal inode numbers?
>
> If someone has any suggestions on how to fix this problem, or any
> explanation of why this is not a problem at all, please write here.
>
> Thanks.
David Sterba
2013-Jul-08 21:54 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Thu, Jul 04, 2013 at 01:51:38PM +0400, Andrew Vagin wrote:
> We are not the first to suffer from this problem:
> https://bugzilla.redhat.com/show_bug.cgi?id=711881
> http://marc.info/?l=linux-btrfs&m=130074451403261
> https://bugzilla.openvz.org/show_bug.cgi?id=2653

> And about 2 years ago Mark Fasheh tried to fix this problem:
> http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps
>
> Eric Biederman suggested not creating a new method and using vfs_getattr
> instead, but there are a few problems:
> * fanotify doesn't have a dentry, but its fdinfo contains the device.
> * vfs_getattr can fail, and which device should be shown in that case?
> * vfs_getattr retrieves many more attributes than just the device, so
>   there is a question about performance degradation.
>
> So I have a question: Can two inodes from different subvolumes have
> equal inode numbers?

Yes, subvolumes are separate inode number spaces.

> If someone has any suggestions on how to fix this problem, or any
> explanation of why this is not a problem at all, please write here.

The xstat syscall instead of the potentially heavyweight vfs_getattr
could fix that, but it's not merged. For suse kernels we've taken the
hackish approach of patching fs/proc/task_mmu.c:show_map_vma() (and the
nommu variant) to use vfs_getattr only for btrfs.

http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-use-correct-device-for-maps.patch?id=2434fa6ee93a83b117461eb13f24272606677fec

Only a temporary and not upstreamable solution, but without it the core
packaging tool zypper would not work correctly.

david
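David's answer about inode number spaces is why user-space tools cannot simply ignore the device: they commonly identify a file by the (st_dev, st_ino) pair. If inode numbers can repeat across subvolumes, the pair is only unique when each subvolume reports its own device, which is what stat(2) does on btrfs and what the s_dev shown in maps does not. A small C sketch of that reasoning follows. struct file_id, same_file() and demo_collision() are illustrative names, and the device number 0x24 for a second subvolume is made up; the other numbers come from the report.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Illustrative only: the (device, inode) pair used to identify a file. */
struct file_id {
	dev_t dev;
	ino_t ino;
};

/* Two paths refer to the same file only if both fields match. */
static bool same_file(struct file_id a, struct file_id b)
{
	return a.dev == b.dev && a.ino == b.ino;
}

/* Walk through the collision case with partly made-up numbers. */
static void demo_collision(void)
{
	/* Two different files, same inode number, in different subvolumes. */
	struct file_id stat_a = { 0x23, 260 };	/* per-subvolume device */
	struct file_id stat_b = { 0x24, 260 };	/* 0x24 is hypothetical */
	struct file_id sdev_a = { 0x20, 260 };	/* superblock s_dev */
	struct file_id sdev_b = { 0x20, 260 };

	/* With per-subvolume devices the two files are told apart. */
	assert(!same_file(stat_a, stat_b));
	/* With the shared s_dev they are wrongly treated as one file. */
	assert(same_file(sdev_a, sdev_b));
}
```

The same comparison also explains the failure mode in the report: an identity taken from stat(2) never equals one taken from maps, even for the very same file.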
Mark Fasheh
2013-Jul-10 16:31 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Mon, Jul 08, 2013 at 11:54:46PM +0200, David Sterba wrote:
> On Thu, Jul 04, 2013 at 01:51:38PM +0400, Andrew Vagin wrote:
> > We are not the first to suffer from this problem:
> > https://bugzilla.redhat.com/show_bug.cgi?id=711881
> > http://marc.info/?l=linux-btrfs&m=130074451403261
> > https://bugzilla.openvz.org/show_bug.cgi?id=2653
>
> > And about 2 years ago Mark Fasheh tried to fix this problem:
> > http://thr3ads.net/btrfs-devel/2011/05/2346176-RFC-PATCH-0-2-btrfs-vfs-Return-same-device-in-stat-2-and-proc-pid-maps

And basically nobody cared :/

> > Eric Biederman suggested not creating a new method and using vfs_getattr
> > instead, but there are a few problems:
> > * fanotify doesn't have a dentry, but its fdinfo contains the device.
> > * vfs_getattr can fail, and which device should be shown in that case?
> > * vfs_getattr retrieves many more attributes than just the device, so
> >   there is a question about performance degradation.
> >
> > So I have a question: Can two inodes from different subvolumes have
> > equal inode numbers?
>
> Yes, subvolumes are separate inode number spaces.
>
> > If someone has any suggestions on how to fix this problem, or any
> > explanation of why this is not a problem at all, please write here.
>
> The xstat syscall instead of the potentially heavyweight vfs_getattr
> could fix that, but it's not merged. For suse kernels we've taken the
> hackish approach of patching fs/proc/task_mmu.c:show_map_vma() (and the
> nommu variant) to use vfs_getattr only for btrfs.
>
> http://kernel.opensuse.org/cgit/kernel-source/tree/patches.suse/btrfs-use-correct-device-for-maps.patch?id=2434fa6ee93a83b117461eb13f24272606677fec
>
> Only a temporary and not upstreamable solution, but without it the core
> packaging tool zypper would not work correctly.

As far as I can tell we'll be carrying this patch until a better
solution is possible.

When that will happen, I don't know.
	--Mark

--
Mark Fasheh
Mark Fasheh
2013-Jul-10 17:45 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Wed, Jul 10, 2013 at 09:31:05AM -0700, Mark Fasheh wrote:
> As far as I can tell we'll be carrying this patch until a better
> solution is possible.
>
> When that will happen, I don't know.
> 	--Mark

Well, what do I get when I pretend I don't care any more? The little voice
in my head says "keep plugging away". Here's another attempt at fixing this
problem in a sane manner. Basically, this time we're adding a flag to
s_flags which btrfs sets. Proc will see the flag and call ->getattr().

This compiles, but it needs testing (which I will get to soon). It still has
a bunch of problems in my honest opinion but maybe if we get something
acceptable upstream we can work from there.

Also, as Andrew pointed out there's more than one place which returns a
different device than stat(2), so I probably need to update more sites
to deal with this.

Does anyone see a problem with this approach?
	--Mark

--
Mark Fasheh

From: Mark Fasheh <mfasheh@suse.de>

vfs: allow /proc/PID/maps to get device from stat

stat(2) on btrfs returns a custom device, but proc uses s_dev from the
super block. This causes problems because software (and users) are not
expecting the kernel to return different devices from these calls.

This patch fixes the problem by adding a new superblock flag,
MS_PROC_USE_ST. When the proc code sees this flag, it will call the
file system's ->getattr() method to extract a device as opposed to
getting it directly from s_dev.

Signed-off-by: Mark Fasheh <mfasheh@suse.de>
---
 fs/btrfs/super.c        |  1 +
 fs/proc/generic.c       | 30 ++++++++++++++++++++++++++++++
 fs/proc/internal.h      |  1 +
 fs/proc/task_mmu.c      |  2 +-
 fs/proc/task_nommu.c    |  2 +-
 include/uapi/linux/fs.h |  1 +
 6 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f0857e0..67be4ef 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -822,6 +822,7 @@ static int btrfs_fill_super(struct super_block *sb,
 	sb->s_flags |= MS_POSIXACL;
 #endif
 	sb->s_flags |= MS_I_VERSION;
+	sb->s_flags |= MS_PROC_USE_ST;
 	err = open_ctree(sb, fs_devices, (char *)data);
 	if (err) {
 		printk("btrfs: open_ctree failed\n");
diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index a2596af..eca8195 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -24,6 +24,8 @@
 #include <linux/spinlock.h>
 #include <linux/completion.h>
 #include <asm/uaccess.h>
+#include <linux/fs.h>
+#include <linux/dcache.h>
 
 #include "internal.h"
 
@@ -637,3 +639,31 @@ void *PDE_DATA(const struct inode *inode)
 	return __PDE_DATA(inode);
 }
 EXPORT_SYMBOL(PDE_DATA);
+
+static dev_t proc_get_dev_from_stat(struct inode *inode)
+{
+	struct dentry *dentry = d_find_any_alias(inode);
+	struct kstat kstat;
+
+	if (!dentry)
+		goto out_error;
+
+	if (inode->i_op->getattr(NULL, dentry, &kstat))
+		goto out_error_dput;
+
+	dput(dentry);
+	return kstat.dev;
+
+out_error_dput:
+	dput(dentry);
+out_error:
+	return inode->i_sb->s_dev;
+}
+
+dev_t proc_get_map_dev(struct inode *inode)
+{
+	if (inode->i_sb->s_flags & MS_PROC_USE_ST)
+		return proc_get_dev_from_stat(inode);
+	else
+		return inode->i_sb->s_dev;
+}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index d600fb0..24808b0 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -192,6 +192,7 @@ static inline struct proc_dir_entry *pde_get(struct proc_dir_entry *pde)
 	return pde;
 }
 extern void pde_put(struct proc_dir_entry *);
+dev_t proc_get_map_dev(struct inode *inode);
 
 /*
  * inode.c
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3e636d8..9226600 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -272,7 +272,7 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma, int is_pid)
 
 	if (file) {
 		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(inode);
 		ino = inode->i_ino;
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}
diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 56123a6..892d84a 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -150,7 +150,7 @@ static int nommu_vma_show(struct seq_file *m, struct vm_area_struct *vma,
 
 	if (file) {
 		struct inode *inode = file_inode(vma->vm_file);
-		dev = inode->i_sb->s_dev;
+		dev = proc_get_map_dev(inode);
 		ino = inode->i_ino;
 		pgoff = (loff_t)vma->vm_pgoff << PAGE_SHIFT;
 	}
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index a4ed56c..b4173a3 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -88,6 +88,7 @@ struct inodes_stat_t {
 #define MS_STRICTATIME	(1<<24) /* Always perform atime updates */
 
 /* These sb flags are internal to the kernel */
+#define MS_PROC_USE_ST	(1<<27)
 #define MS_NOSEC	(1<<28)
 #define MS_BORN		(1<<29)
 #define MS_ACTIVE	(1<<30)
--
1.8.1.4
David Sterba
2013-Jul-10 22:26 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> Well, what do I get when I pretend I don't care any more? The little voice
> in my head says "keep plugging away". Here's another attempt at fixing this
> problem in a sane manner. Basically, this time we're adding a flag to
> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
>
> This compiles, but it needs testing (which I will get to soon). It still has
> a bunch of problems in my honest opinion but maybe if we get something
> acceptable upstream we can work from there.
>
> Also, as Andrew pointed out there's more than one place which returns a
> different device than stat(2), so I probably need to update more sites
> to deal with this.
>
> Does anyone see a problem with this approach?

The approach looks ok to me, the implementation is internal to vfs and
fairly minimal. The bit that bothers me is the name of the flag, it's
completely unobvious what it means.

There are some differences to the linked suse patch:

> +static dev_t proc_get_dev_from_stat(struct inode *inode)
> +{
> +	struct dentry *dentry = d_find_any_alias(inode);

This does the inode -> dentry mapping, while originally there was

	&file->f_path

Passing just the inode to proc_get_dev_from_stat unnecessarily drops the
available information that's about to be retrieved again.

> +	struct kstat kstat;
> +
> +	if (!dentry)
> +		goto out_error;
> +	if (inode->i_op->getattr(NULL, dentry, &kstat))

The suse patch calls vfs_getattr that in turn calls

	security_inode_getattr(path->mnt, path->dentry);

That would be missing.

Plus checks for presence of the ->getattr operation. Though this is
superfluous with btrfs, I suggest using vfs_getattr here, which will
fix all of the above.

> +		goto out_error_dput;
> +
> +	dput(dentry);
> +	return kstat.dev;
> +
> +out_error_dput:
> +	dput(dentry);
> +out_error:
> +	return inode->i_sb->s_dev;
> +}
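The fallback behaviour under review can be modelled outside the kernel. The sketch below is plain user-space C, not kernel code: the model_* types, MODEL_PROC_USE_ST and the two stub getattr functions are invented for illustration. It mirrors the control flow of proc_get_map_dev() from the patch, with the check for a missing ->getattr added as the review asks. It does not model the security hook that vfs_getattr would also run.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int model_dev_t;

#define MODEL_PROC_USE_ST (1UL << 27)	/* stands in for MS_PROC_USE_ST */

struct model_inode;
typedef int (*model_getattr_fn)(const struct model_inode *, model_dev_t *);

struct model_sb {
	unsigned long s_flags;
	model_dev_t s_dev;
};

struct model_inode {
	const struct model_sb *i_sb;
	model_getattr_fn getattr;	/* may be NULL */
};

/* Stub: succeeds and reports a per-subvolume device (0x23, as in the report). */
static int model_getattr_ok(const struct model_inode *inode, model_dev_t *dev)
{
	(void)inode;
	*dev = 0x23;
	return 0;
}

/* Stub: fails, as a real ->getattr() may. */
static int model_getattr_fail(const struct model_inode *inode, model_dev_t *dev)
{
	(void)inode;
	(void)dev;
	return -1;
}

/* Device to print in maps: ask getattr if flagged, else fall back to s_dev. */
static model_dev_t model_get_map_dev(const struct model_inode *inode)
{
	model_dev_t dev;

	assert(inode != NULL && inode->i_sb != NULL);
	if (!(inode->i_sb->s_flags & MODEL_PROC_USE_ST))
		return inode->i_sb->s_dev;
	if (inode->getattr == NULL || inode->getattr(inode, &dev) != 0)
		return inode->i_sb->s_dev;
	return dev;
}
```

In this model every failure path ends at s_dev, so a reader of maps always gets some device. Whether silently falling back is the right answer to Andrew's "which device should be shown" question is left open by the thread.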
Andrew Vagin
2013-Jul-11 21:25 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> On Wed, Jul 10, 2013 at 09:31:05AM -0700, Mark Fasheh wrote:
> > As far as I can tell we'll be carrying this patch until a better
> > solution is possible.
> >
> > When that will happen, I don't know.
> > 	--Mark
>
> Well, what do I get when I pretend I don't care any more? The little voice
> in my head says "keep plugging away". Here's another attempt at fixing this
> problem in a sane manner. Basically, this time we're adding a flag to
> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
>
> This compiles, but it needs testing (which I will get to soon). It still has
> a bunch of problems in my honest opinion but maybe if we get something
> acceptable upstream we can work from there.
>
> Also, as Andrew pointed out there's more than one place which returns a
> different device than stat(2), so I probably need to update more sites
> to deal with this.

Yes, we need to fix unix_diag, fanotify fdinfo, ...

> Does anyone see a problem with this approach?

Looks good to me. Thanks.

> 	--Mark
>
> --
> Mark Fasheh
Mark Fasheh
2013-Jul-19 20:51 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Thu, Jul 11, 2013 at 12:26:50AM +0200, David Sterba wrote:
> On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
> > Well, what do I get when I pretend I don't care any more? The little voice
> > in my head says "keep plugging away". Here's another attempt at fixing this
> > problem in a sane manner. Basically, this time we're adding a flag to
> > s_flags which btrfs sets. Proc will see the flag and call ->getattr().
> >
> > This compiles, but it needs testing (which I will get to soon). It still has
> > a bunch of problems in my honest opinion but maybe if we get something
> > acceptable upstream we can work from there.
> >
> > Also, as Andrew pointed out there's more than one place which returns a
> > different device than stat(2), so I probably need to update more sites
> > to deal with this.
> >
> > Does anyone see a problem with this approach?
>
> The approach looks ok to me, the implementation is internal to vfs and
> fairly minimal. The bit that bothers me is the name of the flag, it's
> completely unobvious what it means.

I'll come up with something better for my next revision :)

> There are some differences to the linked suse patch:
>
> > +static dev_t proc_get_dev_from_stat(struct inode *inode)
> > +{
> > +	struct dentry *dentry = d_find_any_alias(inode);
>
> This does the inode -> dentry mapping, while originally there was
>
> 	&file->f_path
>
> Passing just the inode to proc_get_dev_from_stat unnecessarily drops the
> available information that's about to be retrieved again.

Good catch, thanks.

> > +	struct kstat kstat;
> > +
> > +	if (!dentry)
> > +		goto out_error;
> > +	if (inode->i_op->getattr(NULL, dentry, &kstat))
>
> The suse patch calls vfs_getattr that in turn calls
>
> 	security_inode_getattr(path->mnt, path->dentry);
>
> That would be missing.
>
> Plus checks for presence of the ->getattr operation. Though this is
> superfluous with btrfs, I suggest using vfs_getattr here, which will
> fix all of the above.

Ok, checking for the operation is definitely needed. I'll check for
->getattr(). The rest of the stuff in our suse patch must have been added
after my own commit. Do you know why this was added? Since this is all
internal to proc I didn't think we needed all this extra stuff to replicate
vfs_getattr().
	--Mark

--
Mark Fasheh
David Sterba
2013-Jul-31 18:24 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On Fri, Jul 19, 2013 at 01:51:15PM -0700, Mark Fasheh wrote:
> Ok, checking for the operation is definitely needed. I'll check for
> ->getattr(). The rest of the stuff in our suse patch must have been added
> after my own commit. Do you know why this was added? Since this is all
> internal to proc I didn't think we needed all this extra stuff to replicate
> vfs_getattr().

I don't remember, probably for code simplicity. The updated version
you've sent looks ok to me.

david
Pavel Emelyanov
2013-Oct-24 10:41 UTC
Re: btrfs: stat(2) and /proc/pid/maps returns different devices
On 07/20/2013 12:51 AM, Mark Fasheh wrote:
> On Thu, Jul 11, 2013 at 12:26:50AM +0200, David Sterba wrote:
>> On Wed, Jul 10, 2013 at 10:45:45AM -0700, Mark Fasheh wrote:
>>> Well, what do I get when I pretend I don't care any more? The little voice
>>> in my head says "keep plugging away". Here's another attempt at fixing this
>>> problem in a sane manner. Basically, this time we're adding a flag to
>>> s_flags which btrfs sets. Proc will see the flag and call ->getattr().
>>>
>>> This compiles, but it needs testing (which I will get to soon). It still has
>>> a bunch of problems in my honest opinion but maybe if we get something
>>> acceptable upstream we can work from there.
>>>
>>> Also, as Andrew pointed out there's more than one place which returns a
>>> different device than stat(2), so I probably need to update more sites
>>> to deal with this.
>>>
>>> Does anyone see a problem with this approach?
>>
>> The approach looks ok to me, the implementation is internal to vfs and
>> fairly minimal. The bit that bothers me is the name of the flag, it's
>> completely unobvious what it means.
>
> I'll come up with something better for my next revision :)

Mark, David,

What are your plans for the next version? Any chance we can see it in
the 3.13 merge window? (unless I've missed the fact that it's already
there)

I'd really love to see it, as this thing is a blocker for
checkpoint-restore on btrfs.

Thanks,
Pavel