These two patches give a degree of control over balance operations.
The first makes it possible to get an idea of how much work remains to
do, by tracking the number of block groups (chunks) that need to be
moved/rewritten. The second patch allows a running balance operation
to be cancelled when the current block group has been moved.

One fundamental question, though -- is the progress monitor
function best implemented as an ioctl, as I've done here, or should it
be two or three sysfs files? I'm thinking of /proc/mdstat...
Obviously, /proc/mdstat would never get into /sys, but exposing the
"expected" and "remaining" values as files has an attractive
simplicity to it.

The user-space side of things is in a separate patch series, to
follow.

Please be gentle with me, this is my first (serious, non-trivial)
kernel patch. :)

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- "No! My collection of rare, incurable diseases! Violated!" ---

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

Signed-off-by: Hugo Mills <hugo@carfax.org.uk>

---
 fs/btrfs/ctree.h   |    9 ++++++++
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/ioctl.c   |   34 ++++++++++++++++++++++++++++++++
 fs/btrfs/ioctl.h   |    7 ++++++
 fs/btrfs/volumes.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 105 insertions(+), 2 deletions(-)

Index: linux-mainline/fs/btrfs/ctree.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ctree.h	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/ctree.h	2010-10-29 17:20:43.860460761 +0100
@@ -803,6 +803,11 @@
 	struct list_head cluster_list;
 };
 
+struct btrfs_balance_info {
+	u64 expected;
+	u64 completed;
+};
+
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
@@ -1010,6 +1015,10 @@
 	unsigned metadata_ratio;
 
 	void *bdev_holder;
+
+	/* Keep track of any rebalance operations on this FS */
+	spinlock_t balance_info_lock;
+	struct btrfs_balance_info *balance_info;
 };
 
 /*
Index: linux-mainline/fs/btrfs/ioctl.c
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.c	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/ioctl.c	2010-10-29 17:21:26.128742389 +0100
@@ -1984,6 +1984,38 @@
 	return 0;
 }
 
+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+	struct btrfs_fs_info *fs_info,
+	struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+	int ret = 0;
+	struct btrfs_ioctl_balance_progress dest;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if (!fs_info->balance_info) {
+		ret = -EINVAL;
+		goto error;
+	}
+
+	dest.expected = fs_info->balance_info->expected;
+	dest.completed = fs_info->balance_info->completed;
+
+	spin_unlock(&fs_info->balance_info_lock);
+
+	if (copy_to_user(user_dest, &dest,
+			 sizeof(struct btrfs_ioctl_balance_progress)))
+		return -EFAULT;
+
+	return 0;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2017,6 +2049,8 @@
 		return btrfs_ioctl_rm_dev(root, argp);
 	case BTRFS_IOC_BALANCE:
 		return btrfs_balance(root->fs_info->dev_root);
+	case BTRFS_IOC_BALANCE_PROGRESS:
+		return btrfs_ioctl_balance_progress(root->fs_info, argp);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
Index: linux-mainline/fs/btrfs/ioctl.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.h	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/ioctl.h	2010-10-29 17:05:44.447028825 +0100
@@ -138,6 +138,11 @@
 	struct btrfs_ioctl_space_info spaces[0];
 };
 
+struct btrfs_ioctl_balance_progress {
+	__u64 expected;
+	__u64 completed;
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
 				   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -178,4 +183,6 @@
 #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64)
 #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \
 				    struct btrfs_ioctl_space_args)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
+				  struct btrfs_ioctl_balance_progress)
 #endif
Index: linux-mainline/fs/btrfs/volumes.c
===================================================================
--- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/volumes.c	2010-10-29 17:23:40.463279287 +0100
@@ -1902,6 +1902,7 @@
 	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_key found_key;
+	struct btrfs_balance_status *bal_info;
 
 	if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
 		return -EROFS;
@@ -1909,6 +1910,18 @@
 	mutex_lock(&dev_root->fs_info->volume_mutex);
 	dev_root = dev_root->fs_info->dev_root;
 
+	dev_root->fs_info->balance_info = kmalloc(
+		sizeof(struct btrfs_balance_info),
+		GFP_NOFS);
+	if (!dev_root->fs_info->balance_info) {
+		ret = -ENOSPC;
+		goto error_no_status;
+	}
+	bal_info = dev_root->fs_info->balance_info;
+	bal_info->expected = -1; /* One less than actually counted,
+				    because chunk 0 is special */
+	bal_info->completed = 0;
+
 	/* step one make some room on all the devices */
 	list_for_each_entry(device, devices, dev_list) {
 		old_size = device->total_bytes;
@@ -1932,10 +1945,40 @@
 		btrfs_end_transaction(trans, dev_root);
 	}
 
-	/* step two, relocate all the chunks */
+	/* step two, count the chunks */
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path) {
+		ret = -ENOSPC;
+		goto error;
+	}
+
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.offset = (u64)-1;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+
+	ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
+	if (ret <= 0) {
+		printk(KERN_ERR "btrfs: Failed to find the last chunk.\n");
+		BUG();
+	}
+
+	while (1) {
+		ret = btrfs_previous_item(chunk_root, path, 0,
+					  BTRFS_CHUNK_ITEM_KEY);
+		if (ret)
+			break;
+
+		bal_info->expected++;
+	}
+
+	btrfs_free_path(path);
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOSPC;
+		goto error;
+	}
 
+	/* step three, relocate all the chunks */
 	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 	key.offset = (u64)-1;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
@@ -1976,10 +2019,18 @@
 					   found_key.offset);
 		BUG_ON(ret && ret != -ENOSPC);
 		key.offset = found_key.offset - 1;
+		bal_info->completed++;
+		printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
+		       bal_info->completed, bal_info->expected);
 	}
 	ret = 0;
 error:
 	btrfs_free_path(path);
+	spin_lock(&dev_root->fs_info->balance_info_lock);
+	kfree(dev_root->fs_info->balance_info);
+	dev_root->fs_info->balance_info = NULL;
+	spin_unlock(&dev_root->fs_info->balance_info_lock);
+error_no_status:
 	mutex_unlock(&dev_root->fs_info->volume_mutex);
 	return ret;
 }
Index: linux-mainline/fs/btrfs/disk-io.c
===================================================================
--- linux-mainline.orig/fs/btrfs/disk-io.c	2010-10-29 17:19:12.404178865 +0100
+++ linux-mainline/fs/btrfs/disk-io.c	2010-10-29 17:20:02.022161666 +0100
@@ -1591,6 +1591,7 @@
 	spin_lock_init(&fs_info->ref_cache_lock);
 	spin_lock_init(&fs_info->fs_roots_radix_lock);
 	spin_lock_init(&fs_info->delayed_iput_lock);
+	spin_lock_init(&fs_info->balance_info_lock);
 
 	init_completion(&fs_info->kobj_unregister);
 	fs_info->tree_root = tree_root;
@@ -1616,6 +1617,7 @@
 	fs_info->sb = sb;
 	fs_info->max_inline = 8192 * 1024;
 	fs_info->metadata_ratio = 0;
+	fs_info->balance_info = NULL;
 
 	fs_info->thread_pool_size = min_t(unsigned long,
 					   num_online_cpus() + 2, 8);
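The counting step in the patch above starts `expected` at -1 in an unsigned 64-bit field and adds one per chunk item found, so the final value is one less than the number of items walked. A minimal userspace sketch of that arithmetic follows, with `uint64_t` standing in for the kernel's `u64`; the function name `count_expected` is purely illustrative and is not part of the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): mimic the counting loop by
 * starting at (uint64_t)-1 and adding one per chunk item found.
 */
static uint64_t count_expected(uint64_t chunk_items)
{
	uint64_t expected = (uint64_t)-1;	/* wraps to UINT64_MAX */
	uint64_t i;

	for (i = 0; i < chunk_items; i++)
		expected++;	/* the first increment wraps back to 0 */

	return expected;
}
```

If no chunk item is found at all, the field stays at UINT64_MAX, so anything reading the progress values would presumably need to tolerate that case.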
This patch adds an ioctl for cancelling a btrfs balance operation
mid-flight. The ioctl simply sets a flag, and the operation terminates
after the current block group move has completed.

Signed-off-by: Hugo Mills <hugo@carfax.org.uk>

---
 fs/btrfs/ctree.h   |    1 +
 fs/btrfs/ioctl.c   |   25 +++++++++++++++++++++++++
 fs/btrfs/ioctl.h   |    1 +
 fs/btrfs/volumes.c |    7 ++++++-
 4 files changed, 33 insertions(+), 1 deletion(-)

Index: linux-mainline/fs/btrfs/ctree.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ctree.h	2010-10-29 17:20:43.860460761 +0100
+++ linux-mainline/fs/btrfs/ctree.h	2010-10-29 17:24:06.622214467 +0100
@@ -806,6 +806,7 @@
 struct btrfs_balance_info {
 	u64 expected;
 	u64 completed;
+	int cancel_pending;
 };
 
 struct reloc_control;
Index: linux-mainline/fs/btrfs/ioctl.c
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.c	2010-10-29 17:21:26.128742389 +0100
+++ linux-mainline/fs/btrfs/ioctl.c	2010-10-29 17:27:51.933043374 +0100
@@ -2016,6 +2016,29 @@
 	return ret;
 }
 
+/*
+ * Cancel a running balance operation
+ */
+long btrfs_ioctl_balance_cancel(struct btrfs_fs_info *fs_info)
+{
+	int err = 0;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if(!fs_info->balance_info) {
+		err = -EINVAL;
+		goto error;
+	}
+	if(fs_info->balance_info->cancel_pending) {
+		err = -ECANCELED;
+		goto error;
+	}
+	fs_info->balance_info->cancel_pending = 1;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return err;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2051,6 +2074,8 @@
 		return btrfs_balance(root->fs_info->dev_root);
 	case BTRFS_IOC_BALANCE_PROGRESS:
 		return btrfs_ioctl_balance_progress(root->fs_info, argp);
+	case BTRFS_IOC_BALANCE_CANCEL:
+		return btrfs_ioctl_balance_cancel(root->fs_info);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
Index: linux-mainline/fs/btrfs/ioctl.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.h	2010-10-29 17:05:44.447028825 +0100
+++ linux-mainline/fs/btrfs/ioctl.h	2010-10-29 17:24:06.642213653 +0100
@@ -185,4 +185,5 @@
 				    struct btrfs_ioctl_space_args)
 #define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
 				  struct btrfs_ioctl_balance_progress)
+#define BTRFS_IOC_BALANCE_CANCEL _IO(BTRFS_IOCTL_MAGIC, 22)
 #endif
Index: linux-mainline/fs/btrfs/volumes.c
===================================================================
--- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-29 17:23:40.463279287 +0100
+++ linux-mainline/fs/btrfs/volumes.c	2010-10-29 17:24:06.652213246 +0100
@@ -1921,6 +1921,7 @@
 	bal_info->expected = -1; /* One less than actually counted,
 				    because chunk 0 is special */
 	bal_info->completed = 0;
+	bal_info->cancel_pending = 0;
 
 	/* step one make some room on all the devices */
 	list_for_each_entry(device, devices, dev_list) {
@@ -1983,7 +1984,7 @@
 	key.offset = (u64)-1;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
 
-	while (1) {
+	while (!bal_info->cancel_pending) {
 		ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
 		if (ret < 0)
 			goto error;
@@ -2024,6 +2025,10 @@
 		       bal_info->completed, bal_info->expected);
 	}
 	ret = 0;
+	if(bal_info->cancel_pending) {
+		printk(KERN_INFO "btrfs: balance cancelled\n");
+		ret = -EINTR;
+	}
 error:
 	btrfs_free_path(path);
 	spin_lock(&dev_root->fs_info->balance_info_lock);
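The cancel patch relies on a flag that is only tested at the top of the relocation loop, so a block group that is already being moved always finishes before the loop exits. A single-threaded userspace sketch of that behaviour is below. The names `balance_state`, `request_cancel` and `run_balance` are hypothetical stand-ins, the spinlock and the -EINVAL "no balance running" case are left out, and the cancel request is simulated as arriving during a chosen iteration.

```c
#include <errno.h>
#include <stdint.h>

struct balance_state {
	uint64_t expected;
	uint64_t completed;
	int cancel_pending;
};

/* Stand-in for the cancel ioctl: a second request is refused, as in the patch. */
static int request_cancel(struct balance_state *st)
{
	if (st->cancel_pending)
		return -ECANCELED;
	st->cancel_pending = 1;
	return 0;
}

/*
 * Stand-in for the relocation loop. cancel_during is the 0-based block
 * group during whose move a cancel request arrives; pass UINT64_MAX for
 * "never".
 */
static int run_balance(struct balance_state *st, uint64_t cancel_during)
{
	while (!st->cancel_pending && st->completed < st->expected) {
		if (st->completed == cancel_during)
			request_cancel(st);	/* request arrives mid-move */
		st->completed++;		/* the current move still finishes */
	}
	return st->cancel_pending ? -EINTR : 0;
}
```

As in the patch, a request that lands during the very last block group still yields -EINTR even though nothing was left to do.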
On Sat, Oct 30, 2010 at 01:07:27AM +0100, Hugo Mills wrote:
> This patch introduces a basic form of progress monitoring for balance
> operations, by counting the number of block groups remaining. The
> information is exposed to userspace by an ioctl.

Dammit. An unrefreshed quilt patch let an error get through (see
below). Updated patch in a few moments.

Hugo.

> Index: linux-mainline/fs/btrfs/volumes.c
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-26 18:03:38.000000000 +0100
> +++ linux-mainline/fs/btrfs/volumes.c	2010-10-29 17:23:40.463279287 +0100
> @@ -1902,6 +1902,7 @@
>  	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_key found_key;
> +	struct btrfs_balance_status *bal_info;

+	struct btrfs_balance_info *bal_info;

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- <dragon> A linked list is still a binary tree. Just a ---
very unbalanced one.
This patch introduces a basic form of progress monitoring for balance
operations, by counting the number of block groups remaining. The
information is exposed to userspace by an ioctl.

Signed-off-by: Hugo Mills <hugo@carfax.org.uk>

---
This patch replaces the one previously posted, correcting a minor error.

 fs/btrfs/ctree.h   |    9 ++++++++
 fs/btrfs/disk-io.c |    2 +
 fs/btrfs/ioctl.c   |   34 ++++++++++++++++++++++++++++++++
 fs/btrfs/ioctl.h   |    7 ++++++
 fs/btrfs/volumes.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 5 files changed, 105 insertions(+), 2 deletions(-)

Index: linux-mainline/fs/btrfs/ctree.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ctree.h	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/ctree.h	2010-10-30 14:35:25.306450922 +0100
@@ -803,6 +803,11 @@
 	struct list_head cluster_list;
 };
 
+struct btrfs_balance_info {
+	u64 expected;
+	u64 completed;
+};
+
 struct reloc_control;
 struct btrfs_device;
 struct btrfs_fs_devices;
@@ -1010,6 +1015,10 @@
 	unsigned metadata_ratio;
 
 	void *bdev_holder;
+
+	/* Keep track of any rebalance operations on this FS */
+	spinlock_t balance_info_lock;
+	struct btrfs_balance_info *balance_info;
 };
 
 /*
Index: linux-mainline/fs/btrfs/ioctl.c
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.c	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/ioctl.c	2010-10-30 14:35:25.396447198 +0100
@@ -1984,6 +1984,38 @@
 	return 0;
 }
 
+/*
+ * Return the current status of any balance operation
+ */
+long btrfs_ioctl_balance_progress(
+	struct btrfs_fs_info *fs_info,
+	struct btrfs_ioctl_balance_progress __user *user_dest)
+{
+	int ret = 0;
+	struct btrfs_ioctl_balance_progress dest;
+
+	spin_lock(&fs_info->balance_info_lock);
+	if (!fs_info->balance_info) {
+		ret = -EINVAL;
+		goto error;
+	}
+
+	dest.expected = fs_info->balance_info->expected;
+	dest.completed = fs_info->balance_info->completed;
+
+	spin_unlock(&fs_info->balance_info_lock);
+
+	if (copy_to_user(user_dest, &dest,
+			 sizeof(struct btrfs_ioctl_balance_progress)))
+		return -EFAULT;
+
+	return 0;
+
+error:
+	spin_unlock(&fs_info->balance_info_lock);
+	return ret;
+}
+
 long btrfs_ioctl(struct file *file, unsigned int
 		cmd, unsigned long arg)
 {
@@ -2017,6 +2049,8 @@
 		return btrfs_ioctl_rm_dev(root, argp);
 	case BTRFS_IOC_BALANCE:
 		return btrfs_balance(root->fs_info->dev_root);
+	case BTRFS_IOC_BALANCE_PROGRESS:
+		return btrfs_ioctl_balance_progress(root->fs_info, argp);
 	case BTRFS_IOC_CLONE:
 		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
 	case BTRFS_IOC_CLONE_RANGE:
Index: linux-mainline/fs/btrfs/ioctl.h
===================================================================
--- linux-mainline.orig/fs/btrfs/ioctl.h	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/ioctl.h	2010-10-30 14:35:25.316450509 +0100
@@ -138,6 +138,11 @@
 	struct btrfs_ioctl_space_info spaces[0];
 };
 
+struct btrfs_ioctl_balance_progress {
+	__u64 expected;
+	__u64 completed;
+};
+
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
 				   struct btrfs_ioctl_vol_args)
 #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
@@ -178,4 +183,6 @@
 #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64)
 #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \
 				    struct btrfs_ioctl_space_args)
+#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
+				  struct btrfs_ioctl_balance_progress)
 #endif
Index: linux-mainline/fs/btrfs/volumes.c
===================================================================
--- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-26 18:03:38.000000000 +0100
+++ linux-mainline/fs/btrfs/volumes.c	2010-10-30 14:35:25.326450096 +0100
@@ -1902,6 +1902,7 @@
 	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
 	struct btrfs_trans_handle *trans;
 	struct btrfs_key found_key;
+	struct btrfs_balance_info *bal_info;
 
 	if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
 		return -EROFS;
@@ -1909,6 +1910,18 @@
 	mutex_lock(&dev_root->fs_info->volume_mutex);
 	dev_root = dev_root->fs_info->dev_root;
 
+	dev_root->fs_info->balance_info = kmalloc(
+		sizeof(struct btrfs_balance_info),
+		GFP_NOFS);
+	if (!dev_root->fs_info->balance_info) {
+		ret = -ENOSPC;
+		goto error_no_status;
+	}
+	bal_info = dev_root->fs_info->balance_info;
+	bal_info->expected = -1; /* One less than actually counted,
+				    because chunk 0 is special */
+	bal_info->completed = 0;
+
 	/* step one make some room on all the devices */
 	list_for_each_entry(device, devices, dev_list) {
 		old_size = device->total_bytes;
@@ -1932,10 +1945,40 @@
 		btrfs_end_transaction(trans, dev_root);
 	}
 
-	/* step two, relocate all the chunks */
+	/* step two, count the chunks */
 	path = btrfs_alloc_path();
-	BUG_ON(!path);
+	if (!path) {
+		ret = -ENOSPC;
+		goto error;
+	}
+
+	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
+	key.offset = (u64)-1;
+	key.type = BTRFS_CHUNK_ITEM_KEY;
+
+	ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
+	if (ret <= 0) {
+		printk(KERN_ERR "btrfs: Failed to find the last chunk.\n");
+		BUG();
+	}
+
+	while (1) {
+		ret = btrfs_previous_item(chunk_root, path, 0,
+					  BTRFS_CHUNK_ITEM_KEY);
+		if (ret)
+			break;
+
+		bal_info->expected++;
+	}
+
+	btrfs_free_path(path);
+	path = btrfs_alloc_path();
+	if (!path) {
+		ret = -ENOSPC;
+		goto error;
+	}
 
+	/* step three, relocate all the chunks */
 	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
 	key.offset = (u64)-1;
 	key.type = BTRFS_CHUNK_ITEM_KEY;
@@ -1976,10 +2019,18 @@
 					   found_key.offset);
 		BUG_ON(ret && ret != -ENOSPC);
 		key.offset = found_key.offset - 1;
+		bal_info->completed++;
+		printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
+		       bal_info->completed, bal_info->expected);
 	}
 	ret = 0;
 error:
 	btrfs_free_path(path);
+	spin_lock(&dev_root->fs_info->balance_info_lock);
+	kfree(dev_root->fs_info->balance_info);
+	dev_root->fs_info->balance_info = NULL;
+	spin_unlock(&dev_root->fs_info->balance_info_lock);
+error_no_status:
 	mutex_unlock(&dev_root->fs_info->volume_mutex);
 	return ret;
 }
Index: linux-mainline/fs/btrfs/disk-io.c
===================================================================
--- linux-mainline.orig/fs/btrfs/disk-io.c	2010-10-29 17:19:12.000000000 +0100
+++ linux-mainline/fs/btrfs/disk-io.c	2010-10-29 17:20:02.022161666 +0100
@@ -1591,6 +1591,7 @@
 	spin_lock_init(&fs_info->ref_cache_lock);
 	spin_lock_init(&fs_info->fs_roots_radix_lock);
 	spin_lock_init(&fs_info->delayed_iput_lock);
+	spin_lock_init(&fs_info->balance_info_lock);
 
 	init_completion(&fs_info->kobj_unregister);
 	fs_info->tree_root = tree_root;
@@ -1616,6 +1617,7 @@
 	fs_info->sb = sb;
 	fs_info->max_inline = 8192 * 1024;
 	fs_info->metadata_ratio = 0;
+	fs_info->balance_info = NULL;
 
 	fs_info->thread_pool_size = min_t(unsigned long,
 					   num_online_cpus() + 2, 8);

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- <dragon> A linked list is still a binary tree. Just a ---
very unbalanced one.
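For orientation, here is a rough sketch of how a userspace caller might query the progress ioctl defined in the patch above. The structure and the ioctl number 21 are copied from this patch and may well differ in any kernel that eventually ships the feature. The magic value 0x94 is btrfs's existing `BTRFS_IOCTL_MAGIC` and is not shown in the patch itself. The wrapper name `balance_progress` is hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <linux/types.h>

#define BTRFS_IOCTL_MAGIC 0x94

struct btrfs_ioctl_balance_progress {
	__u64 expected;
	__u64 completed;
};

/* Number 21 is the one proposed in this patch; treat it as provisional. */
#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
				  struct btrfs_ioctl_balance_progress)

/*
 * Hypothetical wrapper: fd is an open descriptor on the mounted
 * filesystem. Returns 0 and fills *out on success, or a negative errno
 * value (per the patch, -EINVAL when no balance is running).
 */
static int balance_progress(int fd, struct btrfs_ioctl_balance_progress *out)
{
	if (ioctl(fd, BTRFS_IOC_BALANCE_PROGRESS, out) < 0)
		return -errno;
	return 0;
}
```

A caller would open the mount point, call the wrapper periodically, and print `completed` against `expected`.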
Goffredo Baroncelli
2010-Oct-30 17:44 UTC
Re: [patch 0/2] Control filesystem balances (kernel side)
On Saturday, 30 October, 2010, Hugo Mills wrote:
> These two patches give a degree of control over balance operations.
> The first makes it possible to get an idea of how much work remains to
> do, by tracking the number of block groups (chunks) that need to be
> moved/rewritten. The second patch allows a running balance operation
> to be cancelled when the current block group has been moved.
>
> One fundamental question, though -- is the progress monitor
> function best implemented as an ioctl, as I've done here, or should it
> be two or three sysfs files? I'm thinking of /proc/mdstat...
> Obviously, /proc/mdstat would never get into /sys, but exposing the
> "expected" and "remaining" values as files has an attractive
> simplicity to it.

I like the idea that this information should be put under sysfs. Something like

/sys/btrfs/<filesystem-uuid>/
    balance      -> info on balancing
    devices      -> list of devices (a directory of links or a file
                    which contains the list of devices)
    subvolumes/  -> info on subvolume(s)
    label        -> label of the filesystem
    <other btrfs filesystem related knobs>

Obviously we need another btrfs command to extract a UUID from a btrfs
filesystem, like:

# btrfs filesystem get-uuid /path/to/a/btrfs/filesystem
f9b9c413-0dc8-4e3f-94f2-86faa702f519

>
> The user-space side of things are in a separate patch series, to
> follow.
>
> Please be gentle with me, this is my first (serious, non-trivial)
> kernel patch. :)
>
> Hugo.
>
> --
> === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
> PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
> --- "No! My collection of rare, incurable diseases! Violated!" ---

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
On 10/30/2010 09:39 PM, Hugo Mills wrote:
> This patch introduces a basic form of progress monitoring for balance
> operations, by counting the number of block groups remaining. The
> information is exposed to userspace by an ioctl.
>

IMO, tracking the information of blocks which are balancing also makes sense.
For example, the block information's blocknr.
It can help us monitor better.

> Signed-off-by: Hugo Mills <hugo@carfax.org.uk>
>
> ---
> This patch replaces the one previously posted, correcting a minor error.
>
>  fs/btrfs/ctree.h   |    9 ++++++++
>  fs/btrfs/disk-io.c |    2 +
>  fs/btrfs/ioctl.c   |   34 ++++++++++++++++++++++++++++++++
>  fs/btrfs/ioctl.h   |    7 ++++++
>  fs/btrfs/volumes.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++++--
>  5 files changed, 105 insertions(+), 2 deletions(-)
>
> Index: linux-mainline/fs/btrfs/ctree.h
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/ctree.h	2010-10-26 18:03:38.000000000 +0100
> +++ linux-mainline/fs/btrfs/ctree.h	2010-10-30 14:35:25.306450922 +0100
> @@ -803,6 +803,11 @@
>  	struct list_head cluster_list;
>  };
>
> +struct btrfs_balance_info {
> +	u64 expected;
> +	u64 completed;
> +};
> +
>  struct reloc_control;
>  struct btrfs_device;
>  struct btrfs_fs_devices;
> @@ -1010,6 +1015,10 @@
>  	unsigned metadata_ratio;
>
>  	void *bdev_holder;
> +
> +	/* Keep track of any rebalance operations on this FS */
> +	spinlock_t balance_info_lock;
> +	struct btrfs_balance_info *balance_info;
>  };
>
>  /*
> Index: linux-mainline/fs/btrfs/ioctl.c
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/ioctl.c	2010-10-26 18:03:38.000000000 +0100
> +++ linux-mainline/fs/btrfs/ioctl.c	2010-10-30 14:35:25.396447198 +0100
> @@ -1984,6 +1984,38 @@
>  	return 0;
>  }
>
> +/*
> + * Return the current status of any balance operation
> + */
> +long btrfs_ioctl_balance_progress(
> +	struct btrfs_fs_info *fs_info,
> +	struct btrfs_ioctl_balance_progress __user *user_dest)
> +{
> +	int ret = 0;
> +	struct btrfs_ioctl_balance_progress dest;
> +
> +	spin_lock(&fs_info->balance_info_lock);
> +	if (!fs_info->balance_info) {
> +		ret = -EINVAL;
> +		goto error;
> +	}
> +
> +	dest.expected = fs_info->balance_info->expected;
> +	dest.completed = fs_info->balance_info->completed;
> +
> +	spin_unlock(&fs_info->balance_info_lock);
> +
> +	if (copy_to_user(user_dest, &dest,
> +			 sizeof(struct btrfs_ioctl_balance_progress)))
> +		return -EFAULT;
> +
> +	return 0;
> +
> +error:
> +	spin_unlock(&fs_info->balance_info_lock);
> +	return ret;
> +}
> +
>  long btrfs_ioctl(struct file *file, unsigned int
>  		cmd, unsigned long arg)
>  {
> @@ -2017,6 +2049,8 @@
>  		return btrfs_ioctl_rm_dev(root, argp);
>  	case BTRFS_IOC_BALANCE:
>  		return btrfs_balance(root->fs_info->dev_root);
> +	case BTRFS_IOC_BALANCE_PROGRESS:
> +		return btrfs_ioctl_balance_progress(root->fs_info, argp);
>  	case BTRFS_IOC_CLONE:
>  		return btrfs_ioctl_clone(file, arg, 0, 0, 0);
>  	case BTRFS_IOC_CLONE_RANGE:
> Index: linux-mainline/fs/btrfs/ioctl.h
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/ioctl.h	2010-10-26 18:03:38.000000000 +0100
> +++ linux-mainline/fs/btrfs/ioctl.h	2010-10-30 14:35:25.316450509 +0100
> @@ -138,6 +138,11 @@
>  	struct btrfs_ioctl_space_info spaces[0];
>  };
>
> +struct btrfs_ioctl_balance_progress {
> +	__u64 expected;
> +	__u64 completed;
> +};
> +
>  #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
>  				   struct btrfs_ioctl_vol_args)
>  #define BTRFS_IOC_DEFRAG _IOW(BTRFS_IOCTL_MAGIC, 2, \
> @@ -178,4 +183,6 @@
>  #define BTRFS_IOC_DEFAULT_SUBVOL _IOW(BTRFS_IOCTL_MAGIC, 19, u64)
>  #define BTRFS_IOC_SPACE_INFO _IOWR(BTRFS_IOCTL_MAGIC, 20, \
>  				    struct btrfs_ioctl_space_args)
> +#define BTRFS_IOC_BALANCE_PROGRESS _IOR(BTRFS_IOCTL_MAGIC, 21, \
> +				  struct btrfs_ioctl_balance_progress)
>  #endif
> Index: linux-mainline/fs/btrfs/volumes.c
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/volumes.c	2010-10-26 18:03:38.000000000 +0100
> +++ linux-mainline/fs/btrfs/volumes.c	2010-10-30 14:35:25.326450096 +0100
> @@ -1902,6 +1902,7 @@
>  	struct btrfs_root *chunk_root = dev_root->fs_info->chunk_root;
>  	struct btrfs_trans_handle *trans;
>  	struct btrfs_key found_key;
> +	struct btrfs_balance_info *bal_info;
>
>  	if (dev_root->fs_info->sb->s_flags & MS_RDONLY)
>  		return -EROFS;
> @@ -1909,6 +1910,18 @@
>  	mutex_lock(&dev_root->fs_info->volume_mutex);
>  	dev_root = dev_root->fs_info->dev_root;
>
> +	dev_root->fs_info->balance_info = kmalloc(
> +		sizeof(struct btrfs_balance_info),
> +		GFP_NOFS);
> +	if (!dev_root->fs_info->balance_info) {
> +		ret = -ENOSPC;

-ENOMEM is better, for it comes from a kmalloc().

> +		goto error_no_status;
> +	}
> +	bal_info = dev_root->fs_info->balance_info;
> +	bal_info->expected = -1; /* One less than actually counted,
> +				    because chunk 0 is special */
> +	bal_info->completed = 0;
> +
>  	/* step one make some room on all the devices */
>  	list_for_each_entry(device, devices, dev_list) {
>  		old_size = device->total_bytes;
> @@ -1932,10 +1945,40 @@
>  		btrfs_end_transaction(trans, dev_root);
>  	}
>
> -	/* step two, relocate all the chunks */
> +	/* step two, count the chunks */
>  	path = btrfs_alloc_path();
> -	BUG_ON(!path);
> +	if (!path) {
> +		ret = -ENOSPC;

ditto

> +		goto error;
> +	}
> +
> +	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
> +	key.offset = (u64)-1;
> +	key.type = BTRFS_CHUNK_ITEM_KEY;
> +
> +	ret = btrfs_search_slot(NULL, chunk_root, &key, path, 0, 0);
> +	if (ret <= 0) {
> +		printk(KERN_ERR "btrfs: Failed to find the last chunk.\n");
> +		BUG();
> +	}
> +
> +	while (1) {
> +		ret = btrfs_previous_item(chunk_root, path, 0,
> +					  BTRFS_CHUNK_ITEM_KEY);
> +		if (ret)
> +			break;
> +
> +		bal_info->expected++;
> +	}
> +
> +	btrfs_free_path(path);
> +	path = btrfs_alloc_path();
> +	if (!path) {
> +		ret = -ENOSPC;

ditto

> +		goto error;
> +	}
>
> +	/* step three, relocate all the chunks */
>  	key.objectid = BTRFS_FIRST_CHUNK_TREE_OBJECTID;
>  	key.offset = (u64)-1;
>  	key.type = BTRFS_CHUNK_ITEM_KEY;
> @@ -1976,10 +2019,18 @@
>  					   found_key.offset);
>  		BUG_ON(ret && ret != -ENOSPC);
>  		key.offset = found_key.offset - 1;
> +		bal_info->completed++;
> +		printk(KERN_INFO "btrfs: balance: %llu/%llu block groups completed\n",
> +		       bal_info->completed, bal_info->expected);

Would you please printk found_key.offset which balance code is processing?
That would be helpful.

thanks,
liubo

>  	}
>  	ret = 0;
>  error:
>  	btrfs_free_path(path);
> +	spin_lock(&dev_root->fs_info->balance_info_lock);
> +	kfree(dev_root->fs_info->balance_info);
> +	dev_root->fs_info->balance_info = NULL;
> +	spin_unlock(&dev_root->fs_info->balance_info_lock);
> +error_no_status:
>  	mutex_unlock(&dev_root->fs_info->volume_mutex);
>  	return ret;
>  }
> Index: linux-mainline/fs/btrfs/disk-io.c
> ===================================================================
> --- linux-mainline.orig/fs/btrfs/disk-io.c	2010-10-29 17:19:12.000000000 +0100
> +++ linux-mainline/fs/btrfs/disk-io.c	2010-10-29 17:20:02.022161666 +0100
> @@ -1591,6 +1591,7 @@
>  	spin_lock_init(&fs_info->ref_cache_lock);
>  	spin_lock_init(&fs_info->fs_roots_radix_lock);
>  	spin_lock_init(&fs_info->delayed_iput_lock);
> +	spin_lock_init(&fs_info->balance_info_lock);
>
>  	init_completion(&fs_info->kobj_unregister);
>  	fs_info->tree_root = tree_root;
> @@ -1616,6 +1617,7 @@
>  	fs_info->sb = sb;
>  	fs_info->max_inline = 8192 * 1024;
>  	fs_info->metadata_ratio = 0;
> +	fs_info->balance_info = NULL;
>
>  	fs_info->thread_pool_size = min_t(unsigned long,
>  					   num_online_cpus() + 2, 8);
>
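The review point above is that a failed memory allocation should be reported as -ENOMEM, since -ENOSPC tells the caller that the disk is full. A small userspace sketch of the suggested pattern follows, with an injectable allocator standing in for kmalloc() so that the failure path can be exercised. All names here (`balance_info_alloc`, `failing_alloc`) are hypothetical and not taken from the patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct balance_info {
	uint64_t expected;
	uint64_t completed;
};

/* Illustrative allocator that always fails, to exercise the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Hypothetical wrapper: alloc stands in for kmalloc(). Returns 0 and
 * stores the new object in *out, or -ENOMEM if the allocation fails.
 */
static int balance_info_alloc(struct balance_info **out,
			      void *(*alloc)(size_t))
{
	struct balance_info *info = alloc(sizeof(*info));

	if (!info)
		return -ENOMEM;	/* out of memory, not out of disk space */

	info->expected = 0;
	info->completed = 0;
	*out = info;
	return 0;
}
```

Passing `malloc` gives the normal path; passing `failing_alloc` shows the error code a caller would see.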
Tomasz Torcz
2010-Nov-01 12:52 UTC
Re: [patch 0/2] Control filesystem balances (kernel side)
On Mon, Nov 01, 2010 at 01:58:21PM +0100, Xavier Nicollet wrote:
> On 30 October 2010 at 19:44, Goffredo Baroncelli wrote:
> > I like the idea that this information should be put under sysfs. Something like
> >
> > /sys/btrfs/<filesystem-uuid>/
> >     balance      -> info on balancing
> >     devices      -> list of devices (a directory of links or a file
> >                     which contains the list of devices)
> >     subvolumes/  -> info on subvolume(s)
> >     label        -> label of the filesystem
> >     <other btrfs filesystem related knobs>
>
> Well, mdstat stats are under /proc/mdstat.
> Is sysfs the ideal place ?

md stats are in /sys:
/sys/block/md127/md/
sync_action, sync_completed, sync_speed, reshape_position etc.

The /proc file is legacy.

--
Tomasz Torcz              "Never underestimate the bandwidth of a station
xmpp: zdzichubg@chrome.pl  wagon filled with backup tapes." -- Jim Gray
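As a point of comparison for the md attributes named above: to the best of my understanding of the md documentation, `sync_completed` holds either the word `none` or two sector counts written as `done / total`. A small parsing sketch under that assumption is below. The function name `parse_sync_completed` is hypothetical, and the exact file format should be checked against the kernel documentation before relying on it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse the text of an md sync_completed attribute (format assumed as
 * described in the lead-in). Returns 1 and fills *done and *total for
 * "<done> / <total>", 0 for "none", and -1 for anything else.
 */
static int parse_sync_completed(const char *text,
				unsigned long long *done,
				unsigned long long *total)
{
	if (strncmp(text, "none", 4) == 0)
		return 0;
	if (sscanf(text, "%llu / %llu", done, total) == 2)
		return 1;
	return -1;
}
```

The same two-number shape is what the "expected" and "completed" counters in the balance patches would provide.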
On Mon, Nov 01, 2010 at 04:06:53PM +0800, liubo wrote:
> On 10/30/2010 09:39 PM, Hugo Mills wrote:
> > This patch introduces a basic form of progress monitoring for balance
> > operations, by counting the number of block groups remaining. The
> > information is exposed to userspace by an ioctl.
> >
>
> IMO, tracking the information of blocks which are balancing also makes sense.
> For example, the block information's blocknr.
> It can help us monitor better.

I don't see how that will help. The block group IDs (which is all
that we get at this level) are effectively arbitrary 64-bit numbers,
and are what appear in the kernel logs. How could that information be
used to improve monitoring? I'm not ruling out the idea completely --
I just can't see at the moment how it would be used.

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Is a diversity twice as good as a university? ---
Xavier Nicollet
2010-Nov-01 12:58 UTC
Re: [patch 0/2] Control filesystem balances (kernel side)
On 30 October 2010 at 19:44, Goffredo Baroncelli wrote:
> I like the idea that these info should be put under sysfs. Something like
>
> /sys/btrfs/<filesystem-uuid>/
>     balance     -> info on balancing
>     devices     -> list of devices (a directory of links or a file
>                    which contains the list of devices)
>     subvolumes/ -> info on subvolume(s)
>     label       -> label of the filesystem
>     <other btrfs filesystem related knobs>

Well, mdstat stats are under /proc/mdstat. Is sysfs the ideal place?

Just asking.

--
Xavier Nicollet
Hugo Mills
2010-Nov-01 13:05 UTC
Re: [patch 0/2] Control filesystem balances (kernel side)
On Sat, Oct 30, 2010 at 07:44:35PM +0200, Goffredo Baroncelli wrote:
> On Saturday, 30 October, 2010, Hugo Mills wrote:
> > One fundamental question, though -- is the progress monitor
> > function best implemented as an ioctl, as I've done here, or should it
> > be two or three sysfs files? I'm thinking of /proc/mdstat...
> > Obviously, /proc/mdstat would never get into /sys, but exposing the
> > "expected" and "remaining" values as files has an attractive
> > simplicity to it.
>
> I like the idea that these info should be put under sysfs. Something like
>
> /sys/btrfs/<filesystem-uuid>/

/sys/fs/btrfs/<uuid>, I think. Also: /sys/fs/btrfs/<label> as a symlink to the <uuid> directory.

>     balance     -> info on balancing

For the one-value-per-file rule of sysfs, this should probably be balance_expected and balance_completed, each holding a count of block groups.

>     devices     -> list of devices (a directory of links or a file
>                    which contains the list of devices)
>     subvolumes/ -> info on subvolume(s)
>     label       -> label of the filesystem
>     <other btrfs filesystem related knobs>

The other one that struck me earlier today as being useful was tracking the progress of a dev delete operation. But that'll come later.

> Obviously we need another btrfs command to extract an uuid from a btrfs
> filesystem like:
>
> # btrfs filesystem get-uuid /path/to/a/btrfs/filesystem
> f9b9c413-0dc8-4e3f-94f2-86faa702f519

Possibly a slightly more general "fi metadata" with switches for UUID and label?

# btrfs fi metadata [-u|--uuid] /path
# btrfs fi metadata [-l|--label] /path

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Is a diversity twice as good as a university? ---
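To make the one-value-per-file idea concrete, here is a rough userspace sketch in C of how a monitoring tool might consume two such files. The attribute names balance_expected and balance_completed are only the ones proposed in this thread, not an existing interface, and the helper names parse_count, read_count_file and balance_percent are hypothetical, made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse one sysfs-style value: a single unsigned decimal number,
 * optionally followed by a newline. Returns 0 on success, -1 on
 * malformed input.
 */
int parse_count(const char *text, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(text != NULL && out != NULL);
	if (*text < '0' || *text > '9')
		return -1;	/* reject empty input, signs, whitespace */
	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno != 0)
		return -1;	/* value out of range */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;	/* trailing garbage */
	*out = (uint64_t)v;
	return 0;
}

/*
 * Read one count from a file such as the proposed (hypothetical)
 * /sys/fs/btrfs/<uuid>/balance_expected.
 */
int read_count_file(const char *path, uint64_t *out)
{
	char buf[32];
	FILE *f = fopen(path, "r");
	int ret = -1;

	if (f == NULL)
		return -1;
	if (fgets(buf, sizeof(buf), f) != NULL)
		ret = parse_count(buf, out);
	fclose(f);
	return ret;
}

/*
 * Percentage of block groups already balanced, rounded down.
 * Returns -1 if the pair is inconsistent (nothing expected, or more
 * completed than expected).
 */
int balance_percent(uint64_t expected, uint64_t completed)
{
	if (expected == 0 || completed > expected)
		return -1;
	if (completed > UINT64_MAX / 100)
		/* avoid overflow in completed * 100; slightly approximate */
		return (int)(completed / (expected / 100));
	return (int)((completed * 100) / expected);
}
```

With 200 block groups expected and 50 completed, balance_percent(200, 50) gives 25. Note that two separate files cannot be read atomically, so a tool built this way may occasionally see a pair of values from slightly different moments; the ioctl in the patch returns both under one lock.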
On 11/01/2010 08:55 PM, Hugo Mills wrote:
> On Mon, Nov 01, 2010 at 04:06:53PM +0800, liubo wrote:
>> On 10/30/2010 09:39 PM, Hugo Mills wrote:
>>> This patch introduces a basic form of progress monitoring for balance
>>> operations, by counting the number of block groups remaining. The
>>> information is exposed to userspace by an ioctl.
>>>
>> IMO, tracking the information of blocks which are balancing also makes sense.
>> For example, the block information's blocknr.
>> It can help us monitor better.
>
> I don't see how that will help. The block group IDs (which is all
> that we get at this level) are effectively arbitrary 64-bit numbers,
> and are what appear in the kernel logs. How could that information be
> used to improve monitoring?

64-bit numbers are also shown in btrfs-debug-tree. With btrfs-debug-tree, it would be helpful to track balanced extent buffers.

thanks,
liubo

> I'm not ruling out the idea completely -- I just can't see at the
> moment how it would be used.
>
> Hugo.
Goffredo Baroncelli
2010-Nov-04 22:55 UTC
RFC: exporting info via sysfs [was Re: [patch 0/2] Control filesystem balances (kernel side)]
Hi all,

I have made a prototype for exporting info from btrfs via sysfs.

Two directories, named "fs" and "devices", are created under /sys/btrfs.

/sys/btrfs/fs/<fs-uuid>/
    label        -> filesystem label
    num_devices  -> total number of devices
    open_devices -> number of opened devices
    [...]

/sys/btrfs/devices/<dev-uuid>/
    devid        -> btrfs device number
    fsid         -> filesystem uuid (fs-uuid)
    major, minor -> major minor
    name         -> device name
    writeable    -> is the device writeable

where <fs-uuid> is the filesystem uuid, and <dev-uuid> is the device uuid. The link between devices and filesystem is the <fsid> parameter of a device.

I created this structure because we should handle the case where the devices are present (like after a "btrfs device scan") but the filesystem isn't mounted. In this case the devices/ subdirectory is populated, while the fs/ subdirectory is empty.

I haven't attached a patch because the code is very ugly.
Comments? Thoughts?

Below is an example of use.

$ /sbin/blkid img*
img0.img: UUID="099ea4b7-96dd-41fc-91df-0d1ab0066e05" UUID_SUB="1103c4e9-2dba-4b58-82ea-7c7c633fe04a" TYPE="btrfs"
img1.img: UUID="099ea4b7-96dd-41fc-91df-0d1ab0066e05" UUID_SUB="d677e338-5eb0-4373-a540-78b9e7938987" TYPE="btrfs"
img2.img: UUID="099ea4b7-96dd-41fc-91df-0d1ab0066e05" UUID_SUB="de5e3fbf-2400-438c-95b5-e4c876d96bed" TYPE="btrfs"
img3.img: UUID="099ea4b7-96dd-41fc-91df-0d1ab0066e05" UUID_SUB="019b1657-edad-488e-ad72-ccd2ea92e3ac" TYPE="btrfs"

$ (cd /sys/fs/btrfs/; for i in */*/*; do echo -e "$i:\t$(cat $i)"; done )
devices/019b1657-edad-488e-ad72-ccd2ea92e3ac/devid: 4
devices/019b1657-edad-488e-ad72-ccd2ea92e3ac/fsid: 099ea4b7-96dd-41fc-91df-0d1ab0066e05
devices/019b1657-edad-488e-ad72-ccd2ea92e3ac/major: 98
devices/019b1657-edad-488e-ad72-ccd2ea92e3ac/minor: 64
devices/019b1657-edad-488e-ad72-ccd2ea92e3ac/name: /dev/ubde
devices/019b1657-edad-488e-ad72-ccd2ea92e3ac/writeable: 1
devices/1103c4e9-2dba-4b58-82ea-7c7c633fe04a/devid: 1
devices/1103c4e9-2dba-4b58-82ea-7c7c633fe04a/fsid: 099ea4b7-96dd-41fc-91df-0d1ab0066e05
devices/1103c4e9-2dba-4b58-82ea-7c7c633fe04a/major: 98
devices/1103c4e9-2dba-4b58-82ea-7c7c633fe04a/minor: 16
devices/1103c4e9-2dba-4b58-82ea-7c7c633fe04a/name: /dev/ubdb
devices/1103c4e9-2dba-4b58-82ea-7c7c633fe04a/writeable: 1
devices/d677e338-5eb0-4373-a540-78b9e7938987/devid: 2
devices/d677e338-5eb0-4373-a540-78b9e7938987/fsid: 099ea4b7-96dd-41fc-91df-0d1ab0066e05
devices/d677e338-5eb0-4373-a540-78b9e7938987/major: 98
devices/d677e338-5eb0-4373-a540-78b9e7938987/minor: 32
devices/d677e338-5eb0-4373-a540-78b9e7938987/name: /dev/ubdc
devices/d677e338-5eb0-4373-a540-78b9e7938987/writeable: 1
devices/de5e3fbf-2400-438c-95b5-e4c876d96bed/devid: 3
devices/de5e3fbf-2400-438c-95b5-e4c876d96bed/fsid: 099ea4b7-96dd-41fc-91df-0d1ab0066e05
devices/de5e3fbf-2400-438c-95b5-e4c876d96bed/major: 98
devices/de5e3fbf-2400-438c-95b5-e4c876d96bed/minor: 48
devices/de5e3fbf-2400-438c-95b5-e4c876d96bed/name: /dev/ubdd
devices/de5e3fbf-2400-438c-95b5-e4c876d96bed/writeable: 1
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/blocks_used: 32768
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/blocksize: 4096
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/label:
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/num_devices: 4
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/open_devices: 4
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/rw_devices: 4
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/total_blocks: 2222981120
fs/099ea4b7-96dd-41fc-91df-0d1ab0066e05/total_devices: 4

On Saturday, 30 October, 2010, you (Goffredo Baroncelli) wrote:
> On Saturday, 30 October, 2010, Hugo Mills wrote:
> > These two patches give a degree of control over balance operations.
> > The first makes it possible to get an idea of how much work remains to
> > do, by tracking the number of block groups (chunks) that need to be
> > moved/rewritten. The second patch allows a running balance operation
> > to be cancelled when the current block group has been moved.
> >
> > One fundamental question, though -- is the progress monitor
> > function best implemented as an ioctl, as I've done here, or should it
> > be two or three sysfs files? I'm thinking of /proc/mdstat...
> > Obviously, /proc/mdstat would never get into /sys, but exposing the
> > "expected" and "remaining" values as files has an attractive
> > simplicity to it.
>
> I like the idea that these info should be put under sysfs. Something like
>
> /sys/btrfs/<filesystem-uuid>/
>     balance     -> info on balancing
>     devices     -> list of devices (a directory of links or a file
>                    which contains the list of devices)
>     subvolumes/ -> info on subvolume(s)
>     label       -> label of the filesystem
>     <other btrfs filesystem related knobs>
>
> Obviously we need another btrfs command to extract an uuid from a btrfs
> filesystem like:
>
> # btrfs filesystem get-uuid /path/to/a/btrfs/filesystem
> f9b9c413-0dc8-4e3f-94f2-86faa702f519
>
> > The user-space side of things are in a separate patch series, to
> > follow.
> >
> > Please be gentle with me, this is my first (serious, non-trivial)
> > kernel patch. :)
> >
> > Hugo.

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
Hugo Mills
2010-Nov-05 12:41 UTC
Re: RFC: exporting info via sysfs [was Re: [patch 0/2] Control filesystem balances (kernel side)]
Hi, Goffredo,

On Thu, Nov 04, 2010 at 11:55:24PM +0100, Goffredo Baroncelli wrote:
> I have made a prototype for exporting info from btrfs via sysfs.

Good stuff. I was going to take a look at doing that this weekend. :)

> Two directories, named "fs" and "devices", are created under /sys/btrfs.
>
> /sys/btrfs/fs/<fs-uuid>/

I'm pretty sure that /sys/btrfs won't get through any discussion on LKML. I'd suggest /sys/fs/btrfs as the base, since that's where the other filesystems seem to put their sysfs information.

>     label        -> filesystem label
>     num_devices  -> total number of devices
>     open_devices -> number of opened devices
>     [...]
> /sys/btrfs/devices/<dev-uuid>/
>     devid        -> btrfs device number
>     fsid         -> filesystem uuid (fs-uuid)
>     major, minor -> major minor

I think the major, minor should instead be a symlink to the relevant entry in /sys/devices/... (as done in /sys/block/*) or /sys/block (as done in /sys/block/md*/slaves). Call it "device".

>     name         -> device name

Unnecessary -- and also, I think, unlikely to get through LKML review. Putting a device name here implies that the kernel knows better than userspace what the name of the device is (i.e. which device node you should be using). Having the link to /sys/block/* or /sys/devices/... as above is, I think, all that's needed here. Userspace should be able to convert the major/minor pair kept in /sys/fs/btrfs/devices/<uuid>/device/dev appropriately.

>     writeable    -> is the device writeable
>
> where <fs-uuid> is the filesystem uuid, and <dev-uuid> is the device uuid. The
> link between devices and filesystem is the <fsid> parameter of a device.

Could that be made a symlink instead? That seems to be the usual approach in sysfs.

> I created this structure because we should handle the case where the devices
> are present (like after a "btrfs device scan") but the filesystem isn't
> mounted.

... ah, I see it can't. (Re: my previous comment)

> In this case the devices/ subdirectory is populated, while the fs/
> subdirectory is empty.
>
> I haven't attached a patch because the code is very ugly.
> Comments? Thoughts?

Is it ugly because there are significant difficulties in making btrfs or sysfs do this, or just because you hacked something together as quickly as possible for a demo?

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ==
PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- "There's a Martian war machine outside -- they want to talk ---
    to you about a cure for the common cold."
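As a small illustration of the userspace conversion mentioned above: a sysfs "dev" attribute contains the device number as text in the form "MAJOR:MINOR" followed by a newline. The C sketch below parses that text. The function name parse_sysfs_dev is hypothetical, and the path /sys/fs/btrfs/devices/<uuid>/device/dev is only the layout suggested in this message, not something that exists today.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse the text of a sysfs "dev" attribute ("MAJOR:MINOR\n") into its
 * two numbers. Returns 0 on success, -1 if the text is malformed.
 */
int parse_sysfs_dev(const char *text, unsigned int *dev_major,
		    unsigned int *dev_minor)
{
	char *end;
	unsigned long ma, mi;

	assert(text != NULL && dev_major != NULL && dev_minor != NULL);

	/* The major number must start with a digit (no sign, no spaces). */
	if (!isdigit((unsigned char)text[0]))
		return -1;
	errno = 0;
	ma = strtoul(text, &end, 10);
	if (errno != 0 || *end != ':' || !isdigit((unsigned char)end[1]))
		return -1;

	mi = strtoul(end + 1, &end, 10);
	if (errno != 0)
		return -1;

	/* Allow exactly one trailing newline, as sysfs emits. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -1;

	if (ma > UINT_MAX || mi > UINT_MAX)
		return -1;

	*dev_major = (unsigned int)ma;
	*dev_minor = (unsigned int)mi;
	return 0;
}
```

Given the pair, a tool can look the device up under /sys/dev/block/<major>:<minor> or combine the numbers with makedev() and compare against st_rdev from stat() on a candidate device node, so the kernel never has to export a device name.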
Mike Fedyk
2010-Nov-08 18:01 UTC
Re: [patch 0/2] Control filesystem balances (kernel side)
[ sorry for breaking the thread, I'm replying from the archives, I was unsubbed after a mail server issue and didn't notice till now... ]

On Sat, Oct 30, 2010 at 07:44:35PM +0200, Goffredo Baroncelli wrote:
>     balance     -> info on balancing

Hugo Mills wrote:
> For the one-value-per-file rule of sysfs, this should probably be
> balance_expected and balance_completed, each holding a count of block
> groups.

I'd name it balance_chunks_expected and balance_chunks_completed.