Li Zefan
2010-Dec-13 09:47 UTC
[PATCH 1/3] Btrfs: Really return keys within specified range
The keys returned by tree search ioctl should be restricted to:

	key.objectid = [min_objectid, max_objectid] &&
	key.offset = [min_offset, max_offset] &&
	key.type = [min_type, max_type]

But actually it returns those keys:

	[(min_objectid, min_type, min_offset),
	 (max_objectid, max_type, max_offset)].

And the bug can result in missing subvolumes in the output of
"btrfs subvolume list"

Reported-by: Ian! D. Allen <idallen@idallen.ca>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |   20 ++++----------------
 1 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index f1c9bb4..785f713 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1028,23 +1028,11 @@ out:
 static noinline int key_in_sk(struct btrfs_key *key,
 			      struct btrfs_ioctl_search_key *sk)
 {
-	struct btrfs_key test;
-	int ret;
-
-	test.objectid = sk->min_objectid;
-	test.type = sk->min_type;
-	test.offset = sk->min_offset;
-
-	ret = btrfs_comp_cpu_keys(key, &test);
-	if (ret < 0)
+	if (key->type < sk->min_type || key->type > sk->max_type)
 		return 0;
-
-	test.objectid = sk->max_objectid;
-	test.type = sk->max_type;
-	test.offset = sk->max_offset;
-
-	ret = btrfs_comp_cpu_keys(key, &test);
-	if (ret > 0)
+	if (key->offset < sk->min_offset || key->offset > sk->max_offset)
+		return 0;
+	if (key->objectid < sk->min_objectid || key->objectid > sk->max_objectid)
 		return 0;
 	return 1;
 }
-- 
1.6.3

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
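The difference between the two readings of the search key can be sketched in a few lines of standalone C. This is only an illustration, not the kernel code: `struct key`, `struct search_key` and `comp_keys` are simplified, hypothetical stand-ins for `struct btrfs_key`, `struct btrfs_ioctl_search_key` and `btrfs_comp_cpu_keys`, and the sketch assumes keys are ordered by (objectid, type, offset).

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

struct search_key {
	uint64_t min_objectid, max_objectid;
	uint64_t min_offset, max_offset;
	uint8_t min_type, max_type;
};

/* Order keys as one (objectid, type, offset) tuple. */
int comp_keys(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Behaviour before the patch: min_* and max_* act as two whole keys. */
int key_in_sk_tuple(const struct key *k, const struct search_key *sk)
{
	struct key lo = { sk->min_objectid, sk->min_type, sk->min_offset };
	struct key hi = { sk->max_objectid, sk->max_type, sk->max_offset };

	return comp_keys(k, &lo) >= 0 && comp_keys(k, &hi) <= 0;
}

/* Behaviour after the patch: every field is checked against its own range. */
int key_in_sk_fields(const struct key *k, const struct search_key *sk)
{
	if (k->type < sk->min_type || k->type > sk->max_type)
		return 0;
	if (k->offset < sk->min_offset || k->offset > sk->max_offset)
		return 0;
	if (k->objectid < sk->min_objectid || k->objectid > sk->max_objectid)
		return 0;
	return 1;
}
```

With objectid 10..20, type 2..5 and offset 100..200, a key such as (objectid 15, type 9, offset 0) passes the tuple check, because 15 already lies strictly between 10 and 20, but fails the per-field check on its type.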
Li Zefan
2010-Dec-13 09:50 UTC
[PATCH 2/3] Btrfs: Don't return items more than user specified
We check if num_found >= sk->nr_items every time we find an expected item,
but num_found is not incremented, so we may return more items than the
user asked for.

Also return -EINVAL if the user specified 0 nr_items.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 785f713..08174e2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1053,7 +1053,6 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 	int nritems;
 	int i;
 	int slot;
-	int found = 0;
 	int ret = 0;
 
 	leaf = path->nodes[0];
@@ -1100,8 +1099,8 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 					   item_off, item_len);
 			*sk_offset += item_len;
 		}
-		found++;
 
+		(*num_found)++;
 		if (*num_found >= sk->nr_items)
 			break;
 	}
@@ -1119,7 +1118,6 @@ advance_key:
 	} else
 		ret = 1;
 overflow:
-	*num_found += found;
 	return ret;
 }
 
@@ -1136,6 +1134,9 @@ static noinline int search_ioctl(struct inode *inode,
 	int num_found = 0;
 	unsigned long sk_offset = 0;
 
+	if (sk->nr_items == 0)
+		return -EINVAL;
+
 	path = btrfs_alloc_path();
 	if (!path)
 		return -ENOMEM;
-- 
1.6.3
- Check if the key is within specified range before checking the
  item length.

- Advance to the next key a bit more efficiently.

- Remove redundant code.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
---
 fs/btrfs/ioctl.c |   30 ++++++++++++------------------
 1 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 08174e2..477affb 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1059,28 +1059,24 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 	slot = path->slots[0];
 	nritems = btrfs_header_nritems(leaf);
 
-	if (btrfs_header_generation(leaf) > sk->max_transid) {
-		i = nritems;
+	if (btrfs_header_generation(leaf) > sk->max_transid)
 		goto advance_key;
-	}
 	found_transid = btrfs_header_generation(leaf);
 
 	for (i = slot; i < nritems; i++) {
 		item_off = btrfs_item_ptr_offset(leaf, i);
 		item_len = btrfs_item_size_nr(leaf, i);
 
+		btrfs_item_key_to_cpu(leaf, key, i);
+		if (!key_in_sk(key, sk))
+			continue;
+
 		if (item_len > BTRFS_SEARCH_ARGS_BUFSIZE)
 			item_len = 0;
 
 		if (sizeof(sh) + item_len + *sk_offset >
-		    BTRFS_SEARCH_ARGS_BUFSIZE) {
-			ret = 1;
-			goto overflow;
-		}
-
-		btrfs_item_key_to_cpu(leaf, key, i);
-		if (!key_in_sk(key, sk))
-			continue;
+		    BTRFS_SEARCH_ARGS_BUFSIZE)
+			return 1;
 
 		sh.objectid = key->objectid;
 		sh.offset = key->offset;
@@ -1102,22 +1098,21 @@ static noinline int copy_to_sk(struct btrfs_root *root,
 
 		(*num_found)++;
 		if (*num_found >= sk->nr_items)
-			break;
+			return 1;
 	}
 advance_key:
 	ret = 0;
 	if (key->offset < (u64)-1 && key->offset < sk->max_offset)
 		key->offset++;
 	else if (key->type < (u8)-1 && key->type < sk->max_type) {
-		key->offset = 0;
+		key->offset = sk->min_offset;
 		key->type++;
 	} else if (key->objectid < (u64)-1 && key->objectid < sk->max_objectid) {
-		key->offset = 0;
-		key->type = 0;
+		key->offset = sk->min_offset;
+		key->type = sk->min_type;
 		key->objectid++;
 	} else
 		ret = 1;
-overflow:
 	return ret;
 }
 
@@ -1178,9 +1173,8 @@ static noinline int search_ioctl(struct inode *inode,
 		ret = copy_to_sk(root, path, &key, sk, args->buf,
 				 &sk_offset, &num_found);
 		btrfs_release_path(root, path);
-		if (ret || num_found >= sk->nr_items)
+		if (ret)
 			break;
-
 	}
 	ret = 0;
 err:
-- 
1.6.3
Goffredo Baroncelli
2010-Dec-13 18:13 UTC
Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Hi Li,

On Monday, 13 December, 2010, Li Zefan wrote:
> The keys returned by tree search ioctl should be restricted to:
>
> 	key.objectid = [min_objectid, max_objectid] &&
> 	key.offset = [min_offset, max_offset] &&
> 	key.type = [min_type, max_type]
>
> But actually it returns those keys:
>
> 	[(min_objectid, min_type, min_offset),
> 	 (max_objectid, max_type, max_offset)].
>

I have to admit that I needed several minutes to understand what you wrote
:). Then I came to the conclusion that the tree search ioctl is basically
wrong.

IMHO, the error in this API is to use the lower bound of the acceptance
criteria (the min_objectid, min_offset, min_type fields) also as the starting
point for the search.

Let me explain with an example.

Suppose we want to search for all the keys in the range

	key.objectid = 10..20
	key.offset = 100..200
	key.type = 2..5

Suppose we set sk->nr_items to 1 for simplicity, and the available keys
which fit in the range are, as [objectid, offset, type]:

	1) [15,150,3]
	2) [16,160,4]
	3) [17,180,3]

All these keys satisfy the "acceptance criteria", but because we have to
restart the search from the last key found, the code should resemble

	sk = &args.key;

	sk->min_objectid = 10; sk->max_objectid = 20;
	sk->min_offset = 100; sk->max_offset = 200;
	sk->min_type = 2; sk->max_type = 5;
	sk->nr_items = 1;

	while (1) {
		ioctl(fd, BTRFS_IOC_TREE_SEARCH, &args);
		if (!sk->nr_items)
			break;

		for (off = 0, i = 0; i < sk->nr_items; i++) {
			sh = (struct btrfs_ioctl_search_header *)(args.buf +
								   off);

			[...]
			sk->min_objectid = sh->objectid;
			sk->min_offset = sh->offset;
			sk->min_type = sh->type;
		}

		<increase the sk->min_* key by 1>
	}

But in this case, after finding key #2 the code sets the minimum acceptance
criteria to [16,160,4], which excludes key #3: its type (3) is now below
min_type (4).

Ideally, we should add three new fields to the search key structure:

	sk->start_objectid
	sk->start_offset
	sk->start_type

And after every iteration the code (even the kernel code) should set these
fields to "last key found + 1", leaving the min_* fields as they are.

Is my analysis correct, or am I missing something?

Regards
G.Baroncelli

-- 
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
Li Zefan
2010-Dec-14 05:37 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Goffredo Baroncelli wrote:
> IMHO, the error in this API is to use the lower bound of the acceptance
> criteria (the min_objectid, min_offset, min_type fields) also as starting
> point for the search.
>
> [...]
>
> Ideally, we should add three new fields to the search key structure:
>
> 	sk->start_objectid
> 	sk->start_offset
> 	sk->start_type
>
> And after every iteration the code (even the kernel code) should set these
> fields to "last key found + 1", leaving the min_* fields as they are.
>
> Is my analysis correct, or am I missing something?
>

After looking more deeply, I found the ioctl was changed in this way
on purpose, to support "btrfs subvolume find-new" specifically.

See this commit:

commit abc6e1341bda974e2d0eddb75f57a20ac18e9b33
Author: Chris Mason <chris.mason@oracle.com>
Date:   Thu Mar 18 12:10:08 2010 -0400

    Btrfs: fix key checks and advance in the search ioctl

    The search ioctl was working well for finding tree roots, but using it for
    generic searches requires a few changes to how the keys are advanced.
    This treats the search control min fields for objectid, type and offset
    more like a key, where we drop the offset to zero once we bump the type,
    etc.

    The downside of this is that we are changing the min_type and min_offset
    fields during the search, and so the ioctl caller needs extra checks to make
    the keys in the result are the ones it wanted.

    This also changes key_in_sk to use btrfs_comp_cpu_keys, just to make
    things more readable.

So I think we can just fix the btrfs tool. Though adding sk->start_xxx should
also be able to meet the needs of "btrfs subvolume find-new".
Goffredo Baroncelli
2010-Dec-14 18:16 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Tuesday, 14 December, 2010, Li Zefan wrote:
> After looking more deeply, I found the ioctl was changed in this way
> on purpose, to support "btrfs subvolume find-new" specifically.
>
> [...]
>
> So I think we can just fix the btrfs tool. Though adding sk->start_xxx should
> also be able to meet the needs for "btrfs subvolume find-new".

Sorry, but I have to disagree. This API seems to me simply bugged. The
example in my previous mail (which is quite generic) highlights this fact.
But I can provide a more realistic case: suppose we use the
BTRFS_IOC_TREE_SEARCH ioctl to find the new files. We are interested in the
following items:

- BTRFS_EXTENT_DATA_KEY (type = 1)
- BTRFS_INODE_ITEM_KEY (type = 24)
- BTRFS_XATTR_ITEM_KEY (type = 108)

Acceptance criteria:

	min_type = 1
	max_type = 108
	min_offset = 0
	max_offset = ~0
	min_objectid = 0
	max_objectid = ~0
	min_transid = <the base generation number>

Note that we aren't interested in the offset.

Suppose we have the following sequence of keys [objectid, type, offset]:

	[...]
	1) [300, BTRFS_EXTENT_DATA_KEY, xx]
	2) [300, BTRFS_INODE_ITEM_KEY, xx]
	3) [300, BTRFS_XATTR_ITEM_KEY, xx]
	4) [301, BTRFS_EXTENT_DATA_KEY, xx]
	5) [301, BTRFS_INODE_ITEM_KEY, xx]
	7) [30200, BTRFS_EXTENT_DATA_KEY, xx]
	8) [30200, BTRFS_INODE_ITEM_KEY, xx]
	9) [30200, BTRFS_XATTR_ITEM_KEY, xx]
	[...]

Suppose that the buffer fills up between items 2 and 3. We have to restart
the search, but how should we set the min_* key? Try the following
hypotheses:

h1) objectid++, type = 0 -> in the next search key 3 would be skipped
h2) objectid as is, type++ -> in the next search key 4 would be skipped
h3) objectid as is, type = 0 -> in the next search keys 1, 2, 3 would be
    returned a second time...

Note that every inode may have more than one key of type
BTRFS_XATTR_ITEM_KEY or BTRFS_EXTENT_DATA_KEY, so it is not possible to know
in advance when the buffer will fill up.

Purely as a theoretical exercise, we could improve the search logic in
userspace so that, when an item is returned, in the next search we set the
minimum type to the previous type + 1 and the *maximum* objectid to the
latest objectid found. Once we are sure that there are no more keys with
this objectid, we can go back to the old max_objectid and min_type... But to
me this seems very fragile.

Chris, what do you think? Unless I missed something, this seems to be a
severe bug in the API.

In another email I will propose a patch which may address this problem.

Regards
G.Baroncelli
Li Zefan
2010-Dec-15 03:33 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Goffredo Baroncelli wrote:
> Suppose to have the following sequence keys [objectid, type, offset]:
>
> [...]
> 1) [300, BTRFS_EXTENT_DATA_KEY, xx]
> 2) [300, BTRFS_INODE_ITEM_KEY, xx]
> 3) [300, BTRFS_XATTR_ITEM_KEY, xx]
> 4) [301, BTRFS_EXTENT_DATA_KEY, xx]
> 5) [301, BTRFS_INODE_ITEM_KEY, xx]
> 7) [30200, BTRFS_EXTENT_DATA_KEY, xx]
> 8) [30200, BTRFS_INODE_ITEM_KEY, xx]
> 9) [30200, BTRFS_XATTR_ITEM_KEY, xx]
> [...]
>
> Suppose that the buffer is filled between the item 2 and 3. We should restart
> the search, but how set the min_* key ? Try the following hypothesis
>
> h1) objectid++, type = 0 -> In the next search the key 3 would be skipped
> h2) objectid asis, type ++, -> in the next search the key 4 would be skipped
> h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be
> returned a second time...

h4) objectid asis, type asis, offset++ -> we should get the correct result.

This works because the current ioctl uses min_{x,y,z} and max_{x,y,z} as
start_key and end_key, and it returns all keys that fall in
[start_key, end_key].

So this btrfs-progs patch should fix missing subvolumes in the output of
"subvolume list":

diff --git a/btrfs-list.c b/btrfs-list.c
index 93766a8..1b9ea45 100644
--- a/btrfs-list.c
+++ b/btrfs-list.c
@@ -620,7 +620,10 @@ int list_subvols(int fd)
 		/* this iteration is done, step forward one root for the next
 		 * ioctl
 		 */
-		if (sk->min_objectid < (u64)-1) {
+		if (sk->min_type < BTRFS_ROOT_BACKREF_KEY) {
+			sk->min_type = BTRFS_ROOT_BACKREF_KEY;
+			sk->min_offset = 0;
+		} else if (sk->min_objectid < (u64)-1) {
 			sk->min_objectid++;
 			sk->min_type = BTRFS_ROOT_BACKREF_KEY;
 			sk->min_offset = 0;
Goffredo Baroncelli
2010-Dec-15 06:53 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Li Zefan wrote:
> h4) objectid asis, type asis, offset++ -> we should get the correct result.

This fixes the problem of the "missing subvolume". But for the other case
(searching for more than one type) the problem is still there.

> because the current ioctl uses min_{x,y,z} and max_{x,y,z} as start_key and
> end_key, and it returns all keys that falls in [start_key, end_key].
>
> So this btrfs-progs patch should fix missing subvolumes in the output of
> "subvolume list":
> [...]
Li Zefan
2010-Dec-15 07:13 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Goffredo Baroncelli wrote:
> On Wednesday, 15 December, 2010, Li Zefan wrote:
>> h4) objectid asis, type asis, offset++ -> we should get the correct result.
>
> This fix the problem of the "missing subvolume". But for the other case
> (searching for more than one type) the problem still here.
>

I don't think so. The "h4" above already shows how we search for more
than one type.

The generic userland code for the next search is:

	/* this is in essence the same as how we advance key in kernel code */
	if (sk->min_offset < (u64)-1 && sk->min_offset < sk->max_offset)
		sk->min_offset++;
	else if (sk->min_type < (u8)-1 && sk->min_type < sk->max_type) {
		sk->min_offset = 0;
		sk->min_type++;
	} else if (sk->min_objectid < (u64)-1 &&
		   sk->min_objectid < sk->max_objectid) {
		sk->min_offset = 0;
		sk->min_type = 0;
		sk->min_objectid++;
	} else
		break;

	ioctl(...);

	for (i = 0; i < nr_items; i++) {
		if (!filter(items[i]))
			continue;

		/* process this item */
		...
	}
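The advance-then-filter loop described in this message can be tried out against an in-memory array instead of a real filesystem. What follows is a minimal sketch under stated assumptions, not btrfs-progs code: `mock_search` is a hypothetical stand-in for the ioctl that treats min_* and max_* as two whole keys, `struct key` and `struct search_key` are simplified stand-ins for the kernel structures, and the type numbers in `wanted_type` are simply the ones used in the example earlier in this thread. The sketch also assumes the caller first copies the last returned key into min_* before advancing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct btrfs_key / btrfs_ioctl_search_key. */
struct key {
	uint64_t objectid;
	uint8_t type;
	uint64_t offset;
};

struct search_key {
	uint64_t min_objectid, max_objectid;
	uint64_t min_offset, max_offset;
	uint8_t min_type, max_type;
	uint32_t nr_items;	/* here: maximum keys returned per call */
};

/* Order keys as one (objectid, type, offset) tuple. */
int comp_keys(const struct key *a, const struct key *b)
{
	if (a->objectid != b->objectid)
		return a->objectid < b->objectid ? -1 : 1;
	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* Mock of the ioctl: copy up to nr_items keys in [start_key, end_key]. */
size_t mock_search(const struct key *keys, size_t nkeys,
		   const struct search_key *sk, struct key *out)
{
	struct key lo = { sk->min_objectid, sk->min_type, sk->min_offset };
	struct key hi = { sk->max_objectid, sk->max_type, sk->max_offset };
	size_t found = 0;

	for (size_t i = 0; i < nkeys && found < sk->nr_items; i++)
		if (comp_keys(&keys[i], &lo) >= 0 &&
		    comp_keys(&keys[i], &hi) <= 0)
			out[found++] = keys[i];
	return found;
}

/* Restart just after the last returned key; 0 means nothing is left. */
int advance_search_key(struct search_key *sk, const struct key *last)
{
	sk->min_objectid = last->objectid;
	sk->min_type = last->type;
	sk->min_offset = last->offset;

	if (sk->min_offset < UINT64_MAX && sk->min_offset < sk->max_offset) {
		sk->min_offset++;
	} else if (sk->min_type < UINT8_MAX && sk->min_type < sk->max_type) {
		sk->min_offset = 0;
		sk->min_type++;
	} else if (sk->min_objectid < UINT64_MAX &&
		   sk->min_objectid < sk->max_objectid) {
		sk->min_offset = 0;
		sk->min_type = 0;
		sk->min_objectid++;
	} else {
		return 0;
	}
	return 1;
}

/* Caller-side filter; the type numbers are taken from the thread's example. */
int wanted_type(const struct key *k)
{
	return k->type == 1 || k->type == 24 || k->type == 108;
}

/* Repeat search + advance until exhausted, keeping only wanted keys. */
size_t collect_all(const struct key *keys, size_t nkeys,
		   struct search_key sk, struct key *out, size_t out_cap)
{
	struct key batch[8];
	size_t total = 0;

	if (sk.nr_items == 0 || sk.nr_items > 8)
		sk.nr_items = 8;
	for (;;) {
		size_t n = mock_search(keys, nkeys, &sk, batch);

		if (n == 0)
			break;
		for (size_t i = 0; i < n; i++)
			if (wanted_type(&batch[i]) && total < out_cap)
				out[total++] = batch[i];
		if (!advance_search_key(&sk, &batch[n - 1]))
			break;
	}
	return total;
}
```

In this sketch the caller still receives keys it did not ask for and has to discard them itself, which is the cost discussed in the rest of the thread.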
Chris Mason
2010-Dec-15 16:14 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Excerpts from Li Zefan's message of 2010-12-14 22:33:33 -0500:
> > Suppose to have the following sequence keys [objectid, type, offset]:
> >
> > [...]
> > 1) [300, BTRFS_EXTENT_DATA_KEY, xx]
> > 2) [300, BTRFS_INODE_ITEM_KEY, xx]
> > 3) [300, BTRFS_XATTR_ITEM_KEY, xx]
> > 4) [301, BTRFS_EXTENT_DATA_KEY, xx]
> > 5) [301, BTRFS_INODE_ITEM_KEY, xx]
> > 7) [30200, BTRFS_EXTENT_DATA_KEY, xx]
> > 8) [30200, BTRFS_INODE_ITEM_KEY, xx]
> > 9) [30200, BTRFS_XATTR_ITEM_KEY, xx]
> > [...]
> >
> > Suppose that the buffer is filled between the item 2 and 3. We should restart
> > the search, but how set the min_* key ? Try the following hypothesis
> >
> > h1) objectid++, type = 0 -> In the next search the key 3 would be skipped
> > h2) objectid asis, type ++, -> in the next search the key 4 would be skipped
> > h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be
> > returned a second time...
>
> h4) objectid asis, type asis, offset++ -> we should get the correct result.

This is the right answer ;). The problem is that even though our key has
3 distinct parts, and the API makes it look like you have very fine
grained control over those three parts, you have to remember to reset
them as you iterate between objectids. It isn't as obvious as it should
be.

The current API is a very raw export of how we do the searches in the
kernel too. You can do pretty much anything with it, but we pay with
complexity.

-chris
Goffredo Baroncelli
2010-Dec-15 18:42 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Chris Mason wrote:> Excerpts from Li Zefan''s message of 2010-12-14 22:33:33 -0500: > > > Suppose to have the following sequence keys [objectid, type, offset]: > > > > > > [...] > > > 1) [300, BTRFS_EXTENT_DATA_KEY, xx] > > > 2) [300, BTRFS_INODE_ITEM_KEY, xx] > > > 3) [300, BTRFS_XATTR_ITEM_KEY, xx] > > > 4) [301, BTRFS_EXTENT_DATA_KEY, xx] > > > 5) [301, BTRFS_INODE_ITEM_KEY, xx] > > > 7) [30200, BTRFS_EXTENT_DATA_KEY, xx] > > > 8) [30200, BTRFS_INODE_ITEM_KEY, xx] > > > 9) [30200, BTRFS_XATTR_ITEM_KEY, xx] > > > [...] > > > > > > > > > Suppose that the buffer is filled between the item 2 and 3. We shouldrestart> > > the search, but how set the min_* key ? Try the following hypothesis > > > > > > h1) objectid++, type = 0 -> In the next search the key 3 would beskipped> > > h2) objectid asis, type ++, -> in the next search the key 4 would beskipped> > > h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be > > > > h4) objectid asis, type asis, offset++ -> we should get the correctresult.> > This is the right answer ;). The problem is that even though our key has > 3 distinct parts, and the API makes it look like you have very fine > grained control over those three parts, you have to remember to reset > them as you iterate between objectids. It isn''t a obvious as it should > be. > > The current API is a very raw export of how we do the searches in the > kernel too. You can do pretty much anything with it, but we pay with > complexity.Hi Chris, I am a bit confused about your answer. The actual API is a bit confused (or almost not "obvious"). An application in order to work properly has to make some adjustment to the min_* fields AND filter the results (because if we tweak with the min_* field, not useful data is returned). Moreover this means that we move between user-space<->kernel- space a lot of unused data (un-necessary context switch). 

On the basis of your answer, it seems that this is OK (please don't consider
only the case of listing the subvolumes, which is a very simple case), and
that nothing has to be done.

Instead, I suggest the following (see my email "[PATCH]
BTRFS_IOC_TREE_SEARCH: store and use the last key found"):

- leave the min_* and max_* fields to act only as filters
- add three more fields, start_*, as the starting point for the search
- make a small modification to the kernel code to track the last key found
  in the start_* fields

pro:
- we pass only useful data to userspace
- we simplify the userspace applications a lot, because they don't have to
  update the min_* fields (they will work in an obvious way :-) )
- we can use the ordered property of a btree structure to perform efficient
  data lookup (if we reset the min_* fields to zero, we look up unnecessary
  data)
- we have the same functionality as the old API

cons:
- we need a modification ( :-) )

Maybe I missed some point, but I don't see any advantage in continuing to
support the current API. Of course, that doesn't mean that we can remove the
old API and ignore backward compatibility. But I think that there are
sufficient pros to develop a new API.

Please be patient: my English is very bad; I am not trying to blame anybody;
I want only a "perfect fs" (TM) :-)

>
> -chris
>

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
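
The proposal above (min_*/max_* as pure filters plus a start_* resume point) can be sketched in plain C. This is only an illustration under stated assumptions: it simulates the tree with a sorted in-memory array and a linear scan, and `struct search_req`, its `start` and `have_start` fields, and `search()` are hypothetical names that exist neither in the kernel nor in btrfs-progs. The type values used with it are arbitrary small integers, not the real BTRFS_*_KEY constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key, mirroring [objectid, type, offset]. */
struct key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Hypothetical request: min/max only filter, start is the resume point. */
struct search_req {
    struct key min, max;   /* per-field acceptance ranges */
    struct key start;      /* last key examined by a previous call */
    int have_start;        /* 0 until the first key has been examined */
};

/* Tuple comparison: objectid first, then type, then offset. */
static int key_cmp(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Each field is checked against its own [min, max] range. */
static int key_matches(const struct key *k, const struct search_req *r)
{
    return k->objectid >= r->min.objectid && k->objectid <= r->max.objectid &&
           k->type >= r->min.type && k->type <= r->max.type &&
           k->offset >= r->min.offset && k->offset <= r->max.offset;
}

/*
 * Copy up to cap matching keys from the sorted array tree[0..n) into out[],
 * resuming strictly after r->start, and remember the last key examined in
 * r->start. Returns the number of keys copied; 0 means nothing is left.
 */
static size_t search(const struct key *tree, size_t n, struct search_req *r,
                     struct key *out, size_t cap)
{
    size_t found = 0;

    assert(cap > 0);
    for (size_t i = 0; i < n && found < cap; i++) {
        if (r->have_start && key_cmp(&tree[i], &r->start) <= 0)
            continue;               /* already visited by an earlier call */
        r->start = tree[i];
        r->have_start = 1;
        if (key_matches(&tree[i], r))
            out[found++] = tree[i];
    }
    return found;
}
```

With this shape the caller never touches min_* or max_* between calls: it keeps calling `search()` with the same request until it returns 0, and every returned key already satisfies the per-field ranges. A real implementation would seek in the btree instead of scanning from the beginning.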
Goffredo Baroncelli
2010-Dec-15 18:48 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Li Zefan wrote:
> Goffredo Baroncelli wrote:
> > On Wednesday, 15 December, 2010, Li Zefan wrote:
> >> h4) objectid asis, type asis, offset++ -> we should get the correct result.
> >
> > This fixes the problem of the "missing subvolume". But for the other case
> > (searching for more than one type) the problem is still there.
> >
>
> I don't think so. And the above "h4" has shown how we search for more
> than one type.
>
> The generic userland code for next search is:
>
>     /* this is in essence the same as how we advance key in kernel code */
>     if (sk->min_offset < (u64)-1 && sk->min_offset < sk->max_offset)
>         sk->min_offset++;
>     else if (sk->min_type < (u8)-1 && sk->min_type < sk->max_type) {
>         sk->min_offset = 0;
>         sk->min_type++;
>     } else if (sk->min_objectid < (u64)-1 && sk->min_objectid < sk->max_objectid) {
>         sk->min_offset = 0;
>         sk->min_type = 0;

Sorry, but if you reset sk->min_type to 0, this means that min_type loses
its purpose (acting as the lower bound of the acceptance criteria).

>         sk->min_objectid++;
>     } else
>         break;
>
>     ioctl(...);
>
>     for (i = 0; i < nr_items; i++) {
>         if (!filter(items[i]))
>             continue;

So you are suggesting: "Move all tree items from kernel to user space and
filter them in userspace"? This means a lot of unneeded kernel-space <->
userspace transitions...

Sorry, I don't understand whether we are talking about a workaround or a
solution.

>         /* process this item */
>         ...
>     }
>
> >> because the current ioctl uses min_{x,y,z} and max_{x,y,z} as start_key and
> >> end_key, and it returns all keys that fall in [start_key, end_key].
> >>
> >> So this btrfs-progs patch should fix missing subvolumes in the output of
> >> "subvolume list":
> >>
> >> diff --git a/btrfs-list.c b/btrfs-list.c
> >> index 93766a8..1b9ea45 100644
> >> --- a/btrfs-list.c
> >> +++ b/btrfs-list.c
> >> @@ -620,7 +620,10 @@ int list_subvols(int fd)
> >>          /* this iteration is done, step forward one root for the next
> >>           * ioctl
> >>           */
> >> -        if (sk->min_objectid < (u64)-1) {
> >> +        if (sk->min_type < BTRFS_ROOT_BACKREF_KEY) {
> >> +            sk->min_type = BTRFS_ROOT_BACKREF_KEY;
> >> +            sk->min_offset = 0;
> >> +        } else if (sk->min_objectid < (u64)-1) {
> >>              sk->min_objectid++;
> >>              sk->min_type = BTRFS_ROOT_BACKREF_KEY;
> >>              sk->min_offset = 0;
> >>
> >
>

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
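
The two userland steps discussed in this message, advancing the key for the next search and filtering the returned items, can be sketched as standalone C. This is a minimal sketch, not the code from the thread: `struct key`, `struct filter`, `key_advance()` and `filter_accepts()` are hypothetical names, and unlike the quoted snippet the advance step carries only on overflow and leaves every max_* comparison to the filter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical key, mirroring [objectid, type, offset]. */
struct key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Hypothetical per-field ranges that the caller actually wants. */
struct filter {
    struct key min, max;
};

/*
 * Turn *k into the smallest key strictly greater than it in
 * (objectid, type, offset) order, which is the "offset++" of h4 with carry.
 * Returns 1 on success, 0 if *k is already the largest possible key.
 */
static int key_advance(struct key *k)
{
    assert(k);
    if (k->offset < UINT64_MAX) {
        k->offset++;
    } else if (k->type < UINT8_MAX) {
        k->type++;
        k->offset = 0;
    } else if (k->objectid < UINT64_MAX) {
        k->objectid++;
        k->type = 0;
        k->offset = 0;
    } else {
        return 0;
    }
    return 1;
}

/* Userland filter: each field must lie in its own [min, max] range. */
static int filter_accepts(const struct filter *f, const struct key *k)
{
    assert(f && k);
    return k->objectid >= f->min.objectid && k->objectid <= f->max.objectid &&
           k->type >= f->min.type && k->type <= f->max.type &&
           k->offset >= f->min.offset && k->offset <= f->max.offset;
}
```

In a search loop the caller would copy the last key returned by the ioctl into the min_* fields, call `key_advance()` on it, issue the next search, and drop every returned item for which `filter_accepts()` is false. That is the cost being argued about here: min_* stops being a per-field lower bound, so rejected items still cross the kernel/user boundary.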
Chris Mason
2010-Dec-15 18:51 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
Excerpts from Goffredo Baroncelli's message of 2010-12-15 13:42:23 -0500:
> On Wednesday, 15 December, 2010, Chris Mason wrote:
> > Excerpts from Li Zefan's message of 2010-12-14 22:33:33 -0500:
> > > > Suppose to have the following sequence keys [objectid, type, offset]:
> > > >
> > > > [...]
> > > > 1) [300, BTRFS_EXTENT_DATA_KEY, xx]
> > > > 2) [300, BTRFS_INODE_ITEM_KEY, xx]
> > > > 3) [300, BTRFS_XATTR_ITEM_KEY, xx]
> > > > 4) [301, BTRFS_EXTENT_DATA_KEY, xx]
> > > > 5) [301, BTRFS_INODE_ITEM_KEY, xx]
> > > > 7) [30200, BTRFS_EXTENT_DATA_KEY, xx]
> > > > 8) [30200, BTRFS_INODE_ITEM_KEY, xx]
> > > > 9) [30200, BTRFS_XATTR_ITEM_KEY, xx]
> > > > [...]
> > > >
> > > >
> > > > Suppose that the buffer is filled between the item 2 and 3. We should restart
> > > > the search, but how set the min_* key ? Try the following hypothesis
> > > >
> > > > h1) objectid++, type = 0 -> In the next search the key 3 would be skipped
> > > > h2) objectid asis, type ++, -> in the next search the key 4 would be skipped
> > > > h3) objectid asis, type = 0 -> in the next search the key 1,2,3 would be
> > > >
> > > h4) objectid asis, type asis, offset++ -> we should get the correct result.
> >
> > This is the right answer ;). The problem is that even though our key has
> > 3 distinct parts, and the API makes it look like you have very fine
> > grained control over those three parts, you have to remember to reset
> > them as you iterate between objectids. It isn't as obvious as it should
> > be.
> >
> > The current API is a very raw export of how we do the searches in the
> > kernel too. You can do pretty much anything with it, but we pay with
> > complexity.
>
> Hi Chris,
>
> I am a bit confused about your answer.
>
> The current API is a bit confusing (or at least not "obvious"). To work
> properly, an application has to adjust the min_* fields AND filter the
> results (because once we tweak the min_* fields, unwanted data
> is returned).
> Moreover, this means that we move a lot of unused data between user space
> and kernel space (unnecessary context switches).
>
> On the basis of your answer, it seems that this is OK (please don't consider
> only the case of listing the subvolumes, which is a very simple case), and
> that nothing has to be done.

Well, it's ok in that I wanted the API to be very close to the way
searches are done in the kernel. I'll definitely agree it isn't
perfect, especially as we hop between objectids or types.

But I don't want to extend it just yet, mostly because we don't have new
applications making use of it. I'd rather couple any new APIs with new
applications that we haven't yet thought of.

Thanks a lot for the time you're spending on review and looking at this.
If you have killer apps that can really make use of new APIs, I'm happy
to start reworking things.

-chris
Goffredo Baroncelli
2010-Dec-15 19:13 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
On Wednesday, 15 December, 2010, Chris Mason wrote:
[...]
> >
> > Hi Chris,
> >
> > I am a bit confused about your answer.
> >
> > The current API is a bit confusing (or at least not "obvious"). To work
> > properly, an application has to adjust the min_* fields AND filter the
> > results (because once we tweak the min_* fields, unwanted data is
> > returned). Moreover, this means that we move a lot of unused data between
> > user space and kernel space (unnecessary context switches).
> >
> > On the basis of your answer, it seems that this is OK (please don't consider
> > only the case of listing the subvolumes, which is a very simple case), and
> > that nothing has to be done.
>
> Well, it's ok in that I wanted the API to be very close to the way
> searches are done in the kernel. I'll definitely agree it isn't
> perfect, especially as we hop between objectids or types.
>
> But I don't want to extend it just yet, mostly because we don't have new
> applications making use of it. I'd rather couple any new APIs with new
> applications that we haven't yet thought of.
>
> Thanks a lot for the time you're spending on review and looking at this.
> If you have killer apps that can really make use of new APIs, I'm happy
> to start reworking things.

Look at my previous email about an enhancement of the find-new command
("[RFC] Improve btrfs subvolume find-new command", 11/12/2010). Would it be
sufficient? (Of course, on the basis of what I have just learned about this
API, I now know that this command is buggy :-( )

I can live with the current API (tweaking how the min_* fields are
incremented), but think about another side of the question: right now the
only client of this API is the btrfs command (btrfs subvol list, which is
currently broken), and we can update this API with minimum effort. If
instead we leave the current behavior, an application which depends on it
may appear in the future, and we may then be obliged to maintain it.

Goffredo

>
> -chris

--
gpg key@ keyserver.linux.it: Goffredo Baroncelli (ghigo) <kreijack@inwind.it>
Key fingerprint = 4769 7E51 5293 D36C 814E C054 BF04 F161 3DC5 0512
Li Zefan
2010-Dec-16 01:03 UTC
Re: Bug in the design of the tree search ioctl API ? [was Re: [PATCH 1/3] Btrfs: Really return keys within specified range]
02:48, Goffredo Baroncelli wrote:
> On Wednesday, 15 December, 2010, Li Zefan wrote:
>> Goffredo Baroncelli wrote:
>>> On Wednesday, 15 December, 2010, Li Zefan wrote:
>>>> h4) objectid asis, type asis, offset++ -> we should get the correct result.
>>> This fixes the problem of the "missing subvolume". But for the other case
>>> (searching for more than one type) the problem is still there.
>>>
>> I don't think so. And the above "h4" has shown how we search for more
>> than one type.
>>
>> The generic userland code for next search is:
>>
>>     /* this is in essence the same as how we advance key in kernel code */
>>     if (sk->min_offset < (u64)-1 && sk->min_offset < sk->max_offset)
>>         sk->min_offset++;
>>     else if (sk->min_type < (u8)-1 && sk->min_type < sk->max_type) {
>>         sk->min_offset = 0;
>>         sk->min_type++;
>>     } else if (sk->min_objectid < (u64)-1 && sk->min_objectid < sk->max_objectid) {
>>         sk->min_offset = 0;
>>         sk->min_type = 0;
>
> Sorry, but if you reset sk->min_type to 0, this means that min_type loses
> its purpose (acting as the lower bound of the acceptance criteria).
>

Yep, the changelog of Chris' commit has said that userland has to do this.

>>         sk->min_objectid++;
>>     } else
>>         break;
>>
>>     ioctl(...);
>>
>>     for (i = 0; i < nr_items; i++) {
>>         if (!filter(items[i]))
>>             continue;
>
> So you are suggesting: "Move all tree items from kernel to user space and
> filter them in userspace"? This means a lot of unneeded kernel-space <->
> userspace transitions...
>

Right, but it's fine so far. I'm not suggesting anything, but explaining how
the ioctl is working.

> Sorry, I don't understand whether we are talking about a workaround or a
> solution.
>

As Chris said, it's not perfect but we can just live with it, until we find
some killer app that requires us to improve/expand this ioctl.