Raghavendra G
2016-Nov-03 04:52 UTC
[Gluster-users] [Gluster-devel] A question of GlusterFS dentries!
On Wed, Nov 2, 2016 at 9:38 AM, Raghavendra Gowdappa <rgowdapp at redhat.com> wrote:

> ----- Original Message -----
> > From: "Keiviw" <keiviw at 163.com>
> > To: gluster-devel at gluster.org
> > Sent: Tuesday, November 1, 2016 12:41:02 PM
> > Subject: [Gluster-devel] A question of GlusterFS dentries!
> >
> > Hi,
> > In GlusterFS distributed volumes, listing a non-empty directory was slow.
> > Then I read the DHT code and found the reason. But I was confused that
> > GlusterFS DHT traverses all the bricks (in the volume) sequentially. Why
> > not use multiple threads to read dentries from multiple bricks
> > simultaneously? That's a question that has always puzzled me. Could you
> > please tell me something about this?
>
> readdir across subvols is sequential mostly because we have to support
> rewinddir(3).

Sorry, seekdir(3) is the more relevant function here. Since rewinddir resets
the dir stream to the beginning, it's not much of a difficulty to support
rewinddir with parallel readdirs across subvols.

> We need to maintain the mapping of offset and dentry across multiple
> invocations of readdir. In other words, if someone did a rewinddir to an
> offset corresponding to an earlier dentry, subsequent readdirs should return
> the same set of dentries that the earlier invocation of readdir returned.
> For example, in a hypothetical scenario, readdir returned the following
> dentries:
>
> 1. a, off=10
> 2. b, off=2
> 3. c, off=5
> 4. d, off=15
> 5. e, off=17
> 6. f, off=13
>
> Now if we did rewinddir to off 5 and issued readdir again, we should get
> the following dentries:
> (c, off=5), (d, off=15), (e, off=17), (f, off=13)
>
> Within a subvol, the backend filesystem provides the rewinddir guarantee
> for the dentries present on that subvol. However, across subvols it is the
> responsibility of DHT to provide the above guarantee. This means we should
> have some well-defined order in which we send readdir calls (note that the
> order is not well defined if we do a parallel readdir across all subvols).
> So, DHT has sequential readdir, which is a well-defined order of reading
> dentries.
>
> To give an example, suppose we have another subvol - subvol2 - (in addition
> to the subvol above - say subvol1) with the following listing:
>
> 1. g, off=16
> 2. h, off=20
> 3. i, off=3
> 4. j, off=19
>
> With parallel readdir we can have many orderings, like
> (a, b, g, h, i, c, d, e, f, j), (g, h, a, b, c, i, j, d, e, f) etc.
> Now if we do (with readdir done in parallel):
>
> 1. A complete listing of the directory (which can be any one of 10P1 = 10
>    ways - I hope math is correct here).
> 2. Do rewinddir (20)
>
> We cannot predict the set of dentries that come _after_ offset 20.
> However, if we do a readdir sequentially across subvols, there is only one
> directory listing, i.e., (a, b, c, d, e, f, g, h, i, j). So, it's easier to
> support rewinddir.
>
> If there were no POSIX requirement for rewinddir support, I think a
> parallel readdir could easily be implemented (which improves performance
> too). But unfortunately rewinddir is still a POSIX requirement. This also
> opens up another possibility of a "no-rewinddir-support" option in DHT,
> which, if enabled, results in parallel readdirs across subvols. What I am
> not sure about is how many users still use rewinddir. If there is a
> critical mass which wants performance with a tradeoff of no rewinddir
> support, this can be a good feature.
>
> +gluster-users to get an opinion on this.
>
> regards,
> Raghavendra
>
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel at gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-devel

--
Raghavendra G
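The sequential scheme described in this message can be sketched as a small simulation. This is only an illustration under stated assumptions, not GlusterFS code: the listings are the hypothetical ones from the example, seeking to an offset returns the entry carrying that offset first (as in the example), and the (subvol_index, off) cursor and the names read_subvol and sequential_readdir are invented here.

```python
from typing import Iterator, List, Optional, Tuple

Entry = Tuple[str, int]  # (name, offset within its own subvol)

# Hypothetical per-subvol listings, copied from the example in the thread.
SUBVOL1: List[Entry] = [("a", 10), ("b", 2), ("c", 5), ("d", 15), ("e", 17), ("f", 13)]
SUBVOL2: List[Entry] = [("g", 16), ("h", 20), ("i", 3), ("j", 19)]


def read_subvol(entries: List[Entry], start_off: Optional[int] = None) -> List[Entry]:
    """Return one subvol's entries, optionally starting at the entry with start_off.

    This stands in for the guarantee the backend filesystem gives within a
    single subvol: the same offset always leads to the same remaining entries.
    """
    if start_off is None:
        return list(entries)
    for i, (_, off) in enumerate(entries):
        if off == start_off:
            return entries[i:]
    raise ValueError(f"unknown offset {start_off}")


def sequential_readdir(
    subvols: List[List[Entry]], cursor: Optional[Tuple[int, int]] = None
) -> Iterator[Tuple[int, str, int]]:
    """Yield (subvol_index, name, off), visiting subvols strictly in order.

    cursor is a hypothetical (subvol_index, off) pair to resume from.
    """
    first = 0 if cursor is None else cursor[0]
    for idx in range(first, len(subvols)):
        # Only the subvol named by the cursor is entered mid-way.
        start = cursor[1] if cursor is not None and idx == first else None
        for name, off in read_subvol(subvols[idx], start):
            yield idx, name, off


if __name__ == "__main__":
    subvols = [SUBVOL1, SUBVOL2]
    print([name for _, name, _ in sequential_readdir(subvols)])
    print([name for _, name, _ in sequential_readdir(subvols, cursor=(0, 5))])
```

Because the subvols are always visited in the same order, resuming from a given cursor yields the same remaining entries on every run, which is the property the message attributes to sequential readdir.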
Keiviw
2016-Nov-03 06:04 UTC
[Gluster-users] [Gluster-devel] A question of GlusterFS dentries!
If GlusterFS does not support POSIX seekdir, what problems will users or GlusterFS have?
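One way to picture what an application might run into is to enumerate the listings a parallel readdir could produce for the two hypothetical subvols from the earlier example, and look at what follows entry h (off=20) in each. This is a rough sketch, not GlusterFS behaviour: it assumes each subvol still returns its own entries in a fixed order, and the helper names interleavings and entries_after are invented for illustration.

```python
from itertools import combinations
from math import comb
from typing import Iterator, List, Tuple

# Entry names per subvol, in each subvol's own (fixed) order.
SUBVOL1 = ["a", "b", "c", "d", "e", "f"]
SUBVOL2 = ["g", "h", "i", "j"]


def interleavings(left: List[str], right: List[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every merge of left and right that keeps each list's own order."""
    total = len(left) + len(right)
    # Choose which positions of the merged listing are taken by `right`.
    for slots in combinations(range(total), len(right)):
        slot_set = set(slots)
        li, ri = iter(left), iter(right)
        yield tuple(next(ri) if pos in slot_set else next(li) for pos in range(total))


def entries_after(listing: Tuple[str, ...], name: str) -> Tuple[str, ...]:
    """Entries that follow `name` in one particular listing."""
    return listing[listing.index(name) + 1:]


if __name__ == "__main__":
    listings = list(interleavings(SUBVOL1, SUBVOL2))
    # Number of possible listings, next to the closed form C(10, 4).
    print(len(listings), comb(10, 4))
    # Distinct sets of entries that can follow h across all listings.
    tails = {frozenset(entries_after(listing, "h")) for listing in listings}
    print(len(tails))
```

Under that assumption the number of possible listings is C(10, 4) = 210 (the thread's own estimate of 10 was flagged by its author as unchecked), and the set of entries that follows h differs from listing to listing. An application that saved a position and later sought back to it could therefore see some entries twice or miss others, whereas a sequential listing leaves exactly one possible remainder.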
Raghavendra G
2016-Nov-03 07:25 UTC
[Gluster-users] [Gluster-devel] A question of GlusterFS dentries!
On Thu, Nov 3, 2016 at 11:34 AM, Keiviw <keiviw at 163.com> wrote:

> If GlusterFS does not support POSIX seekdir, what problems will users or
> GlusterFS have?

GlusterFS won't have any problem if we don't support seekdir. I am also not
sure whether applications have a real use case for seekdir. However, it's a
POSIX requirement.

--
Raghavendra G