Shaohua Li
2011-Jan-19 01:15 UTC
[PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
Hi,

We have file readahead to do async file reads, but no metadata readahead. For a list of files, their metadata is stored in fragmented disk space and metadata read is a sync operation, which hurts the efficiency of readahead a lot. These patches add metadata readahead for btrfs. It has two advantages: one is making metadata reads async, the other is significantly reducing disk I/O seeks.

In btrfs, metadata is stored in btree_inode. Ideally, we could hook the inode to a fd and use existing syscalls (readahead, mincore or the upcoming fincore) to do readahead, but the inode is hidden, and there is no easy way to do this from my understanding. Another problem is that we need to check the page referenced bit to make sure a page is valid, which isn't OK to do in fincore/mincore. And in metadata readahead, the filesystem needs specific checking like in patch 4. Doing the checking in a current API (for example fadvise) will mess things up too. So we add two ioctls for this. One is like the readahead syscall, the other is like the mincore/fincore syscall.

On a hard-disk-based netbook with MeeGo, the metadata readahead reduced boot time by about 3.5s on average, from a total of 16s.

v2->v3:
1. fixed some issues Arnd pointed out
2. rebased to latest git
3. removed the 'updated' page flag check from patch 2 as suggested by Fengguang

v1->v2:
1. added more comments and fixed return values as suggested by Andrew Morton
2. fixed a race condition pointed out by Yan Zheng

initial post:
http://marc.info/?l=linux-fsdevel&m=129222493406353&w=2

Thanks,
Shaohua
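For readers unfamiliar with the existing file (data) readahead that the post contrasts against, here is a minimal sketch of how userspace can ask the kernel to prefetch file contents asynchronously. It uses posix_fadvise with POSIX_FADV_WILLNEED, which is one existing interface for this; it is not the ioctl interface proposed in these patches, and it does not touch filesystem metadata. The helper names prefetch_file_data and prefetch_all are made up for illustration.

```python
import os
import tempfile


def prefetch_file_data(path):
    """Hint the kernel to start reading a file's data pages in the background.

    Returns True if the hint was issued, False if this platform has no
    posix_fadvise. This covers file *data* only, not filesystem metadata.
    (Hypothetical helper, not part of the patch set.)
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # offset=0 and len=0 together mean "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
    return True


def prefetch_all(paths):
    """Issue the readahead hint for each path; return the paths that were hinted."""
    hinted = []
    for path in paths:
        if prefetch_file_data(path):
            hinted.append(path)
    return hinted


if __name__ == "__main__":
    # Demonstrate on a throwaway file so the sketch is self-contained.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 65536)
    try:
        print(prefetch_all([tmp.name]))
    finally:
        os.unlink(tmp.name)
```

Note that merely opening each file in such a loop already forces the synchronous metadata lookups the post describes, which is the gap the proposed ioctls are meant to address.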
Andrew Morton
2011-Jan-19 20:34 UTC
Re: [PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
On Wed, 19 Jan 2011 09:15:15 +0800 Shaohua Li <shaohua.li@intel.com> wrote:

> We have file readahead to do async file reads, but no metadata
> readahead. For a list of files, their metadata is stored in fragmented
> disk space and metadata read is a sync operation, which hurts the
> efficiency of readahead a lot. These patches add metadata readahead
> for btrfs. It has two advantages: one is making metadata reads async,
> the other is significantly reducing disk I/O seeks.
> In btrfs, metadata is stored in btree_inode. Ideally, we could hook
> the inode to a fd and use existing syscalls (readahead, mincore or the
> upcoming fincore) to do readahead, but the inode is hidden, and there
> is no easy way to do this from my understanding. Another problem is
> that we need to check the page referenced bit to make sure a page is
> valid, which isn't OK to do in fincore/mincore. And in metadata
> readahead, the filesystem needs specific checking like in patch 4.
> Doing the checking in a current API (for example fadvise) will mess
> things up too. So we add two ioctls for this. One is like the readahead
> syscall, the other is like the mincore/fincore syscall.

Has anyone looked at implementing this for filesystems other than btrfs? Have the ext4 guys taken a look? Did they see any impediments to implementing it for ext4?

> On a hard-disk-based netbook with MeeGo, the metadata readahead
> reduced boot time by about 3.5s on average, from a total of 16s.

That's a respectable speedup. And it *needs* to be a good speedup, given how hacky all of this is!

But then.. reducing bootup time on a laptop/desktop/server by 3.5s isn't exactly a world-shattering benefit, is it? Is it worth all the hacky code?

It would be much more valuable if those 3.5 seconds were available to devices which really really care about bootup times, but very few of those devices use rotating disks nowadays, I expect?
-- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Nicol
2011-Jan-19 21:33 UTC
Re: [PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
On Wed, Jan 19, 2011 at 2:34 PM, Andrew Morton <akpm@linux-foundation.org> wrote:

> It would be much more valuable if those 3.5 seconds were available to
> devices which really really care about bootup times, but very few of
> those devices use rotating disks nowadays, I expect?

And don't rotating disk modules read and buffer whole tracks, doing their own readahead, anymore, anyway? Isn't that part of what "on-disk cache" does?
Shaohua Li
2011-Jan-20 02:27 UTC
Re: [PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
On Thu, 2011-01-20 at 05:33 +0800, David Nicol wrote:

> On Wed, Jan 19, 2011 at 2:34 PM, Andrew Morton
> <akpm@linux-foundation.org> wrote:
>
> > It would be much more valuable if those 3.5 seconds were available to
> > devices which really really care about bootup times, but very few of
> > those devices use rotating disks nowadays, I expect?
>
> And don't rotating disk modules read and buffer whole tracks, doing
> their own readahead, anymore, anyway? Isn't that part of what "on-disk
> cache" does?

Disk readahead and metadata readahead are completely different; you didn't even look at the patch or log before saying this.
Shaohua Li
2011-Jan-20 02:34 UTC
Re: [PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
On Thu, 2011-01-20 at 04:34 +0800, Andrew Morton wrote:

> On Wed, 19 Jan 2011 09:15:15 +0800
> Shaohua Li <shaohua.li@intel.com> wrote:
>
> > We have file readahead to do async file reads, but no metadata
> > readahead. For a list of files, their metadata is stored in fragmented
> > disk space and metadata read is a sync operation, which hurts the
> > efficiency of readahead a lot. These patches add metadata readahead
> > for btrfs. It has two advantages: one is making metadata reads async,
> > the other is significantly reducing disk I/O seeks.
> > In btrfs, metadata is stored in btree_inode. Ideally, we could hook
> > the inode to a fd and use existing syscalls (readahead, mincore or the
> > upcoming fincore) to do readahead, but the inode is hidden, and there
> > is no easy way to do this from my understanding. Another problem is
> > that we need to check the page referenced bit to make sure a page is
> > valid, which isn't OK to do in fincore/mincore. And in metadata
> > readahead, the filesystem needs specific checking like in patch 4.
> > Doing the checking in a current API (for example fadvise) will mess
> > things up too. So we add two ioctls for this. One is like the readahead
> > syscall, the other is like the mincore/fincore syscall.
>
> Has anyone looked at implementing this for filesystems other than
> btrfs? Have the ext4 guys taken a look? Did they see any impediments
> to implementing it for ext4?

Not yet. I do expect the ext4 guys can check it. From my understanding, it should be relatively easy to do in ext filesystems.

> > On a hard-disk-based netbook with MeeGo, the metadata readahead
> > reduced boot time by about 3.5s on average, from a total of 16s.
>
> That's a respectable speedup. And it *needs* to be a good speedup,
> given how hacky all of this is!
>
> But then.. reducing bootup time on a laptop/desktop/server by 3.5s
> isn't exactly a world-shattering benefit, is it? Is it worth all the
> hacky code?

A laptop/desktop/server needs to read more data from hard disks, so I think this will give a bigger bootup time saving, though it's not tested yet.

> It would be much more valuable if those 3.5 seconds were available to
> devices which really really care about bootup times, but very few of
> those devices use rotating disks nowadays, I expect?

Currently most popular netbooks actually use rotating disks. And this will benefit laptops/desktops too.

Thanks,
Shaohua
Andrew Morton
2011-Jan-20 02:46 UTC
Re: [PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
On Thu, 20 Jan 2011 10:34:18 +0800 Shaohua Li <shaohua.li@intel.com> wrote:

> > > On a hard-disk-based netbook with MeeGo, the metadata readahead
> > > reduced boot time by about 3.5s on average, from a total of 16s.
> >
> > That's a respectable speedup. And it *needs* to be a good speedup,
> > given how hacky all of this is!
> >
> > But then.. reducing bootup time on a laptop/desktop/server by 3.5s
> > isn't exactly a world-shattering benefit, is it? Is it worth all the
> > hacky code?
> A laptop/desktop/server needs to read more data from hard disks, so I
> think this will give a bigger bootup time saving, though it's not
> tested yet.

Well, the whole point of the patch is to improve boot times, so the more boot-time testing you can do, the better that is!

> > It would be much more valuable if those 3.5 seconds were available to
> > devices which really really care about bootup times, but very few of
> > those devices use rotating disks nowadays, I expect?
> Currently most popular netbooks actually use rotating disks. And
> this will benefit laptops/desktops too.

But my point is that three seconds boot-time improvement for a system which has an uptime of days or months isn't terribly exciting.

What *would* be terribly exciting is a three-second improvement for cameras, cellphones, etc. But they don't use spinning disks.

Can we expect *any* benefit for flash-type storage devices? If so, how much?
Shaohua Li
2011-Jan-20 02:58 UTC
Re: [PATCH v3 0/5]add new ioctls to do metadata readahead in btrfs
On Thu, 2011-01-20 at 10:46 +0800, Andrew Morton wrote:

> On Thu, 20 Jan 2011 10:34:18 +0800 Shaohua Li <shaohua.li@intel.com> wrote:
>
> > > > On a hard-disk-based netbook with MeeGo, the metadata readahead
> > > > reduced boot time by about 3.5s on average, from a total of 16s.
> > >
> > > That's a respectable speedup. And it *needs* to be a good speedup,
> > > given how hacky all of this is!
> > >
> > > But then.. reducing bootup time on a laptop/desktop/server by 3.5s
> > > isn't exactly a world-shattering benefit, is it? Is it worth all the
> > > hacky code?
> > A laptop/desktop/server needs to read more data from hard disks, so I
> > think this will give a bigger bootup time saving, though it's not
> > tested yet.
>
> Well, the whole point of the patch is to improve boot times, so the
> more boot-time testing you can do, the better that is!

Each distribution uses its own readahead (data readahead) daemon, and it's time-consuming to change the daemon, but I'll check whether I can get some data on a desktop.

> > > It would be much more valuable if those 3.5 seconds were available to
> > > devices which really really care about bootup times, but very few of
> > > those devices use rotating disks nowadays, I expect?
> > Currently most popular netbooks actually use rotating disks. And
> > this will benefit laptops/desktops too.
>
> But my point is that three seconds boot-time improvement for a system
> which has an uptime of days or months isn't terribly exciting.
>
> What *would* be terribly exciting is a three-second improvement for
> cameras, cellphones, etc. But they don't use spinning disks.
>
> Can we expect *any* benefit for flash-type storage devices? If so, how
> much?

There should be no benefit for high-end SSDs, because they have high throughput even for random I/O. For low-end flash-type storage devices, this should have a little benefit, but I wouldn't expect much. I can't test a camera or cellphone, but I can test a USB disk in a desktop if you like.

Thanks,
Shaohua