I have been chasing down the way the inode and page caches are structured and
handled, but there is a mystery I have not been able to track down yet. When
dirty inodes and pages are queued for flushing to disk, how does that I/O
queue get picked up? Also, where are the sources for the I/O schedulers? I
haven't been able to locate them either.

I know this should not be too hard to find, but it seems as though there's
some magic involved: the inode/page gets put in the I/O queue, then
<something happens here>, and the I/O is scheduled and, eventually, performed.

Thanks.

Mark Hull-Richter
Linux Kernel Engineer
(949) 680-3082 - Office
(949) 680-3001 - Fax
(949) 632-8403 - Mobile
mhull-richter at datallegro.com
www.datallegro.com
On Wednesday 21 March 2007, Mark Hull-Richter wrote:
> I have been chasing down the way the inode and page caches are
> structured and handled, but there is a mystery I have not been able to
> track down yet. When dirty inodes and pages are queued for flushing to
> disk, how does that I/O queue get picked up? Also, where are the sources
> for the I/O schedulers? I haven't been able to locate them either.

The actual I/O schedulers are in-kernel things that live in the
$KERNELSRC/block directory. There are (normally) four: noop, as
(anticipatory), deadline and cfq (cfq being the default in centos-4).

However, it seems to me that you're not really looking for the I/O
schedulers, but rather for why and when the kernel decides to flush dirty
pages (I've been known to be wrong, though ;-).

When a page has yet to be committed to disk it's called a dirty page (see
e.g. /proc/meminfo). Dirty pages are flushed to disk either explicitly
(umount, sync, ...) or periodically (the kernel won't allow a dirty page to
age forever). When they are finally flushed, the I/O scheduler for the
involved block device(s) gets involved, since it handles the queueing of all
requests for its device.

I hope some of that made sense :-)

/Peter

> I know this should not be too hard to find, but it seems as though
> there's some magic involved: the inode/page gets put in the I/O queue,
> then <something happens here>, and the I/O is scheduled and, eventually,
> performed.
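[Editor's note: to make the explicit-flush path described above concrete,
here is a minimal userspace sketch (not kernel code). It dirties some page
cache with write(), prints the "Dirty:" line from /proc/meminfo, forces an
explicit flush with sync(), and prints it again. The file name and sizes are
arbitrary assumptions for the demo.]

/*
 * Sketch: watch dirty pages accumulate and then get flushed explicitly.
 * Assumes a Linux system with /proc/meminfo; "testfile.dat" is arbitrary.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void show_dirty(const char *label)
{
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];

    if (!f)
        return;
    while (fgets(line, sizeof(line), f))
        if (strncmp(line, "Dirty:", 6) == 0)
            printf("%s %s", label, line);
    fclose(f);
}

int main(void)
{
    char buf[4096];
    int fd, i;

    memset(buf, 'x', sizeof(buf));
    fd = open("testfile.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Dirty roughly 16 MB of page cache without flushing it ourselves. */
    for (i = 0; i < 4096; i++)
        if (write(fd, buf, sizeof(buf)) < 0)
            break;

    show_dirty("before sync:");
    sync();                 /* explicit flush, like the umount/sync case */
    show_dirty("after sync: ");

    close(fd);
    unlink("testfile.dat");
    return 0;
}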
> -----Original Message-----
> From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On
> Behalf Of Peter Kjellstrom
> Sent: Wednesday, March 21, 2007 11:51 AM
> To: centos at centos.org
> Subject: Re: [CentOS] Kernel question(s): I/O handling
>
> I hope some of that made sense :-)

Actually, yes, thanks. Now the big one - please verify if you can (anyone).

As I read the source, it appears that the page cache is flushed one page at a
time, regardless of any ordering that might optimize throughput. Not only
that, the superblocks get flushed in the same order every time, assuming that
there are no mount changes and the superblock queue remains constant. I saw
no markers that would tell whoever happens to be doing the cache flush at the
time to pick up at a later point (i.e., continue where it left off on the
last flush), which means that a really busy file system near the head of the
queue can cause I/O problems for all file systems (superblocks) behind it.
Even if one increases the count and range of pdflush processes, this would
remain more or less true (although, admittedly, the head FS would have to be
REALLY busy to do this).

Unfortunately, that is our situation - we work the file systems nearly to
death by doing huge I/Os (100Mb+ at a time) in sequential order in the
various files, with more than one such transaction likely to be occurring
concurrently, hopefully on more than one disk.

This would explain the first question I asked here: why the page cache is so
much slower than direct I/O for huge I/Os.

(Wait - does this mean that if I stick around long enough and keep digging on
my own, I'll find all the answers to the questions that no one else has?
Heavens to murgatroid! :-)

mhr
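[Editor's note: for comparison with the buffered path discussed above, here
is a hedged sketch of the direct-I/O approach Mark is measuring against the
page cache. O_DIRECT bypasses the page cache entirely, so the data never sits
in the dirty-page queues; it goes straight to the block layer and the I/O
scheduler in large requests. The file name, transfer size and 4096-byte
alignment are illustrative assumptions - O_DIRECT requires buffers, sizes and
offsets aligned to the device's logical block size.]

/*
 * Sketch: sequential direct I/O on Linux, bypassing the page cache.
 * _GNU_SOURCE is needed for O_DIRECT with glibc.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (1024 * 1024)     /* 1 MB per write, block-aligned */

int main(void)
{
    void *buf;
    int fd, i;

    /* O_DIRECT needs an aligned buffer; 4096 covers most devices. */
    if (posix_memalign(&buf, 4096, CHUNK) != 0) {
        perror("posix_memalign");
        return 1;
    }
    memset(buf, 'x', CHUNK);

    fd = open("bigfile.dat", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Write 100 MB sequentially; none of it passes through the page cache. */
    for (i = 0; i < 100; i++)
        if (write(fd, buf, CHUNK) != CHUNK) {
            perror("write");
            break;
        }

    close(fd);
    free(buf);
    return 0;
}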