Hi all,

I'm new to this list and ZFS, so forgive me if I'm re-hashing an old topic. I'm also using ZFS on FreeBSD, not Solaris, so forgive me for being a heretic ;-)

I recently set up a home NAS box and decided that ZFS is the only sensible way to manage 4TB of disks. The primary use of the box is to serve my telly (actually a Mac mini), which uses afp (via netatalk) to store and retrieve video. The video files tend to be 2-4GB, read and written sequentially at a rate in the region of 800KB/s.

Unfortunately, the performance has been very choppy. The video software assumes it's talking to fast local storage and thus makes little attempt to buffer. I spent a long time trying to track down a network problem before determining that the problem is actually in reading from the FS. This is a pretty cheap box, but it can still sustain 110MB/s off the array with access times in the low milliseconds, so there really is no excuse for not being able to serve up 800KB/s in an even fashion.

After some experimentation I have determined that the problem is prefetching. Given that this thing is mostly serving sequentially at a low, even rate, it ought to be perfect territory for prefetching. I spent the weekend reading the ZFS code (bank holiday fun, eh?) and running some experiments, and I think the problem is in the interaction between the prefetching code and the running processes. (Warning: some of the following is speculation on observed behaviour and may be rubbish.)

The behaviour I see is the file streaming stalling whenever the prefetch code decides to read some more blocks. The dmu_zfetch code is all run as part of the read() operation. When it finds itself getting close to running out of prefetched blocks, it queues up requests for more - 256 of them. At 128KB per block, that's 32MB of data. At this point the requests should be asynchronous, and the caller should get back control and be able to process the data it just read. However, my NAS box is a uniprocessor and the issue thread is higher priority than user processes, so in fact it immediately begins issuing the physical reads to the disks.

Given that modern disks tend to prefetch into their own caches anyway, some of these reads are likely to be served up almost instantly. This causes interrupts back into the kernel, which queue up the interrupt threads - also higher priority than user processes - and these consume a not-insubstantial amount of CPU time to gather, checksum and load the blocks into the ARC. While that is happening, the disks have located the remaining blocks and started serving them up too. So what I seem to get is a small "perfect storm" of interrupt processing that delays the user process for a few hundred milliseconds - even though the originally requested block was *in* the cache! To add insult to injury, the user process in this case, when it finally regains the CPU and returns the data to the caller, then sleeps for a couple of hundred milliseconds.

So prefetching, instead of evening out reads and reducing jitter, has produced the worst-case behaviour of compressing all of the jitter into one massive lump every 40 seconds (32MB / 800KB/s).

I get reasonably even performance if I disable prefetching, or if I reduce zfetch_block_cap to 16-32 blocks instead of 256. Other than just taking this opportunity to rant, I'm wondering if anyone else has seen similar problems and found a way around them?
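For anyone wanting to repeat the experiments, these are the knobs I've been poking at. I believe the block cap is exposed under vfs.zfs.zfetch on FreeBSD, but check your own sysctl tree; on Solaris you'd presumably set zfetch_block_cap via /etc/system instead:

    # /boot/loader.conf - either turn file-level prefetch off entirely...
    vfs.zfs.prefetch_disable="1"
    # ...or keep it but cap a stream at 32 blocks (4MB) rather than 256 (32MB)
    vfs.zfs.zfetch.block_cap="32"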
Also, to any ZFS developers: why does the prefetching logic follow the same path as a regular async read? Surely it ought to be way down the priority list? My immediate thought after a weekend of reading the code was to rewrite it to use a low-priority prefetch thread, with all of the dmu_zfetch() logic in that thread instead of inline with the original dbuf_read().

Jonathan

PS: Hi Darren!
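PPS: In case "low-priority prefetch thread" is too vague, here's a userland toy of the shape I have in mind - this is NOT the actual ZFS code, and all the names (zfetch_hint_t, zfetch_note_read, prefetch_worker, issue_prefetch) are made up for illustration. The read path only enqueues a hint describing what it just read; a worker thread started at minimum scheduling priority does the stream bookkeeping and issues the I/O, so it only runs when the consumer isn't using the CPU:

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct zfetch_hint {
    unsigned long long  offset;     /* byte offset just read */
    size_t              nbytes;     /* length of that read */
    struct zfetch_hint *next;
} zfetch_hint_t;

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cv   = PTHREAD_COND_INITIALIZER;
static zfetch_hint_t  *q_head = NULL;

/*
 * Called from the read path. O(1), lock-then-signal only: it never
 * issues I/O itself, so the reader gets the CPU back immediately.
 */
static void
zfetch_note_read(unsigned long long offset, size_t nbytes)
{
    zfetch_hint_t *h = malloc(sizeof (*h));

    h->offset = offset;
    h->nbytes = nbytes;
    pthread_mutex_lock(&q_lock);
    h->next = q_head;
    q_head = h;
    pthread_cond_signal(&q_cv);
    pthread_mutex_unlock(&q_lock);
}

/* Stand-in for the stream detection plus the async read issue. */
static void
issue_prefetch(const zfetch_hint_t *h)
{
    printf("prefetch: would read ahead of offset %llu (+%zu)\n",
        h->offset, h->nbytes);
}

static void *
prefetch_worker(void *unused)
{
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&q_lock);
        while (q_head == NULL)
            pthread_cond_wait(&q_cv, &q_lock);
        zfetch_hint_t *h = q_head;
        q_head = h->next;
        pthread_mutex_unlock(&q_lock);
        issue_prefetch(h);
        free(h);
    }
    return (NULL);
}

int
main(void)
{
    pthread_t tid;
    pthread_attr_t attr;
    struct sched_param sp;

    /*
     * The point of the exercise: start the worker at the minimum
     * scheduling priority so it never preempts the consuming
     * process the way the current issue thread does.
     */
    pthread_attr_init(&attr);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    sp.sched_priority = sched_get_priority_min(SCHED_OTHER);
    pthread_attr_setschedparam(&attr, &sp);
    pthread_create(&tid, &attr, prefetch_worker, NULL);

    zfetch_note_read(0, 128 * 1024);
    zfetch_note_read(128 * 1024, 128 * 1024);
    sleep(1);   /* let the worker drain the queue, then exit */
    return (0);
}

Of course this only fixes the issue side - the interrupt threads completing the I/O would still preempt the user process - but on a uniprocessor the worker would naturally only run while the consumer is sleeping between reads, so the 256-block burst would be trickled out instead of landing in one lump.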