The following is a short note I wrote a while back,
mainly in response to a discussion of filesystem
fragmentation in Windows operating systems. Most
of what I say also applies to *nix systems.
Jon Forrest
----------------
Why PC Disk Fragmentation Doesn't Matter (much)
Jon Forrest (jlforrest at berkeley.edu)
[The following is a hypothesis. I don't have
any real data to back this up. I'd like to know
if I'm overlooking any technical details.]
Disk fragmentation can mean several things. On one hand, it can
mean that the disk blocks a file occupies aren't physically
adjacent; the more pieces that make up a file, the more fragmented
the file is. On the other hand, it can mean that the unused blocks
on a disk aren't all next to each other. Win9X, Windows 2000, and
Windows XP
come with defragmentation programs. Such programs
are also available for other Microsoft and non-Microsoft
operating systems from commercial vendors.
The question of whether a fragmented disk really
results in anything bad has always been a topic
of heated discussion. On one side of the issue
the vendors of disk defragmentation programs can
always be found. The other side is usually occupied
by skeptical system managers, such as yours truly.
For example, the following claim is made by the
vendor of one commercial defragmentation product:
"Disk fragmentation can cripple performance even worse
than running with insufficient memory. Eliminate it
and you've eliminated the primary performance bottleneck
plaguing even the best-equipped systems." But can it, and
does it? The user's guide for this product spends some 60 pages
describing how to run the product but never justifies this
claim.
I'm not saying that fragmentation is good. That's one reason
why you can't buy a product whose purpose is to fragment a disk.
But, it's hard to imagine how fragmentation can cause any noticeable
performance problems. Here's why:
1) The greatest benefit from having a contiguous file would
be when the whole file is read (let's stick with reads) in
one I/O operation. This would result in the minimal amount of
disk arm movement, which is the slowest part of a disk I/O
operation. But, this isn't the way most I/Os take place. Instead,
most I/Os are fairly small. Plus, and this is the kicker, on
a modern multitasking operating system, those small I/Os are coming
from different processes reading from different files. Assuming that the
data to be read isn't in a memory cache, this means that the disk arm is
going to be flying all over the place, trying to satisfy all
the seek operations being issued by the operating system.
Sure, the operating system, and maybe even the disk controller,
might be trying to re-order I/Os but there's only so much of
this that can be done. A contiguous file doesn't really help
much because there's a very good chance that the disk arm will
have to move elsewhere on the disk between reads of successive
pieces of the file.
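If you want to see roughly what I mean, here's a small Python
sketch. It's not a rigorous benchmark and it isn't taken from any
real product; it just compares one big sequential read of a single
file against small reads interleaved across several files, which
is closer to what a busy multitasking system actually asks the
disk to do. The file names, sizes, and chunk size are arbitrary
choices of mine, and on a real system the page cache will hide
most of the difference unless you drop caches or read from a raw
device, so treat it as an illustration of the access pattern
rather than a measurement.

import os
import time

CHUNK = 4 * 1024                   # a typical small application read
FILE_SIZE = 16 * 1024 * 1024       # per scratch file; arbitrary
NAMES = ["frag_test_%d.dat" % i for i in range(4)]

def make_files():
    # Create a few scratch files to read back.
    block = os.urandom(1024)
    for name in NAMES:
        with open(name, "wb") as f:
            f.write(block * (FILE_SIZE // len(block)))

def sequential_read(name):
    # One file read front to back: the best case for a contiguous file.
    start = time.perf_counter()
    with open(name, "rb") as f:
        while f.read(1024 * 1024):
            pass
    return time.perf_counter() - start

def interleaved_read(names):
    # Small reads taken round-robin from several files: the request
    # stream keeps jumping between files, so any one file's internal
    # contiguity buys very little.
    handles = [open(n, "rb") for n in names]
    start = time.perf_counter()
    finished = 0
    while finished < len(handles):
        finished = 0
        for f in handles:
            if not f.read(CHUNK):
                finished += 1
    elapsed = time.perf_counter() - start
    for f in handles:
        f.close()
    return elapsed

if __name__ == "__main__":
    make_files()
    print("sequential read of one file:   %.3fs" % sequential_read(NAMES[0]))
    print("interleaved reads of %d files: %.3fs"
          % (len(NAMES), interleaved_read(NAMES)))
    for name in NAMES:
        os.remove(name)
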
2) The metadata for managing a filesystem is probably
cached in RAM. This means that when a file is created or
extended, the necessary metadata updates are done at memory
speed, not at disk speed. So, the overhead of allocating
multiple pieces for a new file is probably in the noise.
Of course, the in-memory metadata eventually has to be flushed
to disk but this is usually done after the original I/O completes,
so there won't be any visible slowdown in the program that issued
the I/O.
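The same idea can be sketched in a few lines of Python. The
following times creating a batch of small files normally, and then
again while forcing each one to disk with fsync(). The first case
runs at roughly memory speed because the filesystem is only
updating cached metadata; the second case has to wait for the
disk. The file count and size are arbitrary, and the exact numbers
will depend on the filesystem, mount options, and hardware.

import os
import time

def create_files(count, force_to_disk):
    # Create 'count' small files; optionally fsync each one so the
    # data and metadata must reach the disk before we continue.
    start = time.perf_counter()
    for i in range(count):
        name = "meta_test_%d.dat" % i
        with open(name, "wb") as f:
            f.write(b"x" * 4096)
            if force_to_disk:
                f.flush()
                os.fsync(f.fileno())
    elapsed = time.perf_counter() - start
    for i in range(count):
        os.remove("meta_test_%d.dat" % i)
    return elapsed

if __name__ == "__main__":
    print("cached (no fsync): %.3fs" % create_files(200, False))
    print("flushed (fsync):   %.3fs" % create_files(200, True))
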
3) Modern disks do all kinds of internal block remapping, so there's
no guarantee that what appears to be contiguous to the operating
system is actually really and truly contiguous on the disk. I have
no idea how often this actually happens, or how bad the skew is
between "fake" blocks and "real" blocks. But, it could
happen.
So, go ahead and run your favorite disk defragmenter. I know I do.
Now that W2K and later have an official API for moving files in an
atomic operation, such programs probably can't cause any harm. But
don't be surprised if you don't see any noticeable performance
improvements.
The mystery that really puzzles and sometimes frightens me is
why an NTFS file system becomes fragmented so easily in the first
place. Let's say I'm installing Windows 2000 on a newly formatted
20GB disk. Let's say that the total amount of space used by the
new installation is 600MB. Why should I see any fragmented files,
other than registry files, after such an installation? I have no
idea. My thinking is that any file that isn't created and then
later extended should be able to be created contiguously to begin with.
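For what it's worth, a program that knows a file's final size
before writing it (as an installer copying files of known sizes
does) can ask the filesystem to reserve all the space up front,
which gives the allocator every chance to find one contiguous run
of blocks. Here's a sketch using the Unix posix_fallocate call as
exposed by Python; the file name and size are arbitrary, and a
Windows installer would use whatever the equivalent native calls
are.

import os

def write_preallocated(path, data):
    # Reserve the file's full size before writing any of it, so the
    # filesystem can try to hand out one contiguous run of blocks.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, len(data))
        os.write(fd, data)
    finally:
        os.close(fd)

if __name__ == "__main__":
    write_preallocated("prealloc_test.dat", b"x" * (8 * 1024 * 1024))
    os.remove("prealloc_test.dat")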