For a synchronous write to a pool with mirrored disks, does the write unblock after just one of the disks' write caches is flushed, or only after all of the disks' caches are flushed?

This message posted from opensolaris.org
> For a synchronous write to a pool with mirrored disks, does the write
> unblock after just one of the disks' write caches is flushed,
> or only after all of the disks' caches are flushed?

The latter. We don't consider a write to be committed until the data is on stable storage at full replication.

This might seem overly paranoid, because the only way we could lose a transaction would be if we acked a write to some over-the-network app after writing to one side of a mirror, crashed, came back up, and then that side of the mirror failed before intent log replay (which happens during boot when we mount ZFS filesystems).

What are the chances that you'd lose power *and* lose a disk at the same time? If they were independent events, it would be unlikely. But the thing is, power failures often cause disk failures. The probability of coupled failure isn't low enough to ignore.

Jeff
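To make the rule above concrete, here is a minimal toy model of "committed only at full replication". It is a sketch, not ZFS code: the `Disk` class, its `cache`/`stable` lists, and `sync_write` are hypothetical names invented for illustration. The only point it shows is that the acknowledgement happens after every side of the mirror has flushed its write cache, not after the first one.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical disk: a volatile write cache plus stable storage."""
    name: str
    cache: list = field(default_factory=list)
    stable: list = field(default_factory=list)

    def write(self, block):
        # Data lands in the volatile cache first.
        self.cache.append(block)

    def flush(self):
        # Flushing moves everything in the cache to stable storage.
        self.stable.extend(self.cache)
        self.cache.clear()


def sync_write(mirror, block):
    """Acknowledge a synchronous write only after ALL sides have flushed."""
    for disk in mirror:
        disk.write(block)
    for disk in mirror:
        disk.flush()
    # Only now is the caller unblocked (the write is "acked").
    return True


mirror = [Disk("side-a"), Disk("side-b")]
acked = sync_write(mirror, "tx1")
```

After `sync_write` returns, every disk in `mirror` holds `"tx1"` in `stable`, so losing any single side after the ack cannot lose the transaction in this model.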
Jeff Bonwick wrote:
>> For a synchronous write to a pool with mirrored disks, does the write
>> unblock after just one of the disks' write caches is flushed,
>> or only after all of the disks' caches are flushed?
> The latter. We don't consider a write to be committed until
> the data is on stable storage at full replication.
[snip]

That makes sense, but there's a point at which ZFS must abandon this strategy; otherwise, the malfunction of one disk in a 3-way mirror could halt the entire system, when what's probably desired is for the system to keep running in degraded mode with only 2 remaining functional disks in the mirror.

But then of course there would be the problem of divergent disks in a mirror; suppose there's a system with one pool on a pair of mirrored disks, and system root is on that pool. The disks are external, with interface cables running across the room. The system is running fine until my dog trips over the cable for disk #2. Down goes disk #2, and the system continues running fine, with a degraded pool, and during operation continues modifying various files. Later, the dog chews through the cable for disk #1. Down goes the system. I don't have a spare cable, so I just plug in disk #2, and restart the system. The system continues running fine, with a degraded pool, and during operation continues modifying various files. I go to the store to buy a new cable for disk #1, and when I come back, I trip over the cable for disk #2. Down goes the system. I plug #2 back in, replace the cable for #1, and restart the system. At this point, the system comes up with its root on a pool with divergent mirrors, and... ?
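The sequence of events in the scenario above can be replayed as a small simulation to show exactly where the two disks diverge. This is only a sketch of the scenario as described: `run_scenario` and the write labels "A", "B", "C" are made up for illustration, and it says nothing about how ZFS actually resolves the situation.

```python
def run_scenario():
    """Replay the cable mishaps and record which writes reach each disk."""
    disks = {1: [], 2: []}   # writes that reached disk #1 and disk #2
    online = {1, 2}          # disks currently attached

    def write(tag):
        # A write reaches every disk that is attached at that moment.
        for d in online:
            disks[d].append(tag)

    write("A")               # healthy mirror: both disks get the write
    online.discard(2)        # dog trips over the cable for disk #2
    write("B")               # degraded pool: only disk #1 is modified
    online.discard(1)        # dog chews the cable for disk #1, system down
    online.add(2)            # restart with only disk #2 plugged in
    write("C")               # degraded pool: only disk #2 is modified
    online.discard(2)        # cable for disk #2 tripped over, system down
    online.update({1, 2})    # both disks reattached before the restart
    return disks


history = run_scenario()
```

In this model disk #1 ends with writes A and B while disk #2 ends with A and C: each side holds changes the other never saw, which is the divergence the question is asking about.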
James C. McPherson
2006-Jul-30 22:47 UTC
[zfs-discuss] Re: Flushing synchronous writes to mirrors
Andrew wrote:
> Jeff Bonwick wrote:
>>> For a synchronous write to a pool with mirrored disks, does the write
>>> unblock after just one of the disks' write caches is flushed, or
>>> only after all of the disks' caches are flushed?
>> The latter. We don't consider a write to be committed until the data is
>> on stable storage at full replication.
> [snip]
>
> That makes sense, but there's a point at which ZFS must abandon this
> strategy; otherwise, the malfunction of one disk in a 3-way mirror could
> halt the entire system, when what's probably desired is for the system to
> keep running in degraded mode with only 2 remaining functional disks in
> the mirror.
>
> But then of course there would be the problem of divergent disks in a
> mirror; suppose there's a system with one pool on a pair of mirrored
> disks, and system root is on that pool. The disks are external, with
> interface cables running across the room. The system is running fine
> until my dog trips over the cable for disk #2. Down goes disk #2, and the
> system continues running fine, with a degraded pool, and during operation
> continues modifying various files. Later, the dog chews through the cable
> for disk #1. Down goes the system. I don't have a spare cable, so I just
> plug in disk #2, and restart the system. The system continues running
> fine, with a degraded pool, and during operation continues modifying
> various files. I go to the store to buy a new cable for disk #1, and when
> I come back, I trip over the cable for disk #2. Down goes the system. I
> plug #2 back in, replace the cable for #1, and restart the system. At
> this point, the system comes up with its root on a pool with divergent
> mirrors, and... ?

Wow, you must be the unluckiest person ever! And such a strong dog.....

So when the system comes back up, your uberblock and your ditto blocks will be examined, and those which have incorrect checksums will be detected and fixed.

James C. McPherson
--
Solaris Datapath Engineering
Storage Division
Sun Microsystems