Sorry for the cross-posting, but neither the author nor freebsd-bugs
acknowledged my message, and I think this is a very serious bug in vinum,
leading to loss of data...

If these are not the correct forums for this, please forgive me and
tell me which is the correct one.

PS: Please CC: me, since I'm not currently subscribed to these lists.

===================================================

Hi Greg,

I've been a big fan of vinum since its beginning.  I use it for RAID0
and RAID1 solutions on lots of servers.

In some RAID0 (stripe) configurations, though, I've had some serious
problems.  If an underlying disk fails, the respective plex and volume do
not fail, as they should.  This leads to full corruption of data, but worse
than that, it leads to a system which believes the data is safe.  On one
occasion, for example, the backup ran and overwrote good data with bad
data, full of zeros.

I am not fully aware of vinum programming details, but a quick look at
4.9-STABLE, in file vinumstate.c, dated Jul 7, 2000, at line 588, shows
that function update_volume_state() sets the volume state to up if the
plex state is corrupt or better for at least one plex:

    for (plexno = 0; plexno < vol->plexes; plexno++) {
        struct plex *plex = &PLEX[vol->plex[plexno]]; /* point to the plex */
        if (plex->state >= plex_corrupt) {            /* something accessible, */
            vol->state = volume_up;
            break;
        }
    }

I think this should be like:

        if (plex->state > plex_corrupt) {             /* something accessible, */

Or, in other words, the volume state is up only if the plex state is
degraded or better.

I did not test this, since the situation is not easy to reproduce, but I
think it depends only on the real meaning of the "corrupt" state.

Thanks in advance for your attention,

Jonny
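The effect of the one-character change proposed above can be sketched with
a toy model.  The state names and their ordering here are simplified
assumptions for illustration (down < corrupt < degraded < up); they are not
copied from the vinum headers, and both function names are hypothetical.

```c
/* Simplified, assumed ordering of states; not the real vinum enums. */
enum plex_state { PLEX_DOWN, PLEX_CORRUPT, PLEX_DEGRADED, PLEX_UP };
enum volume_state { VOLUME_DOWN, VOLUME_UP };

/* Behaviour as quoted: one plex that is corrupt or better brings the volume up. */
enum volume_state
volume_state_current(const enum plex_state *plexes, int nplexes)
{
    for (int i = 0; i < nplexes; i++)
        if (plexes[i] >= PLEX_CORRUPT)
            return VOLUME_UP;
    return VOLUME_DOWN;
}

/* Proposed change: only a plex that is degraded or better brings the volume up. */
enum volume_state
volume_state_proposed(const enum plex_state *plexes, int nplexes)
{
    for (int i = 0; i < nplexes; i++)
        if (plexes[i] > PLEX_CORRUPT)
            return VOLUME_UP;
    return VOLUME_DOWN;
}
```

Under these assumptions the two versions differ only when the best plex in
the volume is exactly "corrupt": the quoted code reports the volume as up,
the proposed code reports it as down.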
On Tuesday, 30 March 2004 at  0:32:38 -0300, João Carlos Mendes Luís wrote:
> Sorry for the cross-posting, but neither the author nor freebsd-bugs
> acknowledged my message, and I think this is a very serious bug in vinum,
> leading to loss of data...
>
> If these are not the correct forums for this, please forgive me and
> tell me which is the correct one.
>
> PS: Please CC: me, since I'm not currently subscribed to these
> lists.

Sorry for the lack of response.  Yes, I saw it, and so did Lukas Ertl,
and we've been discussing it.  This list is probably not the best.

> ===================================================
> Hi Greg,
>
> I've been a big fan of vinum since its beginning.  I use it for RAID0
> and RAID1 solutions on lots of servers.
>
> In some RAID0 (stripe) configurations, though, I've had some serious
> problems.  If an underlying disk fails, the respective plex and volume do
> not fail, as they should.  This leads to full corruption of data, but worse
> than that, it leads to a system which believes the data is safe.  On one
> occasion, for example, the backup ran and overwrote good data with bad
> data, full of zeros.
>
> I am not fully aware of vinum programming details, but a quick look at
> 4.9-STABLE, in file vinumstate.c, dated Jul 7, 2000, at line 588, shows
> that function update_volume_state() sets the volume state to up if the
> plex state is corrupt or better for at least one plex:
>
>     for (plexno = 0; plexno < vol->plexes; plexno++) {
>         struct plex *plex = &PLEX[vol->plex[plexno]]; /* point to the plex */
>         if (plex->state >= plex_corrupt) {            /* something accessible, */
>             vol->state = volume_up;
>             break;
>         }
>     }
>
> I think this should be like:
>
>         if (plex->state > plex_corrupt) {             /* something accessible, */

Basically, this is a feature and not a bug.  A plex that is corrupt is
still partially accessible, so we should allow access to it.  If you
have two striped plexes both striped between two disks, with the same
stripe size, and one plex starts on the first drive, and the other on
the second, and one drive dies, then each plex will lose half of its
data, every second stripe.  But the volume will be completely
accessible.

I think that the real issue here is that Vinum should have returned an
I/O error for accesses to the defective parts.  How did you perform
the backup?

Greg
--
Note: I discard all HTML mail unseen.
Finger grog@FreeBSD.org for PGP public key.
See complete headers for address and phone numbers.
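The two-plex example in this reply can be checked with a small model.
This is only a sketch under stated assumptions: two drives numbered 0 and
1, stripes placed round-robin, and hypothetical helper names that do not
come from the vinum sources.

```c
#define NDRIVES 2

/* Drive holding a given stripe of a plex whose first stripe is on first_drive. */
int
stripe_drive(int first_drive, int stripe)
{
    return (first_drive + stripe) % NDRIVES;
}

/* Nonzero if the plex can still serve the stripe after failed_drive has died. */
int
plex_can_read(int first_drive, int stripe, int failed_drive)
{
    return stripe_drive(first_drive, stripe) != failed_drive;
}

/*
 * Nonzero if at least one of the two plexes (one starting on drive 0,
 * the other on drive 1) can still serve the stripe.
 */
int
volume_can_read(int stripe, int failed_drive)
{
    return plex_can_read(0, stripe, failed_drive) ||
        plex_can_read(1, stripe, failed_drive);
}
```

In this model each plex loses every second stripe when a drive dies, yet
every stripe remains readable from the other plex.  With a single striped
plex there is no second copy, so the lost stripes are simply gone.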
On Tuesday, 30 March 2004 at 14:37:00 +0200, Lukas Ertl wrote:
> On Fri, 26 Mar 2004, Joao Carlos Mendes Luis wrote:
>
>> I think this should be like:
>>
>>         if (plex->state > plex_corrupt) {             /* something accessible, */
>>
>> Or, in other words, volume state is up only if plex state is degraded
>> or better.
>
> You are right, this is a bug,

No, see my reply.

> The correct solution, of course, is to check if the data is valid
> before changing the volume state, but this might turn out to be a
> very complex check.

Well, the minimum correct solution is to return an error if somebody
tries to access the inaccessible part of the volume.  That should
happen, and I'm confused that it doesn't appear to be doing so in this
case.

On Tuesday, 30 March 2004 at 11:07:55 -0300, João Carlos Mendes Luís wrote:
> Greg 'groggy' Lehey wrote:
>> On Tuesday, 30 March 2004 at  0:32:38 -0300, João Carlos Mendes Luís wrote:
>>>
>> Basically, this is a feature and not a bug.  A plex that is corrupt is
>> still partially accessible, so we should allow access to it.  If you
>> have two striped plexes both striped between two disks, with the same
>> stripe size, and one plex starts on the first drive, and the other on
>> the second, and one drive dies, then each plex will lose half of its
>> data, every second stripe.  But the volume will be completely
>> accessible.
>
> A good idea if you have both stripe and mirror, to avoid discarding the
> whole disk.  But, IMHO, if some part of the disk is inaccessible, the
> volume should go down, and IFF the operator wants to try recovery, they
> should use the setstate command.  This is the safe state.

setstate is not safe.  It bypasses a lot of consistency checking.  One
possibility would be:

1.  Based on the plex states, check if all of the volume is still
    accessible.
2.  If not, take the volume into a "flaky" state.
3.  *Somehow* ensure that the volume can't be accessed again as a file
    system until it has been remounted.
4.  Refuse to remount the file system without the -f option.

The last two are outside the scope of Vinum, of course.

Discussion?
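Steps 1 and 2 of the list above, together with the "return an error for
the inaccessible part" idea, could look roughly like the following.  This
is a sketch only: the per-stripe readability table, the state names and
all function names are assumptions made for illustration and do not exist
in Vinum.  Steps 3 and 4 are not modelled, since they concern the file
system rather than the volume manager.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical volume states; "flaky" is the state proposed in step 2. */
enum vol_state { VOL_DOWN, VOL_FLAKY, VOL_UP };

/*
 * readable[p * nstripes + s] is nonzero if plex p can still serve stripe s.
 * Returns nonzero if at least one plex can serve the stripe.
 */
int
stripe_accessible(const unsigned char *readable, size_t nplexes,
    size_t nstripes, size_t s)
{
    for (size_t p = 0; p < nplexes; p++)
        if (readable[p * nstripes + s])
            return 1;
    return 0;
}

/* Step 1 and 2: up if every stripe is covered, flaky if only some are. */
enum vol_state
classify_volume(const unsigned char *readable, size_t nplexes, size_t nstripes)
{
    size_t ok = 0;

    for (size_t s = 0; s < nstripes; s++)
        ok += (size_t)stripe_accessible(readable, nplexes, nstripes, s);
    if (ok == 0)
        return VOL_DOWN;
    return ok == nstripes ? VOL_UP : VOL_FLAKY;
}

/* Minimum fix: fail the request with EIO when no plex covers the stripe. */
int
check_read(const unsigned char *readable, size_t nplexes, size_t nstripes,
    size_t s)
{
    return stripe_accessible(readable, nplexes, nstripes, s) ? 0 : EIO;
}
```

With this model, a single striped plex that has lost one of two drives is
classified as flaky and reads of the lost stripes fail with EIO, while two
offset plexes that cover each other's gaps still yield a volume that is up.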