Karl Denninger
2006-Sep-09 10:38 UTC
ARRRRGH! Guys, who's breaking -STABLE's GMIRROR code?!
This is not cool folks. Anyone know what I have to roll back to - and what files I have to roll back - to stop this cluster-##@kery?

      tty             ad4               ad6              twed0            cpu
 tin tout   KB/t tps  MB/s    KB/t tps  MB/s    KB/t tps  MB/s   us ni sy in id
 224  453   0.61   0  0.00  120.16 427 50.06    0.61   0  0.00    2  0  4  2 92

See that? There's nothing really running.

What I tried to do was "gmirror insert b500 ad4s1"

The command took, but NO IO WAS TAKEN TO THE TARGET DRIVE FOR REBUILDING; the SOURCE disk was locked in a 100% I/O run, and after stopping the rebuild THE I/O INFINITE LOOP IS STILL GOING ON!

I had a PRODUCTION MACHINE go down on me last night over this when it attempted to run its backup process and wedged due to process table overflow; the first attempt apparently never finished the day before, and the second, to a SECOND backup disk (I have a rolling disk backup system using GMIRROR's resync), caused the system to wedge in an I/O wait. This was also not cleanly restartable, as the root partition had multiple errors on it that fsck -p couldn't fix.

This is a SEVERE emergency in that anyone who has a disk that has to be rebuilt under -STABLE right now (sources as of 7 September) is screwed, blued and tattooed.

That PRODUCTION machine is running UNPROTECTED right now (no mirroring) as a consequence of this, and I can neither back it up using the usual mirror NOR restore its redundancy!

I see only one comment about GMIRROR changes in the commitlogs since 9/1, and it claims to be (mostly) cosmetic. Obviously not!

--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net My home on the net - links to everything I do!
http://scubaforum.org Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com Musings Of A Sentient Mind
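The symptom read off the iostat sample above (one disk saturated, the rebuild target idle) can be checked mechanically. What follows is only a minimal sketch in Python, not anything from the thread: the helper names `parse_iostat_sample` and `stalled_rebuild` are hypothetical, the column layout is assumed to match the sample exactly, and treating ad6 as the rebuild source is an inference from the numbers, since the message names only ad4s1 as the target.

```python
# Hypothetical helpers: split one iostat sample line into per-device figures
# and flag a rebuild where the source is busy but the target sees no traffic.
# The device order is copied from the header of the sample in the message.

DEVICES = ["ad4", "ad6", "twed0"]


def parse_iostat_sample(line, devices=DEVICES):
    """Return {device: {"kb_per_t", "tps", "mb_per_s"}} for one sample line."""
    fields = line.split()
    # Assumed layout: tin tout, then (KB/t tps MB/s) per device, then 5 CPU columns.
    expected = 2 + 3 * len(devices) + 5
    if len(fields) != expected:
        raise ValueError(f"expected {expected} columns, got {len(fields)}")
    stats = {}
    for i, dev in enumerate(devices):
        kb_per_t, tps, mb_per_s = fields[2 + 3 * i : 5 + 3 * i]
        stats[dev] = {
            "kb_per_t": float(kb_per_t),
            "tps": int(tps),
            "mb_per_s": float(mb_per_s),
        }
    return stats


def stalled_rebuild(stats, source, target, min_mb_per_s=1.0):
    """True when the source moves data but the target shows zero throughput."""
    return (
        stats[source]["mb_per_s"] >= min_mb_per_s
        and stats[target]["mb_per_s"] == 0.0
    )


# The sample line quoted in the message above.
sample = "224 453 0.61 0 0.00 120.16 427 50.06 0.61 0 0.00 2 0 4 2 92"
stats = parse_iostat_sample(sample)

# Assumption: ad6 is the mirror's source disk; ad4 is the inserted target.
print(stalled_rebuild(stats, source="ad6", target="ad4"))
```

The 1 MB/s threshold is arbitrary; a real check would want several consecutive samples rather than a single line before declaring the rebuild stuck.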
Patrick M. Hausen
2006-Sep-09 10:55 UTC
ARRRRGH! Guys, who's breaking -STABLE's GMIRROR code?!
Hi!

On Sat, Sep 09, 2006 at 12:38:13PM -0500, Karl Denninger wrote:
> This is not cool folks.
> ...

I experienced the same problem - luckily on a lab machine.

As much as I understand your anger, -stable is not guaranteed bug free. And to answer your question: RELENG_6_1 doesn't show this problem.

I recommend running RELENG_X_Y instead of RELENG_X for recent values of X and Y on production systems, anyway.

HTH,
Patrick M. Hausen
Leiter Netzwerke und Sicherheit
--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
On Saturday 09 September 2006 19:38, Karl Denninger wrote:
> This is not cool folks.

Want a refund?

--
/"\  Best regards,                      | mlaier@freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier@EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News
Christopher Schulte
2006-Sep-10 09:49 UTC
ARRRRGH! Guys, who's breaking -STABLE's GMIRROR code?!
> -----Original Message-----
> From: owner-freebsd-stable@freebsd.org
> [mailto:owner-freebsd-stable@freebsd.org] On Behalf Of Patrick J Okui
> Sent: Sunday, September 10, 2006 10:22 AM
> To: Karl Denninger
> Cc: freebsd-stable@freebsd.org
> Subject: Re: ARRRRGH! Guys, who's breaking -STABLE's GMIRROR code?!
>
> You can track changes to a particular release - say by using
> RELENG_6_1 rather than RELENG_6. In which case, would you
> still say you are tracking STABLE?

Well, that depends. For security and "critical fixes" (as the handbook phrases it) you can track RELENG_6_1 (in the case of 6.1-RELEASE) and be happy.

But what happens if the needed fix isn't security or "critical" in the minds of the FreeBSD developers? At that point you either need to wait for the next RELEASE, manually merge fixes into your production source (which, depending on the fix(es), could be non-trivial) or cross your fingers and follow -STABLE.

This problem isn't specific to FreeBSD (or unix in general) by any means, of course.

Sure, we could broaden the scope of RELENG_X_Y. Or introduce a new branch that's closer to -STABLE yet tuned for something like "security, critical and major fixes" for production systems. I'm not sure either of those options is preferable, would be effective in alleviating the problem, or is even workable in the first place.

Personally, I've been served quite well for many years by the current configuration. Since I don't track -STABLE on anything important (or, more accurately, have not yet NEEDED to do so), I've never been hit by any of these transient issues that crop up from time to time and can elicit loud complaints.

--Chris
> This should be documented somewhere clearly then, as my understanding was that -STABLE meant that anything MFCd back to it *was* tested and deemed stable ... and yes, I do run stable, and yes, I do expect to hit the occasional 'oopses', but "blatant and obvious bugs due to insufficient testing", IMHO, doesn't classify as an 'oops' ....
>

Guys, we're talking about software. Have you ever seen a piece of software which has been really bug-free? Not hello-world, I'm talking about real software.

Also, you should never go with -STABLE on a production server. I'm sure this has been made clear in the handbook. If it's really that important a server in production use, go with a RELEASE. -STABLE is not a technology playground like CURRENT but should be seen as a BETA testing system. If that's not the case, then why use RELEASE at all?

Sure, you may blame a developer for not testing enough, but you're on your own if you use beta quality software on your production systems.

As a developer I've seen many bugs which weren't found during testing, and I know it's nearly impossible to find _all_ bugs while testing. I've seen applications fail just because the user typed the wrong key at the wrong time (or an unexpected key).

As a user I'm thankful when bugs are fixed fast, but on the other hand I know what I'm doing when using -STABLE software on my system. I see this as a give-back to the community: finding bugs early, before -RELEASE.

Also keep in mind that most kernel hackers do kernel hacking in their spare time. Everyone using FreeBSD (or any other OS) is profiting from their spare time, and it's unfair to be so impolite.

And back to the issue: the gmirror bug has already been fixed, and I posted a note to the ML hours before the first "who the f... did cause that bug" post. A short look into the ML postings would have made this thread needless.

If you blame developers, then please shut off your computer.

my2ct

Volker
On Sun, 10 Sep 2006 21:28:00 -0400, Lowell Gilbert <freebsd-stable-local@be-well.ilk.org> wrote:
> It *is* in the Handbook's glossary.

Geez. I can't believe all those times I've searched for the FreeBSD jargon, the glossary *never* appeared in my results, so I'm getting to know it right now, after your comment.

As I own a copy of Greg's "The Complete FreeBSD", I rarely check the online handbook. Shame on me. Well, now I've bookmarked it (glossary).

Thanks for the tip, and sorry for the noise.

--
Ricardo Nabinger Sanchez <rnsanchez@{gmail.com,wait4.org}>
Powered by FreeBSD
"Left to themselves, things tend to go from bad to worse."
Karl Denninger wrote:
> This is not cool folks.

I think you misunderstood what -STABLE means. (Or maybe I do?)

-STABLE is still a development branch, with no guarantee of a stable and working operating system. What -STABLE guarantees is that interfaces remain stable.

If you want reliability, then jump from release to release.

Regards
Björn
Pawel Jakub Dawidek
2006-Sep-13 07:46 UTC
ARRRRGH! Guys, who's breaking -STABLE's GMIRROR code?!
On Sat, Sep 09, 2006 at 12:38:13PM -0500, Karl Denninger wrote:
> This is not cool folks.

I'm really sorry for the breakage. I try to treat -STABLE very gently; unfortunately, this time I made a mistake.

The change was committed to HEAD on 9 August. It fixed one bug but introduced another, which I didn't expect. The change seemed trivial, and I only tested that it fixed the bug I was tracking down; I didn't look for regressions.

After nearly one month in HEAD, I MFCed the change (on 4 September), because I wanted it to be released in the -BETAs, so people could test it if they hadn't already in HEAD, and I was quite sure that after one month in HEAD the change was OK.

I found the problem after 4 days (on 8 September) and backed the change out of the RELENG_6 branch.

Once again, I'm really sorry. I try not to spring such surprises on users; unfortunately it sometimes happens, and you have to be prepared for many changes going into the -STABLE branch just before a release, so they can be tested by a wider audience. That's why we prepare -BETAs and don't release -RELEASEs immediately.

I'm not writing this to justify my mistake, just trying to show how you can avoid such bad days in the future.

--
Pawel Jakub Dawidek http://www.wheel.pl
pjd@FreeBSD.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!