Hello,

I've been using FreeBSD since the 2.1 release. I really like it. I like everything: the kernel, ports, development, maintenance and the release process.

Unfortunately I have had many problems with release 5, in particular with ATA (FAILURE - READ_DMA, TIMEOUT - READ_DMA, etc.). I was definitely using production releases only. Finally, after many system rebuilds, I found 5.3-RELEASE-p13 very stable. But because of my previous great experience with FreeBSD I decided to upgrade my 5.3 production system to 5.4 production (5.4-RELEASE-p4). Now I have the same ATA problems as I had with the first 5.3 "production" release.

Please don't tell me that I have a bad disk, motherboard, ATA controller or cable. I'm pretty sure, 90%, that it's not a hardware-related problem.

My advice to the FreeBSD release engineering team:
- do more testing;
- have it tested with the hardware that was published in "Hardware Notes";
- do not release it for production if it is not of production quality;
- reread what you yourselves wrote regarding 4.4 release quality.
I wish to say more.

This mail was written because I like FreeBSD and I want to continue using it. I wouldn't mind waiting longer for real production-quality releases instead of starting to use something else. And please, I know, it's an open source project.

Best regards,
Real FreeBSD fan
On Wed, Jul 20, 2005 at 08:43:33PM -0700, Alexey Yakimovich wrote:
> My advice to the FreeBSD release engineering team:
> - do more testing;
> - have it tested with the hardware that was published in "Hardware Notes";
> - do not release it for production if it is not of production quality;
> - reread what you yourselves wrote regarding 4.4 release quality.
> I wish to say more.
>
> This mail was written because I like FreeBSD and I want to continue
> using it. I wouldn't mind waiting longer for real production-quality
> releases instead of starting to use something else. And please, I know,
> it's an open source project.
>
> Best regards,
> Real FreeBSD fan

Thank you for expressing my exact same sentiments. I'm still a huge FreeBSD fan and switching to anything else (well, perhaps DragonFly) seems out of the question, but my faith has been tested a lot lately.

Since I switched some of my company's production machines to 5.4, because it was (in my eyes falsely) called a 'production release', FreeBSD's reputation within the less technical parts of the company has taken a large dent. Luckily they know as well that there's still no comparison to FreeBSD 4.x; the top of my ruptime looks like:

    up 1124+12:15,  1 user,   load 2.14, 2.10, 2.02
    up 1095+06:22, 11 users,  load 2.01, 2.04, 2.02
    up 1095+05:31,  5 users,  load 2.38, 2.31, 2.24
    up 1095+05:06,  2 users,  load 1.07, 1.08, 1.01
    up 1095+04:46,  0 users,  load 1.09, 1.08, 1.01
    up 1087+21:04,  1 user,   load 1.01, 1.00, 1.00

but then again, I'd really like to use the new 5.x features in a stable environment...

Marc
also a Real FreeBSD fan :-)
On Wed, 20 Jul 2005, Alexey Yakimovich wrote:
> My advice to the FreeBSD release engineering team:
> - do more testing;
> - have it tested with the hardware that was published in "Hardware Notes";
> - do not release it for production if it is not of production quality;
> - reread what you yourselves wrote regarding 4.4 release quality.
> I wish to say more.
>
> This mail was written because I like FreeBSD and I want to continue
> using it. I wouldn't mind waiting longer for real production-quality
> releases instead of starting to use something else. And please, I know,
> it's an open source project.

While I agree more testing always helps, and that there are some fairly concrete ways we can work to improve testing, there are also some practical realities to how software testing happens, especially for complex software products running on diverse hardware.

I have a question for you though: Have you tried, and do you plan to try, our 6.0 test releases before 6.0-RELEASE goes out the door? Specifically, on the hardware you know you're having problems with 5.4 on?

The way hardware gets tested is that people who have the hardware run the software on it under a variety of loads, and see if it works. Since a volunteer project of a couple of hundred developers can't buy all known past and future hardware, we have to rely on hardware vendors, software resellers, and FreeBSD users to do some of the testing. In order for that testing to affect a release, it must happen before the release goes out the door, rather than afterwards. And it has to happen sufficiently in advance of the release that someone can do something about the results of failed testing. If hardware isn't tested before the release, then inevitably people with that untested hardware are more likely to experience problems. This means that the best way to help us support your hardware is to run our test releases with useful workloads, and then provide feedback if/when they don't work.
I realize you're providing feedback now on the 5.x branch, but what you may or may not know is that in the 6.x branch, we have a significant update to the ATA code that may get merged to 5.x, if it proves to be as much better as we hope. This means that we need you to test the future code, not the current code, in order to fix the problems you are experiencing.

90% of useful FreeBSD testing happens when large FreeBSD consumers take releases of FreeBSD and deploy them in their testbeds and real-world environments, and find the bugs through the application of high levels of load and obscure hardware configurations. This is why later FreeBSD releases along a -STABLE branch are typically much more stable than earlier ones -- the code has run on millions of machines under untold amounts of load, instead of the thousand or so machines with a very selected load that it's likely to run on during development.

This is how all software vendors work, really -- be it Microsoft, or Apple, old-style UNIX vendors, or any of the Linux vendors. Some set of users sits on the bleeding edge and shakes out the early problems, and then the rest of the user base suffers through the later versions to shake out more subtle problems that gradually get resolved.

The FreeBSD Project is working on moving towards a more formal testing regimen. This change will help shake out software bugs relating to workload -- i.e., IP stack bugs, file system bugs, etc. But the chances of it having a significant impact on broad hardware testing are very low.

So if you have non-production instances of your production hardware, and can reproduce the workloads of your production environment on that hardware, what we would love you to do is run 6-CURRENT on it and tell us if that works better. If it does, then it's a question of back-porting the functionality (if possible) to 5.x. If it doesn't, then we can fix the problem in the active development tree, then merge as makes sense.
4.x became a great success after a quite shaky 3.x release branch, and after some bumps early in 4.x. It got there because of a lot of testing and improvement resulting from production experience. If you didn't have problems with 3.x and 4.x, it's because someone else got there first.

The reason I suggest waiting for BETA2 is that BETA2 will have cleaned up support for running 5.x applications. Specifically, there are one or two system calls that have changed in 6.x, and require COMPAT_FREEBSD5 to be compiled into the kernel, which it wasn't in BETA1. Likewise, a number of library version bumps and compatibility pieces will be in BETA2. This will make it easier to test 5.x application workloads on a 6.x install.

We take the concerns you've expressed seriously, and you should know that every FreeBSD developer I've talked with in the last few years has been talking about how to improve 5.x stability. The challenge has been to integrate the aggressive feature set improvement in 5.x with our stability goals. Much of that improvement has happened for 6.x, and I think you'll find that you're much happier with the general level of testing and support there. This was possible because people running 5.x have provided us a lot of detailed feedback and bug reporting. 6.x is much less aggressive in terms of feature set, and cleans up many of the architectural changes that made 5.x such a feature-rich release. Your feedback on 6.x sooner rather than later will improve the quality of the 6.x release, and we'd appreciate it greatly if you could help us test it!

Robert N M Watson
Hi,

at this point I must share my pain with you and the others.

I really did run a few tests on one big issue in RELENG_5. That was at a time when it was still early enough to change things, but the people I spoke to told me that others have fast machines to test on (in my eyes they should test on slower hardware, to get the maximum performance).

I told some people about the problems I had found. These problems really matter for other things (application performance, etc.), but no one really wanted to hear what I had to say; they told me irrelevant things (and a lot of bullshit), and they do not think before they speak.....

So the result for me is to wait for RELENG_6, run one or two tests, and if the results do not go in the right direction, leave FreeBSD and go back to Linux or possibly switch to DragonFly.

Now my question to you: is the performance of ATA disk access under UFS not important for other applications, such that it can be half of what RELENG_4 delivers?

In fact, under RELENG_4 I can write a 1 GB file twice as fast as under RELENG_5! And I do not want to hear anything about serial performance, or that this is not really like the real world, when I simulate it with:

    /usr/bin/time dd if=/dev/zero of=/zerofile bs=1024 count=1024k

This is really poor!

I know we all give our best, but many people are rather arrogant and do not really think...

best regards
Michael
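For anyone who wants to repeat the dd comparison quoted above on both branches, the run can be approximated with a small script. This is only a sketch under stated assumptions: the function name `timed_sequential_write`, the scratch file location and the reduced default run size are illustrative and not part of the original test. With bs=1024 and count=1024k, dd issues 1048576 writes of 1024 bytes, i.e. 1 GiB in total. Unlike a plain dd, the sketch calls fsync before stopping the clock, so the numbers are not directly comparable to the `/usr/bin/time dd` figures.

```python
import os
import tempfile
import time


def timed_sequential_write(path, block_size=1024, count=1024 * 1024, sync=True):
    """Write `count` zero-filled blocks of `block_size` bytes to `path`.

    Returns (bytes_written, seconds, mib_per_second). The defaults mirror
    dd bs=1024 count=1024k, i.e. 1 GiB written in 1 KiB blocks.
    """
    block = bytes(block_size)  # zero-filled, like reading /dev/zero
    start = time.monotonic()
    # buffering=0 so that every write() is a separate system call, as with dd
    with open(path, "wb", buffering=0) as f:
        for _ in range(count):
            f.write(block)
        if sync:
            os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = time.monotonic() - start
    total = block_size * count
    mib_per_s = (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mib_per_s


if __name__ == "__main__":
    # Small 16 MiB run in a scratch directory (an assumption for safety);
    # raise count to 1024 * 1024 to match the 1 GiB dd invocation.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "zerofile")
        total, secs, rate = timed_sequential_write(target, count=16 * 1024)
        print(f"{total} bytes in {secs:.2f}s ({rate:.1f} MiB/s)")
```

Running the same script with the same parameters on a RELENG_4 and a RELENG_5 install of the same machine would give a like-for-like figure; a single small run mostly measures the buffer cache, so the full 1 GiB size is the more meaningful one.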
Hi all,

Robert, I was hoping you would mention user feedback. I started this thread
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052288.html
back with SNAP004. The problem is still present in BETA1. I haven't seen any more progress in the thread, and I know this must be a very localized issue, and that everyone is pretty busy with the upcoming release, but I wouldn't want this issue forgotten. Should I submit a PR? As this is a kernel issue, I'm pretty much stuck on 5, although I would prefer to start using 6.

Yet another loyal FreeBSD user :-)

-- 
Joao Barros
On 7/21/05, Robert Watson <rwatson@freebsd.org> wrote:
>
> On Thu, 21 Jul 2005, Joao Barros wrote:
>
> > I was hoping you would mention user feedback. I started this thread
> > http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052288.html
> > back with SNAP004. The problem is still present in BETA1. I haven't seen
> > any more progress in the thread, and I know this must be a very
> > localized issue, and that everyone is pretty busy with the upcoming
> > release, but I wouldn't want this issue forgotten. Should I submit a PR?
> > As this is a kernel issue, I'm pretty much stuck on 5, although I would
> > prefer to start using 6.
>
> I would suggest always filing a PR if you worry the problem is going to
> get lost. While PRs can also get lost, they tend to persist more than
> old e-mails.
>
> There are two likely causes of problems:
>
> (1) amr driver problems
> (2) General PCI/interrupt/ACPI/APIC problems

I suspect the 2nd.

>
> The last few functional changes to amr were by Paul Saab (ps@) and Scott
> Long (scottl@), and I'd be tempted to try to chase that option first.

Scott replied:

The kernel isn't hung, it's just forever waiting for an interrupt from the amr card that it'll never get. Again, this is almost certainly an interrupt routing problem, so please contact John Baldwin <jhb at freebsd.org> and provide him your details.

Scott

> The first question to answer is whether you can get into the debugger using a
> console or serial break, as that will tell us what sort of "hang" you're
> seeing.
>
> You can find detailed instructions for kernel debugging in the handbook.
> Try adding BREAK_TO_DEBUGGER, KDB, and DDB as a first step, and see if a
> break gets you to the debugger or not. If you can get into the debugger,
> submit the information to the PR, forward me the PR receipt, and I'll try
> assigning it to one of the above and see if we can get someone to take
> some interest in it.

After reading this
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052434.html
I broke into the debugger and posted this
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052489.html
Is the information there sufficient to open a PR?

>
> If you can't get into the debugger, it's more likely an interrupt/etc
> problem. We might try John Baldwin (jhb@) as a possible first contact.

John started debugging this with another person with similar problems on 5 and the debugging never got to 6 (no feedback from the other person):
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052727.html

>
> Robert N M Watson
>
At 09:23 AM 21/07/2005, Joao Barros wrote:
>John started debugging this with another person with similar problems
>on 5 and the debugging never got to 6 (no feedback from the other
>person):
>http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052727.html

Yes, the other person is me :) I should have some time today to try and test.

        ---Mike
Mike Tancsa
2005-Jul-24 23:07 UTC
make -j as a stress test (was: Re: Quality of FreeBSD) [WARNING - 6.0-BETA1 still hosed!]
At 02:13 PM 24/07/2005, Karl Denninger wrote:
>On Sun, Jul 24, 2005 at 07:58:20PM +0200, Erik Trulsson wrote:
> > Most likely the bug you have run into is difficult or impossible to
> > reproduce on other hardware than the particular combination you are using.
>
>FWIW my earlier post about it appearing to work with only one disk on the
>chain was incorrect. It still fails - just takes longer (and more load).

The Linux driver seems to kick the card into a lower UDMA mode to work around a bug on that chip. Is this possible to do with your controller via atacontrol? There seemed to be a number of other bugs as well that needed to be worked around, however.

        ---Mike
>The VIA SATA onboard controller on my server works (and has worked)
>flawlessly. I have two identical 80G Maxtor SATA drives connected to it
>and have had absolutely no problems and excellent performance. Even my
>Dell Inspiron 5100--with a few hiccups--has made great progress and is
>mostly functional.

The ATA (definitely not SATA) controller on my Dell Latitude C840 laptop has READ_DMA, WRITE_DMA problems, maybe 2-3 times a day unless I turn off DMA. It hangs for 3-4 seconds, logs errors, then proceeds. It isn't heavily loaded. I tried using Soren's ATA mkIII patches, and now it likes to panic at about that same frequency instead of hanging, so I'm going back to the stock RELENG_5 config next chance I get.

It's one of these:

atapci0: <Intel ICH3 UDMA100 controller> port 0xbfa0-0xbfaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0

-- 
J. Porter Clark <jpc@suespammers.org>
>On 7/25/05, J. Porter Clark <jpc@suespammers.org> wrote:
>>
>> The ATA (definitely not SATA) controller on my Dell Latitude
>> C840 laptop has READ_DMA, WRITE_DMA problems, maybe 2-3 times
>> a day unless I turn off DMA. It hangs for 3-4 seconds, logs
>> errors, then proceeds. It isn't heavily loaded. I tried using
>> Soren's ATA mkIII patches, and now it likes to panic at about that
>> same frequency instead of hanging, so I'm going back to the
>> stock RELENG_5 config next chance I get.
>>
>> It's one of these:
>>
>> atapci0: <Intel ICH3 UDMA100 controller> port 0xbfa0-0xbfaf,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
>
>You're going back to stock RELENG_5 from what?

RELENG_5 with Soren's patches.

>I have that very same laptop running FreeBSD from 5.0 up to 5.4 with
>absolutely no DMA problems using it.
>Note: I did exchange the HDD from 20 to 60GB, different brands.

Might be a clue. My laptop has this:

ad0: 19077MB <IC25N020ATCS04 0 CA2OA72A> at ata0-master UDMA100

Typical errors:

Jul 23 15:00:33 auricle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=20586350
Jul 25 09:01:52 auricle kernel: ad0: TIMEOUT - READ_MUL retrying (2 retries left) LBA=1731115

I had been running 4.X on this box with no DMA problems until I put 5.3 on it many months ago. It also boots Windows XP, and I don't have any obvious disk problems with that.

Do you have trouble with the touchpad occasionally going nuts? I had mine replaced, and it still does it. With any OS I care to boot.

I have some Firewire-related problems, too, but I won't go into them right now.

-- 
J. Porter Clark <jpc@suespammers.org>
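When reporting errors like the ones quoted above, it can help to state how often each command times out rather than pasting a couple of sample lines. The following is a minimal sketch, assuming syslog lines shaped exactly like the two quoted in this message; the regular expression is derived only from those two lines, other ATA messages may have a different layout and are simply skipped, and the names `parse_ata_errors` and `count_by_command` are hypothetical.

```python
import re
from collections import Counter

# Modelled on the two log lines quoted above. Lines without the
# "retrying (N retries left) LBA=..." tail do not match and are ignored.
ATA_RETRY = re.compile(
    r"kernel: (?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - (?P<cmd>\w+) "
    r"retrying \((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def parse_ata_errors(lines):
    """Return one dict per matching ATA retry message, in input order."""
    events = []
    for line in lines:
        m = ATA_RETRY.search(line)
        if m is None:
            continue
        events.append({
            "dev": m.group("dev"),
            "kind": m.group("kind"),
            "cmd": m.group("cmd"),
            "retries_left": int(m.group("left")),
            "lba": int(m.group("lba")),
        })
    return events


def count_by_command(events):
    """Count events per (device, command) pair."""
    return Counter((e["dev"], e["cmd"]) for e in events)


if __name__ == "__main__":
    sample = [
        "Jul 23 15:00:33 auricle kernel: ad0: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=20586350",
        "Jul 25 09:01:52 auricle kernel: ad0: TIMEOUT - READ_MUL retrying (2 retries left) LBA=1731115",
    ]
    for (dev, cmd), n in sorted(count_by_command(parse_ata_errors(sample)).items()):
        print(f"{dev} {cmd}: {n}")
```

Feeding it the lines of the system log instead of the two samples would give a per-command tally that can go straight into a PR, together with the LBAs if they turn out to cluster.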
Hello,

Bruce A. Mah wrote:
>> I know the developers don't hear it often enough, but thanks for all you
>> do. I'm not a programmer, and I currently don't have the funds to
>> donate to the project, but you do have my heartfelt thanks for still
>> turning out my favorite OS.
> You're welcome, and I'm sure I speak for at least a few other developers
> when I say that you'd be surprised how valuable a "donation" of a few
> kind words can be.

I've been following this thread from the start. To add a "few kind words", I can report that I have had three FreeBSD-5 servers (COMPAQ/HP ProLiant) up and running for quite some time now (starting with 5.2.1!) and they behave very well!! They are mainly NFS and Samba servers, so their focus lies on filesystem space.

To be honest, I was a bit astonished how many people obviously use ATA disks in a fileserver environment. I just read an article in the German iX magazine where the author emphasizes (once again!) that ATA disks are *not* designed for 24*7 use (with the exception of WD's Raptor). Considering the weak definitions in the so-called "ATA standard", I personally can't imagine using ATA disks for anything beyond more or less temporary storage. Especially for a server that earns money, I have always heard the urgent recommendation to use SCSI disks. If I compare the value of the data with the cost of a SCSI subsystem, there are no questions any more...

Some special kind words go to Soren Schmidt here. I never understood how one person could voluntarily dive into this "shark basin" of ATA. There are no merits to earn and there always seem to be many "special" combinations of hardware which don't work. A well thought out standard should avoid exactly that! So "hats off" to Soren for his work and his boundless ability to put up with the many complaints.

So I'll stay with the old FreeBSD habit of using proven technology[TM] and leave the cheap toys to the Linux kiddies. It remains true: You get what you pay for!

-- 
Ciao/BSD - Matthias

Matthias Schuendehuette <msch [at] snafu.de>, Berlin (Germany)
PGP-Key at <pgp.mit.edu> and <wwwkeys.de.pgp.net> ID: 0xDDFB0A5F