d_elbracht
2007-Oct-14 06:35 UTC
g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
We are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed as of 2007-10-09.

Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron 2216; da3 is on a 3ware 9550-12.

We are seeing this error:
g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
on a 12 GB Hyperdrive.

The offset changes sometimes, but it is always 81064794xxxxxxxxx and well outside the 12 GB range.

We did have the Hyperdrive connected directly to the mainboard's SATA0 (ad4), with similar errors.
Before the Hyperdrive we used an md device, which produced similar errors.

Blocksize on the partition is 8192 (newfs -b 8192 ..).
We did have a blocksize of 65536 before, but after some hours (sometimes days) the machine became unresponsive with "newbuf" as the wait message in top and had to be hard-reset.
Regarding "newbuf", as well as nbufkv and nbufbs, I will write a separate message to the list.

According to systat -vm, da3 does tps > 500 (yes, that's a lot).

This leads to the assumption that the error has to do with very high I/Os per second on an SMP machine.
The system disk is a RAID1 on an ICP 5805. All other disks (51) are 20 gstripe'd partitions.

Any hint to diagnose / fix the problem is much appreciated.

Cheers,

Dieter
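For anyone who wants to check the "well outside the 12 GB range" observation on their own log lines, here is a minimal sketch. The regular expression is inferred from the single message quoted in the post, not from the kernel source, and DEVICE_BYTES assumes that "12 GB" means 12 GiB; both are assumptions.

```python
import re

# Layout inferred from the one console line quoted in the post (assumption).
LINE_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>-?\d+), length=(?P<length>\d+)\)\]"
    r"\s*error = (?P<errno>\d+)"
)

DEVICE_BYTES = 12 * 1024**3  # assumption: "12 GB" taken as 12 GiB


def check_line(line, device_bytes=DEVICE_BYTES):
    """Parse one g_vfs_done message and test the request against the device size."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    offset = int(m.group("offset"))
    length = int(m.group("length"))
    return {
        "dev": m.group("dev"),
        "op": m.group("op"),
        "offset": offset,
        "length": length,
        "errno": int(m.group("errno")),
        # The request is plausible only if it lies entirely on the device.
        "in_range": 0 <= offset and offset + length <= device_bytes,
        # How many whole device sizes fit below the requested offset.
        "device_sizes_below_offset": offset // device_bytes,
    }


msg = "g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5"
result = check_line(msg)
print(result)
```

For the message in the post, in_range comes out False: the requested offset is millions of device sizes past the end of a 12 GiB device.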
Scott Long
2007-Oct-14 06:56 UTC
g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
d_elbracht wrote:
> we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
> 2007-10-09
>
> Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
> 2216, da3 is on a 3ware 9550-12
>
> we are seeing this error:
> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
> on a 12 GB Hyperdrive
>
> the offset changes sometimes, but it is always 81064794xxxxxxxxx and well
> out the 12GB range.
>
> We did have the Hyperdrive connected directly to the mainboards SATA0 (ad4)
> with similar errors.
> We used to have a md instead of the hyperdrive before, coming up with
> similar errors.
>
> Blocksize on the partition is 8192 (newfs -b 8192 ..).
> We did have a blocksize of 65536 before, but after some hours (sometimes
> days), the machine will be unresponsive with "newbuf" as a waitmessage in
> top and has to be hard-reset.
> Regarding "newbuf", as well as nbufkv and nbufbs, I will write a separate
> message to the list.
>
> According to systat -vm, da3 does tps > 500 (yes, that's a lot)
>
> This leads to an assumption, the error has to do with very high IOs per
> second on a SMP machine.
> The system-disk is a RAID1 on an ICP 5805. All other disks (51) are 20
> gstripe'd partitions.
>
> Any hint to diagnose / fix the problem is well appreciated.
>
> Cheers,
>
> Dieter

I can generate 30,000 I/Os per second for hours on end on several types of storage hardware on FreeBSD SMP, and have no problems. Since you're seeing this problem both when connected to a 3ware controller and when connected to a simple ATA/SATA controller (both of which have also been observed to do high amounts of I/O with no problems), I suspect that the problem is with your disk device, not with FreeBSD. I don't know anything about a "hyperdrive" though, so more information might help.

Scott
Lars Eighner
2007-Oct-14 08:38 UTC
g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
On Sun, 14 Oct 2007, d_elbracht wrote:
> we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
> 2007-10-09
>
> Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
> 2216, da3 is on a 3ware 9550-12
>
> we are seeing this error:
> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
> on a 12 GB Hyperdrive

I trashed a perfectly good disk drive before learning that there is a serious bug in g_vfs. Apparently it is one of those things that shows up in some configurations and not in others. Although I am told they are unable to isolate the problem, all the reports I've seen were from people using AMD systems.
d_elbracht
2007-Oct-14 09:09 UTC
AW: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
> --- Scott Long <scottl@samsco.org> wrote:
> > I can generate 30,000 I/Os per second for hours on end on several
> > types of storage hardware on FreeBSD SMP, and have no problems. Since
> > you're seeing this problem both when connected to a 3ware controller
> > and when connected to a simple ATA/SATA controller (both of which have
> > also been observed to do high amounts of I/O with no problems), I
> > suspect that the problem is with your disk device, not with FreeBSD.
> > I don't know anything about a "hyperdrive" though, so more
> > information might help.
> >
> > Scott
>
> I would say so, too...
>
> Especially because errno 5 is EIO:
> http://www.freebsd.org/cgi/man.cgi?query=errno&apropos=0&sektion=0&manpath=FreeBSD+6.2-RELEASE&format=html
>
> -Arne

I would agree with you on that, if the error (EIO) is NOT because of the READ going wrong in the first place.

From my understanding, the offset 81064794762854400 is NOT within the 12 GB of the drive anymore. Or does the offset mean something else?

Dieter
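The "errno 5 is EIO" point can be confirmed locally without the man page. This is a minimal sketch using Python's standard errno and os modules; the message text returned by os.strerror() comes from the platform's C library, so it is printed here rather than assumed.

```python
import errno
import os

code = 5  # the "error = 5" from the g_vfs_done message

# errno.errorcode maps numeric values to their symbolic names on this platform.
name = errno.errorcode.get(code, "unknown")
print(f"errno {code} = {name}: {os.strerror(code)}")
```

EIO is a generic I/O failure, so on its own it does not say whether the bad offset or the device was at fault.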
Ivan Voras
2007-Oct-14 15:30 UTC
g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
d_elbracht wrote:
> we are trying to diagnose errors seen on 6.2, SMP, amd64, cvsup'ed of
> 2007-10-09
>
> Mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
> 2216, da3 is on a 3ware 9550-12
>
> we are seeing this error:
> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
> on a 12 GB Hyperdrive
>
> the offset changes sometimes, but it is always 81064794xxxxxxxxx and well
> out the 12GB range.

Yes.

> According to systat -vm, da3 does tps > 500 (yes, that's a lot)

That's not a lot :) That's actually low for a modern solid state drive.

> This leads to an assumption, the error has to do with very high IOs per
> second on a SMP machine.

Either that or file system errors. Does fsck run OK, or does it say anything unusual?

There are several theoretical reasons for such errors that are connected with the fact that you use solid state drives, but all are tricky to diagnose if you don't have a reliably repeatable test you can try. For example: some SSDs optimize writes to "spread out" the I/O across the chips, and some do it by looking into file system structures to determine where it's safe to relocate the write. Obviously this works only with a known and supported file system. This is a really wild guess, but maybe the SSD firmware has an error somewhere in this area, trying to interpret UFS as if it were FAT? If you manage to get a repeatable failure test, you can try formatting the drive as FAT32 and running the test on that.

Or maybe it's just a bad drive...

> The system-disk is a RAID1 on an ICP 5805. All other disks (51) are 20
> gstripe'd partitions.

51 drives and 20 partitions?
d_elbracht
2007-Oct-16 03:05 UTC
AW: Re: AW: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
> > One basic question to ask: where does the value for offset= in
> > g_vfs_done() come from?
> > From the time the error shows up in syslog, I believe the error only
> > happens when a file gets appended.
>
> I wonder if (wild guess follows) there's a 32/64 bit conversion problem
> somewhere, like a 32bit number cast as 64bit or something.
>
> I'd like to see a full trace to see what path it takes. Maybe putting a
> panic in the error path would be worth doing.

Can you give me some hints, please, on how to do this? I'm willing to try just about anything to get this problem nailed down.

Dieter
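The 32/64-bit guess quoted above can at least be checked against the number itself. The sketch below splits the reported offset into its high and low 32-bit halves and lists which high bits are set. DEVICE_BYTES assumes "12 GB" means 12 GiB, and BLOCK is the 8192-byte block size from the first post. This is only arithmetic on the logged value; it says nothing about where in the I/O path the value was produced.

```python
OFFSET = 81064794762854400    # value from the error message
DEVICE_BYTES = 12 * 1024**3   # assumption: "12 GB" means 12 GiB
BLOCK = 8192                  # file system block size from the first post

high32 = OFFSET >> 32
low32 = OFFSET & 0xFFFFFFFF
# Bit positions (0 = least significant) that are set above the low 32 bits.
high_bits_set = [bit for bit in range(32, 64) if (OFFSET >> bit) & 1]

print(f"offset       = {OFFSET:#018x}")
print(f"high 32 bits = {high32:#010x}")
print(f"low 32 bits  = {low32:#010x} ({low32} bytes)")
print("bits set above bit 31:", high_bits_set)
print("low part fits on device:", low32 < DEVICE_BYTES)
print("low part is block aligned:", low32 % BLOCK == 0)
```

For this value the low half is a block-aligned offset of about 1.47e9 bytes, which would fit on the device, and only bits 53 and 56 are set in the upper half. That pattern is compatible with a few stray high bits, whether from software or hardware, but it cannot tell the two apart.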
d_elbracht
2007-Dec-05 10:35 UTC
AW: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
Just an update:

> > --- Scott Long <scottl@samsco.org> wrote:
> > > I can generate 30,000 I/Os per second for hours on end on several
> > > types of storage hardware on FreeBSD SMP, and have no problems. Since
> > > you're seeing this problem both when connected to a 3ware controller
> > > and when connected to a simple ATA/SATA controller (both of which have
> > > also been observed to do high amounts of I/O with no problems), I
> > > suspect that the problem is with your disk device, not with FreeBSD.
> > > I don't know anything about a "hyperdrive" though, so more
> > > information might help.
> > >
> > > Scott
> >
> > I would say so, too...
> >
> > Especially because errno 5 is EIO:
> > http://www.freebsd.org/cgi/man.cgi?query=errno&apropos=0&sektion=0&manpath=FreeBSD+6.2-RELEASE&format=html
> >
> > -Arne
>
> I would agree with you on that, if the error (EIO) is NOT
> because of the READ going wrong in the first place.
>
> From my understanding, the offset 81064794762854400 is NOT
> within the 12 GB of the drive anymore. Or, does the offset
> mean something else ?

Scott, you were right in the first place: it was definitely a disk error.

Thanks,

Dieter