thr3ads.net - freebsd stable - g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5 [Oct 2007]

d_elbracht

2007-Oct-14 06:35 UTC

g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

We are trying to diagnose errors seen on 6.2 (SMP, amd64), cvsup'ed as of
2007-10-09.

The mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
2216; da3 is on a 3ware 9550-12.

We are seeing this error on a 12 GB Hyperdrive:
g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

The offset changes sometimes, but it is always 81064794xxxxxxxxx, well
outside the 12 GB range.
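To put a number on "well outside the range", the console message can be parsed and its offset (a byte offset into the GEOM provider, here da3s1a) compared with the device size. This is a minimal sketch with hypothetical helper names (`parse_g_vfs_done`, `past_end_of_device`), not a FreeBSD tool; it assumes the message format shown above and takes the drive size as 12 GB (12 * 10**9 bytes) for illustration only.

```python
import re

# Pattern for console messages of the form quoted in this thread:
# g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
LOG_RE = re.compile(
    r"g_vfs_done\(\):(?P<dev>\w+)\[(?P<op>\w+)\("
    r"offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<error>\d+)"
)


def parse_g_vfs_done(line):
    """Return (device, op, offset, length, error), or None if the line doesn't match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    return (m["dev"], m["op"], int(m["offset"]), int(m["length"]), int(m["error"]))


def past_end_of_device(offset, length, media_size):
    """True if the byte range [offset, offset + length) extends beyond the device."""
    return offset + length > media_size


# Assumption: a 12 GB device (12 * 10**9 bytes). On the real system the exact
# value would be the mediasize reported by diskinfo(8).
MEDIA_SIZE = 12 * 10**9

line = "g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5"
dev, op, offset, length, error = parse_g_vfs_done(line)
print(dev, op, offset, length, error)
print("past end of device:", past_end_of_device(offset, length, MEDIA_SIZE))
```

Feeding it every such line from the logs (for example /var/log/messages) would also show whether the leading digits really stay the same each time.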

We previously had the Hyperdrive connected directly to the mainboard's SATA0
port (ad4), with similar errors. Before the Hyperdrive we used an md device
instead, which also produced similar errors.

The block size on the partition is 8192 (newfs -b 8192 ...).
We previously used a block size of 65536, but after some hours (sometimes
days), the machine would become unresponsive, showing "newbuf" as the wait
message in top, and had to be hard-reset.
Regarding "newbuf", as well as nbufkv and nbufbs, I will write a separate
message to the list.

According to systat -vm, da3 does more than 500 tps (yes, that's a lot).

This leads us to assume that the error has to do with very high I/O rates
on an SMP machine.
The system disk is a RAID1 on an ICP 5805. All other disks (51) are 20
gstripe'd partitions.
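For scale, the bandwidth behind the tps figure can be estimated by multiplying it by the size of each transfer. This minimal sketch assumes every transfer is exactly one 8192-byte filesystem block, which is only an approximation: real request sizes vary, and clustering can make them larger.

```python
def throughput_bytes_per_sec(tps, transfer_size):
    """Approximate bandwidth in bytes/s, assuming every transfer has the same size."""
    return tps * transfer_size


TPS = 500          # lower bound reported by systat -vm for da3
BLOCK_SIZE = 8192  # newfs -b 8192; assumed here to equal the size of every transfer

rate = throughput_bytes_per_sec(TPS, BLOCK_SIZE)
print(f"{rate} bytes/s = {rate / 2**20:.2f} MiB/s")  # 4096000 bytes/s = 3.91 MiB/s
```

Under that assumption the bandwidth is only about 4 MB/s; whether 500 operations per second is demanding depends mostly on the device's per-request latency rather than on bandwidth.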

Any hint on how to diagnose or fix the problem is much appreciated.

Cheers,

Dieter

Scott Long

2007-Oct-14 06:56 UTC

g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

d_elbracht wrote:
> We are trying to diagnose errors seen on 6.2 (SMP, amd64), cvsup'ed as of
> 2007-10-09.
>
> The mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
> 2216; da3 is on a 3ware 9550-12.
>
> We are seeing this error on a 12 GB Hyperdrive:
> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
>
> The offset changes sometimes, but it is always 81064794xxxxxxxxx, well
> outside the 12 GB range.
>
> We previously had the Hyperdrive connected directly to the mainboard's SATA0
> port (ad4), with similar errors. Before the Hyperdrive we used an md device
> instead, which also produced similar errors.
>
> The block size on the partition is 8192 (newfs -b 8192 ...).
> We previously used a block size of 65536, but after some hours (sometimes
> days), the machine would become unresponsive, showing "newbuf" as the wait
> message in top, and had to be hard-reset.
> Regarding "newbuf", as well as nbufkv and nbufbs, I will write a separate
> message to the list.
>
> According to systat -vm, da3 does more than 500 tps (yes, that's a lot).
>
> This leads us to assume that the error has to do with very high I/O rates
> on an SMP machine.
> The system disk is a RAID1 on an ICP 5805. All other disks (51) are 20
> gstripe'd partitions.
>
> Any hint on how to diagnose or fix the problem is much appreciated.
>
> Cheers,
>
> Dieter
> 
I can generate 30,000 I/Os per second for hours on end on several types
of storage hardware on FreeBSD SMP, and have no problems. Since you're
seeing this problem both when connected to a 3ware controller and when
connected to a simple ATA/SATA controller (both of which have also been
observed to do high amounts of I/O with no problems), I suspect that the
problem is with your disk device, not with FreeBSD. I don't know
anything about a "hyperdrive" though, so more information might help.

Scott

Lars Eighner

2007-Oct-14 08:38 UTC

g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

On Sun, 14 Oct 2007, d_elbracht wrote:
> We are trying to diagnose errors seen on 6.2 (SMP, amd64), cvsup'ed as of
> 2007-10-09.
>
> The mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
> 2216; da3 is on a 3ware 9550-12.
>
> We are seeing this error on a 12 GB Hyperdrive:
> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
I trashed a perfectly good disk drive before learning that there is a serious
bug in g_vfs. Apparently it is one of those things that shows up in some
configurations and not others. Although I am told they have been unable to
isolate the problem, all the reports I've seen were from people using AMD
systems.

d_elbracht

2007-Oct-14 09:09 UTC

Re: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

> --- Scott Long <scottl@samsco.org> wrote:
> > I can generate 30,000 I/Os per second for hours on end on several
> > types of storage hardware on FreeBSD SMP, and have no problems.
> > Since you're seeing this problem both when connected to a 3ware
> > controller and when connected to a simple ATA/SATA controller (both
> > of which have also been observed to do high amounts of I/O with no
> > problems), I suspect that the problem is with your disk device, not
> > with FreeBSD. I don't know anything about a "hyperdrive" though, so
> > more information might help.
> >
> > Scott
> >
> I would say so, too...
>
> Especially because errno 5 is EIO:
> http://www.freebsd.org/cgi/man.cgi?query=errno&apropos=0&sektion=0&manpath=FreeBSD+6.2-RELEASE&format=html
>
> -Arne
I would agree with you on that, if the error (EIO) were NOT caused by the
READ going wrong in the first place.
From my understanding, the offset 81064794762854400 is NOT within the 12 GB
of the drive anymore. Or does the offset mean something else?

Dieter
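The trailing "error = 5" in the console message is an errno(2) value, which is what Arne's link refers to. Python's standard errno and os modules can decode it; numeric errno values are platform-defined, so this small sketch assumes it runs on a system where EIO is 5, as it is on FreeBSD.

```python
import errno
import os

code = 5  # the "error = 5" from the g_vfs_done() message
# errno.errorcode maps numbers to symbolic names; os.strerror gives the text
print(errno.errorcode[code], "-", os.strerror(code))
```

EIO only says that the request failed; by itself it does not distinguish a bad request from a bad device.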

Ivan Voras

2007-Oct-14 15:30 UTC

g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

d_elbracht wrote:
> We are trying to diagnose errors seen on 6.2 (SMP, amd64), cvsup'ed as of
> 2007-10-09.
>
> The mainboard is a Tyan Thunder h2000M (S3992-E) with 16 GB RAM and 2 x Opteron
> 2216; da3 is on a 3ware 9550-12.
>
> We are seeing this error on a 12 GB Hyperdrive:
> g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5
>
> The offset changes sometimes, but it is always 81064794xxxxxxxxx, well
> outside the 12 GB range.
Yes.
> According to systat -vm, da3 does more than 500 tps (yes, that's a lot).
That's not a lot :) That's actually low for a modern solid state drive.
> This leads us to assume that the error has to do with very high I/O rates
> on an SMP machine.
Either that or file system errors. Does fsck run ok or does it say
anything unusual?

There are several theoretical reasons for such errors that are connected
with the fact that you use solid state drives, but all are tricky to diagnose
if you don't have a reliably repeatable test you can try. For example:
some SSDs optimize writes to "spread out" the IO on the chips, but some
do it by looking into file system structures to determine where it's
safe to relocate the write; obviously this works only with a known and
supported file system. This is a really wild guess, but maybe the SSD
firmware has an error somewhere in this area, trying to interpret UFS as if
it were FAT? If you manage to get a repeatable failure test, you can try
formatting the drive as FAT32 and trying it on that.

Or maybe it's just a bad drive...
> The system disk is a RAID1 on an ICP 5805. All other disks (51) are 20
> gstripe'd partitions.
51 drives and 20 partitions?

d_elbracht

2007-Oct-16 03:05 UTC

Re: Re: Re: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

> > One basic question to ask: where does the value for offset= in
> > g_vfs_done() come from?
> > From the time the error shows up in syslog, I believe the error only
> > happens when a file gets appended.
>
> I wonder if (wild guess follows) there's a 32/64 bit
> conversion problem somewhere, like a 32bit number cast as
> 64bit or something.
>
> I'd like to see a full trace to see what path it takes.
> Maybe putting a panic in the error path would be worth doing.
>
> 
Can you please give me some hints on how to do this? I'm willing to try just
about everything to get this problem nailed down.

Dieter
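The 32/64-bit guess quoted above can at least be examined arithmetically by splitting the reported offset into its upper and lower 32-bit halves. This is only an illustration of how the number decomposes; it is not a trace and does not show what produced the value.

```python
OFFSET = 81064794762854400  # byte offset from the g_vfs_done() message
BLOCK_SIZE = 8192           # filesystem block size (newfs -b 8192)

high = OFFSET >> 32         # upper 32 bits
low = OFFSET & 0xFFFFFFFF   # lower 32 bits

print(f"offset  = {OFFSET:#018x}")  # offset  = 0x0120000057a14000
print(f"high 32 = {high:#010x}")    # high 32 = 0x01200000
print(f"low 32  = {low:#010x} = {low} bytes = {low / 2**30:.2f} GiB")
print("block-aligned:", OFFSET % BLOCK_SIZE == 0)  # block-aligned: True
```

The lower half alone, 1470185472 bytes (about 1.37 GiB), would be a plausible offset on a 12 GB device; it is the upper half that puts the request petabytes past the end. That pattern is consistent with corrupted upper bits, whether from a 32/64-bit bug or from a bad block address read back from a failing drive, but the arithmetic alone cannot tell the two apart.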

d_elbracht

2007-Dec-05 10:35 UTC

Re: g_vfs_done():da3s1a[READ(offset=81064794762854400, length=8192)]error = 5

Just an update:
> > --- Scott Long <scottl@samsco.org> wrote:
> > > I can generate 30,000 I/Os per second for hours on end on several
> > > types of storage hardware on FreeBSD SMP, and have no problems.
> > > Since you're seeing this problem both when connected to a 3ware
> > > controller and when connected to a simple ATA/SATA controller (both
> > > of which have also been observed to do high amounts of I/O with no
> > > problems), I suspect that the problem is with your disk device, not
> > > with FreeBSD. I don't know anything about a "hyperdrive" though, so
> > > more information might help.
> > >
> > > Scott
> > >
> > I would say so, too...
> >
> > Especially because errno 5 is EIO:
> > http://www.freebsd.org/cgi/man.cgi?query=errno&apropos=0&sektion=0&manpath=FreeBSD+6.2-RELEASE&format=html
> >
> > -Arne
>
> I would agree with you on that, if the error (EIO) were NOT
> caused by the READ going wrong in the first place.
>
> From my understanding, the offset 81064794762854400 is NOT
> within the 12 GB of the drive anymore. Or does the offset
> mean something else?
Scott, you were right in the first place, it was definitely a disk error.

Thanks

Dieter
