thr3ads.net - dtrace discuss - [dtrace-discuss] Disk corruption problem, where to start [Apr 2006]

Carisdad

2006-Apr-11 16:57 UTC

[dtrace-discuss] Disk corruption problem, where to start

I've got a funky problem, and was wondering if anyone could help get me
started with using dtrace to find the problem.

The setup:
    Solaris 10 with all of the patches up to about a month ago.
    Veritas Foundation Suite 4.1 MP1
    8 TB SCSI Mirrored Disk on one array, SAN attached w/ 4x2G HBA's
    30 TB SATA RAID-5 Disk on another array, also SAN attached w/
    separate 2x2G HBA's

The problem:
     We have a script which moves oracle files from the mirrored disk to 
the RAID-5 disk as the data ages.  At the end of the day, the script is 
essentially doing a cp -p /expensivedisk/file1 /cheapdisk/file2.  After 
the cp we do checksums on the files and compare for corruption.  These 
are usually 2GB files, and about 1 in 500 get corrupted, where a 
contiguous 2K chunk of data gets zero'd in the destination file.
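
Before tracing anything, it helps to know exactly where the zeroed range sits in a bad copy. The following is a minimal sketch, not the script described above: it assumes both files are still available, and the function name and the 512-byte comparison granularity are made up for illustration. It walks the two files in step and reports each contiguous run where the destination is all zeros but the source is not.

```python
def find_zeroed_runs(src_path, dst_path, block_size=512):
    """Return a list of (offset, length) runs where dst is zeroed but src is not.

    block_size is only the comparison granularity (an assumption), not a
    claim about the real disk or filesystem block size.
    """
    runs = []
    offset = 0
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while True:
            a = src.read(block_size)
            b = dst.read(block_size)
            if not a and not b:
                break
            # Flag the block only if it differs from the source AND is all zeros.
            if b and a != b and b == bytes(len(b)):
                if runs and runs[-1][0] + runs[-1][1] == offset:
                    # Extend the previous run when the blocks are adjacent.
                    runs[-1] = (runs[-1][0], runs[-1][1] + len(b))
                else:
                    runs.append((offset, len(b)))
            offset += max(len(a), len(b))
    return runs
```

Calling `find_zeroed_runs("/expensivedisk/file1", "/cheapdisk/file2")` on a corrupted pair would give the byte offset and length of each zeroed region, which can then be matched against I/O offsets seen in a trace.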

I'm wondering if I could use dtrace to narrow down the source of the 
corruption from somewhere between the disk, the hba, the driver, the OS, 
or veritas.

Thanks.

Carisdad

2006-Apr-11 17:54 UTC

[dtrace-discuss] Re: Disk corruption problem, where to start

Carisdad wrote:
> I've got a funky problem, and was wondering if anyone could help get
> me started with using dtrace to find the problem.
>
> The setup:
>    Solaris 10 with all of the patches up to about a month ago.
>    Veritas Foundation Suite 4.1 MP1
>    8 TB SCSI Mirrored Disk on one array, SAN attached w/ 4x2G HBA's
>    30 TB SATA RAID-5 Disk on another array, also SAN attached w/
>    separate 2x2G HBA's
>
> The problem:
>     We have a script which moves oracle files from the mirrored disk
> to the RAID-5 disk as the data ages.  At the end of the day, the
> script is essentially doing a cp -p /expensivedisk/file1
> /cheapdisk/file2.  After the cp we do checksums on the files and
> compare for corruption.  These are usually 2GB files, and about 1 in
> 500 get corrupted, where a contiguous 2K chunk of data gets zero'd in
> the destination file.

I actually mis-stated the corruption.  It's a contiguous 48k (4 oracle
blocks vs 4 disk blocks) of zero'd data.

> I'm wondering if I could use dtrace to narrow down the source of the
> corruption from somewhere between the disk, the hba, the driver, the
> OS, or veritas.
>
> Thanks.

Thanks again.

Andy Rumer

2006-Apr-11 19:37 UTC

[dtrace-discuss] Re: Disk corruption problem, where to start

> Carisdad wrote:
>
> > I've got a funky problem, and was wondering if anyone could help get
> > me started with using dtrace to find the problem.
> >
> > The setup:
> >    Solaris 10 with all of the patches up to about a month ago.
> >    Veritas Foundation Suite 4.1 MP1
> >    8 TB SCSI Mirrored Disk on one array, SAN attached w/ 4x2G HBA's
> >    30 TB SATA RAID-5 Disk on another array, also SAN attached w/
> >    separate 2x2G HBA's
> >
> > The problem:
> >     We have a script which moves oracle files from the mirrored disk
> > to the RAID-5 disk as the data ages.  At the end of the day, the
> > script is essentially doing a cp -p /expensivedisk/file1
> > /cheapdisk/file2.  After the cp we do checksums on the files and
> > compare for corruption.  These are usually 2GB files, and about 1 in
> > 500 get corrupted, where a contiguous 2K chunk of data gets zero'd in
> > the destination file.
>
> I actually mis-stated the corruption.  It's a contiguous 48k (4 oracle
> blocks vs 4 disk blocks) of zero'd data.

And because I haven't been sleeping, make that 64K (4x16K oracle
blocks) not 48K.  Sheesh, I'm not really this incompetent, I promise.
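
Whether the zeroed 64K starts and ends on a 16K boundary is worth checking, since an aligned hole hints at whole I/O requests going missing rather than a partial overwrite. A minimal sketch of that arithmetic follows; the function name is invented for illustration, and the offset in the usage note is an example value, not a measurement from the affected files.

```python
ORACLE_BLOCK = 16 * 1024  # 16K Oracle block size, as stated in the post


def describe_run(offset, length, unit=ORACLE_BLOCK):
    """Summarize how a damaged byte range lines up with a given block size."""
    return {
        "starts_on_boundary": offset % unit == 0,
        "ends_on_boundary": (offset + length) % unit == 0,
        "whole_units": length // unit,
        "leftover_bytes": length % unit,
    }
```

For example, `describe_run(1310720, 65536)` describes a hypothetical 64K hole starting at byte 1310720. Passing a different `unit` (say, a filesystem or array stripe size) repeats the check against other layers.
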
> > I'm wondering if I could use dtrace to narrow down the source of the
> > corruption from somewhere between the disk, the hba, the driver, the
> > OS, or veritas.
> >
> > Thanks.
> >
> Thanks again.
> _______________________________________________
> dtrace-discuss mailing list
> dtrace-discuss at opensolaris.org
 
This message posted from opensolaris.org

Wee Yeh Tan

2006-Apr-14 05:22 UTC

[dtrace-discuss] Re: Disk corruption problem, where to start

Andy,

I'd start by looking at who else could have written to the files in
question during the entire copy/verify period.  Brendan's rwsnoop
would be a good start.

Check out:
<http://users.tpg.com.au/adsln4yb/dtrace.html#DTraceToolkit>
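
Once rwsnoop output has been captured to a file during a copy, something has to pick the unexpected writers out of it. The sketch below does that under a loud assumption: that each line has the whitespace-separated columns UID, PID, CMD, D (R or W), BYTES, FILE, and that FILE holds the full path. Check this against the header your copy of rwsnoop prints. The function name and the `allowed_cmds` parameter are made up for illustration.

```python
def other_writers(lines, target, allowed_cmds=("cp",)):
    """Return (pid, cmd, nbytes) for writes to target by unexpected commands.

    Assumes each line looks like: UID PID CMD D BYTES FILE
    """
    hits = []
    for line in lines:
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        uid, pid, cmd, direction, nbytes, path = fields
        # Skip the header and any row whose numeric columns do not parse
        # (for example a failed call reported with a negative byte count).
        if not (pid.isdigit() and nbytes.isdigit()):
            continue
        if direction == "W" and path.strip() == target and cmd not in allowed_cmds:
            hits.append((int(pid), cmd, int(nbytes)))
    return hits
```

With the captured output open as `f`, `other_writers(f, "/cheapdisk/file2")` lists every write to the destination file that did not come from cp. An empty result points away from another process and back toward the layers below the filesystem.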


--
Just me,
Wire ...


On 4/12/06, Andy Rumer <carisdad at gmail.com> wrote:
> > Carisdad wrote:
> >
> > > I've got a funky problem, and was wondering if anyone could help get
> > > me started with using dtrace to find the problem.
> > >
> > > The setup:
> > >    Solaris 10 with all of the patches up to about a month ago.
> > >    Veritas Foundation Suite 4.1 MP1
> > >    8 TB SCSI Mirrored Disk on one array, SAN attached w/ 4x2G HBA's
> > >    30 TB SATA RAID-5 Disk on another array, also SAN attached w/
> > >    separate 2x2G HBA's
> > >
> > > The problem:
> > >     We have a script which moves oracle files from the mirrored disk
> > > to the RAID-5 disk as the data ages.  At the end of the day, the
> > > script is essentially doing a cp -p /expensivedisk/file1
> > > /cheapdisk/file2.  After the cp we do checksums on the files and
> > > compare for corruption.  These are usually 2GB files, and about 1 in
> > > 500 get corrupted, where a contiguous 2K chunk of data gets zero'd in
> > > the destination file.
> >
> > I actually mis-stated the corruption.  It's a contiguous 48k (4 oracle
> > blocks vs 4 disk blocks) of zero'd data.
>
> And because I haven't been sleeping, make that 64K (4x16K oracle
> blocks) not 48K.  Sheesh, I'm not really this incompetent, I promise.
>
> > > I'm wondering if I could use dtrace to narrow down the source of the
> > > corruption from somewhere between the disk, the hba, the driver, the
> > > OS, or veritas.
> > >
> > > Thanks.
> > >
> > Thanks again.
