Hello,

Having a problem with software RAID that is driving me crazy. Here are the details:

1. CentOS 6.2 x86_64 installed from the minimal ISO (via PXE boot).
2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
   with a pair of 1TB Western Digital SATA3 drives.
3. Drives are plugged into the SATA3 ports on the mainboard (both the drives
   and the cables say they can do 6Gb/s).
4. During the install I set up software RAID1 for the two drives with two RAID
   partitions:
     md0 - 500M for /boot
     md1 - "the rest" for a physical volume
5. Set up LVM on md1 in the standard slash, swap, home layout.

The install goes fine (actually really fast) and I reboot into CentOS 6.2. Next
I run yum update, add a few minor packages and perform some basic configuration.

Now I start to get I/O errors printed on the console. I run 'mdadm -D /dev/md1'
and see the array is degraded and /dev/sdb2 has been marked as faulty.

Okay, fair enough, I've got at least one bad drive. I boot the system from a
live USB and run the short and long SMART tests on both drives. No problems are
reported, but I know that can be misleading, so I'm going to have to gather
some evidence before I try to return these drives. I run badblocks in
destructive mode on both drives as follows:

  badblocks -w -b 4096 -c 98304 -s /dev/sda
  badblocks -w -b 4096 -c 98304 -s /dev/sdb

I come back the next day and see that no errors are reported. Er, that's odd. I
check the SMART data in case the badblocks activity has triggered something.
Nope. Maybe I screwed up the install somehow?

So I start again and repeat the install process very carefully. This time I
check the RAID arrays straight after boot:

  mdadm -D /dev/md0 - all is fine.
  mdadm -D /dev/md1 - the two drives are resyncing.

Okay, that is odd. The RAID1 array was created at the start of the install
process, before any software was installed. Surely it should be in sync
already? I googled a bit and found a post where someone else had seen the same
thing happen. The advice was to just wait until the drives sync so the 'blocks
match exactly', but I'm not really happy with that explanation. At this rate
it's going to take a whole day to do a single minimal install, and I'm sure I
would have heard others complaining about the process.

Anyway, I leave the system to sync for the rest of the day. When I get back to
it I see the same (similar) I/O errors on the console, and mdadm shows the RAID
array is degraded and /dev/sdb2 has been marked as faulty. This time I notice
that the I/O errors all refer to /dev/sda. I have to reboot because the
filesystem is now read-only. When the system comes back up, it's trying to
resync the drives again. Eh?

Any ideas what is going on here? If it's bad drives, I really need some
confirmation independent of the software RAID failing. I thought SMART or
badblocks would give me that. Perhaps it has nothing to do with the drives.
Could a problem with the mainboard or the memory cause this issue? Is it a
SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably
little speed difference with non-SSDs?

Cheers,

Kal
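P.S. For reference, this is roughly how I've been checking things after each
failure (a sketch only; device names are from my box, and it assumes
smartmontools and mdadm are installed):

  # overall array state at a glance
  cat /proc/mdstat
  mdadm -D /dev/md1

  # SMART health and raw attributes for each drive
  smartctl -H /dev/sda
  smartctl -A /dev/sda        # watch Reallocated_Sector_Ct and Current_Pending_Sector
  smartctl -t long /dev/sda   # then later: smartctl -l selftest /dev/sda

  # kernel-side I/O and libata errors
  dmesg | grep -iE 'ata|error'
  grep -iE 'ata|sd[ab]' /var/log/messages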
On 2012-02-29, Kahlil Hodgson <kahlil.hodgson at dealmax.com.au> wrote:

> 2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
>    with a pair of 1TB Western Digital SATA3 drives.

One thing you can try is to download WD's drive tester and throw it at your
drives. It seems unlikely to find anything, but you never know. The tester is
available on the UBCD bootable CD image (which has lots of other handy tools).

Which model drives do you have? I've found a lot of variability between the
WDxxEARS and their RE drives.

> Okay, that is odd. The RAID1 array was created at the start of the install
> process, before any software was installed. Surely it should be in sync
> already? I googled a bit and found a post where someone else had seen the
> same thing happen. The advice was to just wait until the drives sync so the
> 'blocks match exactly', but I'm not really happy with that explanation.

Supposedly, at least with RAID[456], the array is completely usable when it's
resyncing after an initial creation. In practice, I found that writing
significant amounts of data to that array killed resync performance, so I just
let the resync finish before doing any heavy lifting on the array.

> Anyway, I leave the system to sync for the rest of the day. When I get back
> to it I see the same (similar) I/O errors on the console, and mdadm shows the
> RAID array is degraded and /dev/sdb2 has been marked as faulty. This time I
> notice that the I/O errors all refer to /dev/sda. I have to reboot because
> the filesystem is now read-only. When the system comes back up, it's trying
> to resync the drives again. Eh?

This sounds a little odd. You're having I/O errors on sda, but sdb2 has been
kicked out of the RAID? Do you have any other errors in /var/log/messages that
relate to sdb, and/or the errors right around when the md devices failed?

--keith

--
kkeller-usenet at wombat.san-francisco.ca.us
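P.S. Before booting the UBCD, you can pull the model/firmware strings and any
kernel-side ATA errors from the running system with something like this (a
sketch; assumes the smartmontools package is installed, which it is not in a
minimal CentOS 6 install):

  # model, serial and firmware for each drive
  smartctl -i /dev/sda
  smartctl -i /dev/sdb

  # any libata resets, timeouts or media errors the kernel has logged
  dmesg | grep -i 'ata[0-9]'
  grep -iE 'ata|sd[ab]' /var/log/messages | tail -n 100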
on 2/28/2012 4:27 PM Kahlil Hodgson spake the following:

> Having a problem with software RAID that is driving me crazy.
<snip>

First thing... are they Green drives? Green drives power down randomly and can
cause these types of errors... Also, maybe the 6Gb/s SATA isn't fully supported
by Linux and that board... Try the 3Gb/s channels.
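P.S. If they do turn out to be Greens, the aggressive head parking usually shows
up as a rapidly climbing Load_Cycle_Count. A quick check (a sketch; assumes
smartmontools and hdparm are available, and not every drive honours APM):

  # attribute 193 climbing by hundreds per day suggests aggressive head parking
  smartctl -A /dev/sda | grep -i load_cycle
  smartctl -A /dev/sdb | grep -i load_cycle

  # check the drive's APM setting; 255 disables APM where the drive supports it
  hdparm -B /dev/sda
  hdparm -B 255 /dev/sda

  # and confirm what link speed the drives actually negotiated
  dmesg | grep -i 'SATA link up'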
On Wed, Feb 29, 2012 at 11:27:53AM +1100, Kahlil Hodgson wrote:

> Now I start to get I/O errors printed on the console. I run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.

What I/O errors?

> So I start again and repeat the install process very carefully. This time I
> check the RAID arrays straight after boot:
>
>   mdadm -D /dev/md0 - all is fine.
>   mdadm -D /dev/md1 - the two drives are resyncing.
>
> Okay, that is odd. The RAID1 array was created at the start of the install
> process, before any software was installed. Surely it should be in sync
> already?
<snip>

Yeah, it's normal for a RAID1 to 'sync' when you first create it. The odd part
is the I/O errors.

> Any ideas what is going on here? If it's bad drives, I really need some
> confirmation independent of the software RAID failing. I thought SMART or
> badblocks would give me that. Perhaps it has nothing to do with the drives.
> Could a problem with the mainboard or the memory cause this issue? Is it a
> SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably
> little speed difference with non-SSDs?

Look up the drive errors.

Oh, and my experience? Both WD and Seagate won't complain if you err on the
side of 'when in doubt, return the drive' - that's what I do. But yeah, usually
SMART will report something... at least a high reallocated sector count or
something.
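P.S. The attributes worth checking for a genuinely dying drive, roughly
(assumes smartmontools; the names are the standard ones smartctl prints):

  # non-zero or growing values here are reasonable grounds for an RMA
  smartctl -A /dev/sda | grep -iE 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'

  # self-test history, in case the long test logged a failing LBA
  smartctl -l selftest /dev/sda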
On Tue, Feb 28, 2012 at 5:27 PM, Kahlil Hodgson
<kahlil.hodgson at dealmax.com.au> wrote:

> Now I start to get I/O errors printed on the console. I run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.

I had a problem like this once. In a heterogeneous array of 80GB PATA drives
(it was a while ago), the one WD drive kept dropping out like this. WD's
diagnostic tool showed a problem, so I RMA'ed the drive... only to discover
that the replacement did the same thing on that system, but checked out just
fine on a different system.

It turned out to be a combination of a power supply with less-than-stellar
regulation (go Enermax...) and a WD drive that was particularly sensitive to
it; nothing else in the system seemed to be affected. Replacing the power
supply finally eliminated the issue.

--ln
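P.S. If the power supply is a suspect, lm_sensors can at least show whether the
12V and 5V rails sag under load (a rough check only; assumes the motherboard's
sensor chip is supported by the kernel):

  yum install lm_sensors
  sensors-detect    # answer the prompts, then load the suggested modules
  sensors           # watch the voltage lines while a resync is running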
On 02/28/2012 04:27 PM, Kahlil Hodgson wrote:

> Having a problem with software RAID that is driving me crazy.
<snip>
> Any ideas what is going on here? If it's bad drives, I really need some
> confirmation independent of the software RAID failing.

I just had a very similar problem with a RAID 10 array with four new 1TB
drives. It turned out to be the SATA cable.
I first tried a new drive and even replaced the five-disk hot-plug carrier. It
was always the same logical drive (/dev/sdb). I then tried using an additional
SATA adapter card. That cinched it, as the only thing common to all of the
above was the SATA cable. All has been well for a week now.

I should have tried replacing the cable first :-)

Emmett
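P.S. A bad or marginal cable usually leaves a fingerprint in SMART attribute
199, which counts transfer (CRC) errors between the drive and the controller
rather than problems on the platters (a sketch; assumes smartmontools):

  # a non-zero value that keeps climbing points at the cable, connector or port
  smartctl -A /dev/sdb | grep -i UDMA_CRC_Error_Count

  # the kernel tends to log the same events as ICRC/ABRT errors
  dmesg | grep -i icrc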
A few months ago I had an enormous amount of grief trying to understand why a
RAID array in a new server kept getting corrupted and suddenly changing
configuration. After a lot of despair and head scratching it turned out to be
the SATA cables. This was a rack server from Asus with a SATA backplane. The
cables, made by Foxconn, came pre-installed.

After I replaced the SATA cables with new ones, all problems were gone and the
array is now rock solid.

Many SATA cables on the market are pieces of junk, either incapable of coping
with the high frequencies involved in SATA 3Gb/s or 6Gb/s, or with connectors
made of bad-quality plastics unable to keep the necessary pressure on the
contacts. I had already found this problem with desktop machines; I simply
wouldn't believe that such a class of hardware would exhibit it too.

So, I would advise you to replace the SATA cables with good quality ones.

As additional information, I quote from the Caviar Black range datasheet:

"Desktop / Consumer RAID Environments - WD Caviar Black Hard Drives are tested
and recommended for use in consumer-type RAID applications (RAID-0 / RAID-1).

Business Critical RAID Environments - WD Caviar Black Hard Drives are not
recommended for and are not warranted for use in RAID environments utilizing
Enterprise HBAs and/or expanders and in multi-bay chassis, as they are not
designed for, nor tested in, these specific types of RAID applications. For all
Business Critical RAID applications, please consider WD's Enterprise Hard
Drives that are specifically designed with RAID-specific, time-limited error
recovery (TLER), are tested extensively in 24x7 RAID applications, and include
features like enhanced RAFF technology and thermal extended burn-in testing."
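P.S. The TLER behaviour the datasheet mentions can be queried (and on some
drives adjusted) through the SCT Error Recovery Control interface; desktop
models often simply refuse the command (a sketch; assumes a reasonably recent
smartmontools):

  # show the current error-recovery timeouts, if the drive supports SCT ERC
  smartctl -l scterc /dev/sda

  # set 7-second read/write timeouts (values are in tenths of a second) so md
  # kicks a slow drive out instead of stalling; many desktop drives reject this,
  # and on many that accept it the setting does not survive a power cycle
  smartctl -l scterc,70,70 /dev/sda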
On 03/01/2012 09:00 AM, Mark Roth wrote:

> Miguel Medalha wrote:
>>
>> A few months ago I had an enormous amount of grief trying to understand
>> why a RAID array in a new server kept getting corrupted and suddenly
>> changing configuration. After a lot of despair and head scratching it
>> turned out to be the SATA cables.
>> <snip>
>> After I replaced the SATA cables with new ones, all problems were gone
>> and the array is now rock solid.
>
> Thanks for this info, Miguel.
> <snip>
>> As additional information, I quote from the Caviar Black range datasheet:
>> <snip>
>
> Wonderful... NOT. We've got a number of Caviar Greens, so I looked up their
> datasheet... and it says the same.
>
> That rebuild of my system at home? I think I'll look at commercial-grade
> drives....
>
> mark

Interesting thread...

I have had problems with SATA cables in the past, and prefer those with the
little metal latches. The problem is that you can't easily tell by looking at
the connectors whether or not they're flaky.

I've had positive experience with Caviar Black and Scorpio Black drives. The WD
Green and Blue drives are built more cheaply than the Blacks (which have close
to enterprise-grade construction). The dealer I buy drives from has told me
that the Blacks have far lower return/defect rates. Of the approximately 30 2TB
Blacks I have in RAID-6 service, I've only experienced two failures, which were
handled quickly by the WD warranty program. It's interesting to note that while
all the drive manufacturers are going back to 1- or 2-year warranties, the WD
Black series remains at 5 years.

A friend of mine has had a couple of strange problems with the RE (RAID) series
of Caviars, which use the same mechanics as the non-RE Blacks. For software
RAID, I would recommend that you stick with the non-RE versions because of
differences in the firmware.

It has come down to me buying *only* WD Black-series drives and nothing else.
If I could afford them, I'd consider enterprise-grade drives. Having said that,
I have a pair of 1TB Green drives in RAID-1 for the TimeMachine backups on my
Mac, and they've been spinning 24x7 non-stop for 3 years without failure. I'm
almost afraid to switch them off.

Now, if WD can just get their post-flood production back in gear so prices can
drop.

My 2c, FWIW :-)

Chuck