Michael A. Peters
2009-Jun-02 02:52 UTC
[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)
-=- starting as new thread as it is off topic from controller thread -=-

Ross Walker wrote:
>
> The real key is the controller though. Get one that can do hardware
> RAID1/10, 5/50, 6/60, if it can do both SATA and SAS even better and
> get a battery backed write-back cache, the bigger the better, 256MB
> good, 512MB better, 1GB best.

I've read a lot of different reports that suggest that, at this point in
time, kernel software RAID is in most cases better than controller RAID.

The basic argument seems to be that CPUs are now fast enough that the
limit on throughput is the drive itself, and that SATA removed the
bottleneck that PATA imposed on kernel RAID. The arguments then go on to
give numerous examples where a failing hardware RAID controller CAUSED
data loss, where a RAID card died and an identical card had to be
scrounged from eBay just to read the data on the drives, and so on -
problems that apparently don't happen with kernel software RAID.

The main exception I've seen to using software RAID is high-availability
setups where a separate external unit ($$$) presents the same disks to
multiple servers. There the RAID can't really be in the kernel; it has
to be in the hardware.

I'd be very interested in hearing opinions on this subject.
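For anyone weighing the software RAID side of this, here is a minimal
sketch (an illustration only, assuming a Linux host using the kernel md
driver and the usual /proc/mdstat layout) of how array health can be
checked without any vendor tools:

```python
#!/usr/bin/env python
"""Minimal sketch: report the health of kernel md (software RAID) arrays
by parsing /proc/mdstat. Assumes a Linux host with the md driver loaded;
a '_' in the status brackets (e.g. [U_]) marks a failed/missing member."""

import re
import sys


def degraded_arrays(path="/proc/mdstat"):
    """Return (array, status) tuples for arrays that are missing members."""
    bad = []
    current = None
    with open(path) as f:
        for line in f:
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)            # e.g. "md0"
                continue
            s = re.search(r"\[([U_]+)\]", line)  # e.g. "[UU]" or "[U_]"
            if current and s:
                if "_" in s.group(1):
                    bad.append((current, s.group(1)))
                current = None
    return bad


if __name__ == "__main__":
    problems = degraded_arrays()
    if problems:
        for name, status in problems:
            print("DEGRADED: %s %s" % (name, status))
        sys.exit(1)
    print("all md arrays healthy")
```

Run from cron, something like this plays the role a controller's alarm or
fault LED plays on the hardware RAID side.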
nate
2009-Jun-02 03:33 UTC
[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)
Michael A. Peters wrote:
> I'd be very interested in hearing opinions on this subject.

I mainly like hardware RAID (good hardware RAID, not hybrid
software/hardware RAID) because of the simplicity: the system can easily
boot from it, and in many cases drives are hot swappable, so you don't
have to touch the software or driver - you just yank the disk and put in
a new one.

In the roughly 600 server-class systems I've been exposed to over the
years I have seen only one or two bad RAID cards. One of them I
specifically remember was caught during a burn-in test, so it never went
live; I think the other went bad after several years of service. While
problems certainly can happen, the RAID card doesn't seem to be an issue
provided you're using a good one. I recall the one that was "DOA" was a
3Ware 8006-2 and the other was an HP, I believe in a DL360 G1.

The craziest thing I've experienced on a RAID array was on some cheap
shit LSI Logic storage systems where a single disk failure somehow
crippled its storage controllers (both of them), knocking the entire
array offline for an extended period of time. I think the drive spat out
a bunch of errors on the fiber bus, causing the controllers to flip out.
The system eventually recovered on its own. I have been told similar
stories about other LSI Logic systems (several big companies OEM them),
though I'm sure the problem isn't limited to them; it's an architectural
problem rather than an implementation issue.

The only time in my experience where we actually lost data (that I'm
aware of) due to a storage/RAID/controller issue was back in 2004 with
an EMC CLARiiON CX600, where a misconfiguration by the storage admin
caused a catastrophic failure of the backup controller when the primary
controller crashed. We spent a good 60 hours of downtime the following
week rebuilding corrupt portions of the database as we came across them.
More than a year later we still occasionally found corruption from that
incident. Fortunately the data on the volumes that suffered corruption
was quite old and rarely accessed. Ideally the array should have made
the configuration error obvious, or better yet prevented it from
occurring in the first place. Those old-style enterprise arrays were
overly complicated (and yes, that CX600 ran embedded Windows NT as its
OS!).

For servers, I like 3Ware for SATA and HP for SAS, though these days the
only thing that sits on internal storage is the operating system. All
important data is on enterprise-grade storage systems, which for me
means 3PAR (not to be confused with 3Ware). They get upwards of double
the usable capacity of any other system in the world while still being
dead easy to use and the fastest arrays in the world (priced pretty well
too), and the drives have point-to-point switched connections rather
than sitting on a shared bus. Our array can recover from a failed 750GB
SATA drive in (worst case) roughly 3.5 hours with no performance impact
to the system. Our previous array would take more than 24 hours to
rebuild a 400GB SATA drive, with a major performance hit to the array. I
could go on all day about why their arrays are so great!

My current company has mostly Dell servers, and so far I don't have many
good things to say about their controllers or drives. The drives
themselves are "OK", though Dell doesn't do a good enough job on QA with
them; we had to manually flash dozens of drive firmwares because of
performance problems, and the only way to flash disk firmware is to boot
to DOS, unlike flashing the BIOS or controller firmware.

I believe the Dell SAS/SATA controllers are LSI Logic; I have seen
several kernel panics that seem to point to the storage array on the
Dell systems.

HP is coming out with their G6 servers tomorrow, and the new Smart Array
controllers sound pretty nice, though I have had a couple of incidents
with older HP arrays where a failing drive caused massive performance
problems on the array and we weren't able to force-fail the drive
remotely; we had to send someone on site to yank it out. No data loss,
though. Funny that the controller detected the drive was failing but
didn't give us the ability to take it offline. Support said it was fixed
in a newer version of the firmware, which of course required downtime to
install.

nate
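To put the rebuild figures above in perspective, a rough back-of-envelope
calculation (my own, with decimal gigabytes and a full-capacity rebuild
assumed) of the sustained throughput each of those rebuild times implies:

```python
#!/usr/bin/env python
"""Back-of-envelope sketch: the sustained per-drive rebuild throughput
implied by the rebuild times quoted above. Decimal units (1 GB = 10^9
bytes) are assumed; real rebuilds also depend on array load and layout."""

def implied_throughput(capacity_gb, hours):
    """Average rebuild rate in MB/s for a full-capacity rebuild."""
    return (capacity_gb * 1e9) / (hours * 3600.0) / 1e6

for label, gb, hrs in [("750GB drive in ~3.5h", 750, 3.5),
                       ("400GB drive in ~24h", 400, 24.0)]:
    print("%-22s ~%.0f MB/s sustained" % (label, implied_throughput(gb, hrs)))

# Roughly 60 MB/s for the first case and under 5 MB/s for the second,
# which is why the slower rebuild leaves a much longer window in which a
# second drive failure would be fatal.
```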
Chris Boyd
2009-Jun-02 15:36 UTC
[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)
On Jun 1, 2009, at 9:52 PM, Michael A. Peters wrote:
> I've read a lot of different reports that suggest that, at this point
> in time, kernel software RAID is in most cases better than controller
> RAID.

I manage systems with both. I like hardware RAID controllers. Yes, they
do cost money up front, but when you have a failure you can get a
replacement drive, give it to a low-level tech, and say "Go to server
A41, pull the drive with the solid red light and plug this one in." Then
the controller will take over, format the drive and put it back into
service.

With software RAID, you have to have a sysadmin log in to the box and do
rootly things that require careful thought :-)

When these events are happening in the wee hours and there are other
possible human factors like fatigue or stress, the first scenario is
less risky and less costly in the long run.

--Chris
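For comparison, a sketch of the "rootly things" a software RAID swap
usually boils down to, assuming a hypothetical md array /dev/md0 with a
failed member /dev/sdb1; the mdadm management commands are standard, the
Python wrapper is only for illustration:

```python
#!/usr/bin/env python
"""Illustrative sketch of replacing a failed member in a Linux md array.
/dev/md0 and /dev/sdb1 are hypothetical names; run as root and adapt to
the actual array and partition involved."""

import subprocess

ARRAY = "/dev/md0"      # hypothetical array
FAILED = "/dev/sdb1"    # hypothetical failed member (old disk)
NEW = "/dev/sdb1"       # same device name once the replacement disk is in

def run(*cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

# 1. Mark the member as failed (if the kernel has not already done so)
run("mdadm", "--manage", ARRAY, "--fail", FAILED)
# 2. Remove it from the array
run("mdadm", "--manage", ARRAY, "--remove", FAILED)
# ... physically swap the disk and partition it to match the survivor ...
# 3. Add the new partition; the kernel then rebuilds in the background
run("mdadm", "--manage", ARRAY, "--add", NEW)
# 4. Watch the rebuild progress
run("cat", "/proc/mdstat")
```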
Ross Walker
2009-Jun-02 15:38 UTC
[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)
On Mon, Jun 1, 2009 at 10:52 PM, Michael A. Peters <mpeters at mac.com> wrote:
> -=- starting as new thread as it is off topic from controller thread -=-
>
> Ross Walker wrote:
> >
> > The real key is the controller though. Get one that can do hardware
> > RAID1/10, 5/50, 6/60, if it can do both SATA and SAS even better and
> > get a battery backed write-back cache, the bigger the better, 256MB
> > good, 512MB better, 1GB best.
>
> I've read a lot of different reports that suggest that, at this point in
> time, kernel software RAID is in most cases better than controller RAID.
>
> The basic argument seems to be that CPUs are now fast enough that the
> limit on throughput is the drive itself, and that SATA removed the
> bottleneck that PATA imposed on kernel RAID. The arguments then go on to
> give numerous examples where a failing hardware RAID controller CAUSED
> data loss, where a RAID card died and an identical card had to be
> scrounged from eBay just to read the data on the drives, and so on -
> problems that apparently don't happen with kernel software RAID.
>
> The main exception I've seen to using software RAID is high-availability
> setups where a separate external unit ($$$) presents the same disks to
> multiple servers. There the RAID can't really be in the kernel; it has
> to be in the hardware.
>
> I'd be very interested in hearing opinions on this subject.

The real reason I use hardware RAID is the write-back cache. Nothing
beats it for sheer write performance. Hell, I don't even use the
on-board RAID: I just export the drives as individual RAID0 disks,
readable with a straight SAS controller if need be, and use ZFS for
RAID.

ZFS only has to resilver the existing data, not the whole drive, on a
drive failure, which reduces the double-failure window significantly,
and the added checksum on each block gives me peace of mind that the
data is uncorrupted. The 512MB of write-back cache makes the ZFS logging
fly without having to buy into expensive SSD drives.

I might explore using straight SAS controllers and MPIO with SSD drives
for logging in the future, once ZFS gets a way to disassociate a logging
device from a storage pool after it has been associated, in case the SSD
device fails.

But now things are way off topic.

-Ross
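A rough illustration of the resilver point, with an assumed 60 MB/s copy
rate and 40% pool utilisation (both made-up figures): resilvering only
the allocated blocks shrinks the window during which a second failure
would be fatal roughly in proportion to how full the pool is.

```python
#!/usr/bin/env python
"""Rough sketch: compare the exposure window for a full-drive rebuild vs
a ZFS-style resilver that only touches allocated blocks. The 60 MB/s
rate and 40% utilisation are assumed figures for illustration only."""

def hours(bytes_to_copy, mb_per_sec):
    return bytes_to_copy / (mb_per_sec * 1e6) / 3600.0

DRIVE_GB = 750        # raw capacity of the failed drive
UTILISATION = 0.40    # fraction of the pool actually holding data (assumed)
RATE_MBS = 60.0       # sustained rebuild/resilver rate (assumed)

full = hours(DRIVE_GB * 1e9, RATE_MBS)
resilver = hours(DRIVE_GB * 1e9 * UTILISATION, RATE_MBS)

print("full-drive rebuild  : %.1f h exposed to a second failure" % full)
print("resilver (live data): %.1f h exposed to a second failure" % resilver)
```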
Gordon Messmer
2009-Jun-02 16:59 UTC
[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)
On 06/01/2009 07:52 PM, Michael A. Peters wrote:
>
> I've read a lot of different reports that suggest that, at this point in
> time, kernel software RAID is in most cases better than controller RAID.

There are certainly a lot of people who feel that way. It depends on
what your priorities are.

Hardware RAID has the advantage of offloading some calculations from the
CPU, and has a write cache which can decrease memory use. However, both
of those are relatively expensive, and there's no clear evidence that
your money is better put into the RAID card than into a faster CPU and
more memory.

Another important consideration is that hardware RAID will (must!) have
a battery backup so that any cached writes can be completed later in the
case of power loss. If you decide to use software RAID, I would strongly
advise you to use a UPS, and to make sure your system monitors it and
shuts down in the event of power loss.
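A sketch of that last piece of advice, assuming the Network UPS Tools
client (upsc) is installed and a UPS configured under the hypothetical
name 'myups'; in practice NUT's own upsmon daemon handles this for you,
so this only shows the idea:

```python
#!/usr/bin/env python
"""Sketch of 'monitor the UPS and shut down on power loss'. Assumes the
NUT client tools are installed and a UPS named 'myups' (hypothetical) is
configured; a real deployment would use upsmon instead of this loop."""

import subprocess
import time

UPS = "myups@localhost"   # hypothetical NUT UPS name

def on_battery():
    out = subprocess.check_output(["upsc", UPS, "ups.status"])
    return b"OB" in out   # "OB" = on battery, "OL" = on line power

while True:
    try:
        if on_battery():
            print("Power lost - shutting down cleanly")
            subprocess.call(["shutdown", "-h", "+1"])
            break
    except subprocess.CalledProcessError:
        print("could not query UPS, will retry")
    time.sleep(30)
```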
Chan Chung Hang Christopher
2009-Jun-02 22:15 UTC
[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)
> I've read a lot of different reports that suggest that, at this point in
> time, kernel software RAID is in most cases better than controller RAID.

Let me define 'most cases' for you. Linux software RAID can perform
better than or the same as hardware RAID if you are using
raid0/raid1/raid1+0 arrays. If you are using raid5/6 arrays, the more
disks are involved, the better hardware RAID (controllers with
sufficient processing power and cache - a long time ago software RAID5
beat the pants off hardware RAID cards based on Intel i960 chips) will
perform. I have already posted on this, and there are links to
performance tests on this very subject. Let me look for the post.

> The basic argument seems to be that CPUs are now fast enough that the
> limit on throughput is the drive itself, and that SATA removed the
> bottleneck that PATA imposed on kernel RAID.

Complete bollocks. The bottleneck is not the drives themselves: whether
SATA or PATA, disk drive performance has not changed much, which is why
15k RPM disks are still king. The bottleneck is the bus, be it PCI-X or
PCIe 16x/8x/4x, or at least the latencies involved due to bus traffic.

> The arguments then go on to give numerous examples where a failing
> hardware RAID controller CAUSED data loss, where a RAID card died and an
> identical card had to be scrounged from eBay just to read the data on
> the drives, and so on - problems that apparently don't happen with
> kernel software RAID.

Buy extra cards. Duh. Easy solution for what can be a very rare problem.
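To make the raid5/6 CPU cost concrete, a small sketch (chunk size and
disk count are illustrative) of the parity work the md layer has to do
for every full stripe written:

```python
#!/usr/bin/env python
"""Sketch of why wide raid5 arrays load the host CPU under software RAID:
for every full stripe written, one chunk from each data disk must be
XORed together to produce the parity chunk. Chunk size and disk count
below are illustrative."""

import os

CHUNK = 64 * 1024          # 64 KiB chunk, a common md default
DATA_DISKS = 7             # an 8-disk raid5 = 7 data chunks + 1 parity

def parity(chunks):
    """XOR all data chunks together to get the raid5 parity chunk."""
    p = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            p[i] ^= b
    return bytes(p)

# One full stripe of random data: this XOR pass happens for every
# stripe-sized write (and raid6 adds a second, Reed-Solomon syndrome).
stripe = [os.urandom(CHUNK) for _ in range(DATA_DISKS)]
p = parity(stripe)
print("computed %d-byte parity chunk for a %d-disk stripe"
      % (len(p), DATA_DISKS + 1))
```

The kernel uses heavily optimised XOR routines rather than a loop like
this, but the amount of data touched per stripe is the same, which is
where the "more disks, more CPU" trade-off comes from.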