This morning, as I was reading USENIX conference summaries suggesting that SATA/SAS may not be an optimum interface for SSDs, it came to mind that some out-of-the-box thinking is needed for hard drives as well.

Hard drive storage densities have been increasing dramatically, so that the latest SATA drives are measured in terabytes, just as they were measured in gigabytes some years ago. A problem with huge hard drives is that resilver times increase with drive size. Failure of a hard drive in the terabyte range leads to a long wait.

Hard drives are comprised of multiple platters, with typically an independently navigated head on each side. Due to a mix of hardware and firmware, these disparate platters and heads are exposed as a single logical linear device comprised of blocks. If one side of a platter, or a drive head, fails, then the whole drive fails. My understanding is that most drives stripe logical blocks across the various platters such that the lower block addresses are on the outer edge of the disks, to achieve the fastest I/O transfer rate. This approach is great for large linear writes, but not so great for random I/O, for when data becomes spread across the disk, or for when the disk becomes almost full.

The thought I had this morning is that perhaps the firmware on the disk drive could be updated to present a logical disk drive for each drive head. Any bad-block management (if enabled) would be done using the same platter side. With this approach a single physical drive could appear as two, four, or eight logical drives. ZFS is really good at scheduling I/O across many drives. Provided that care is taken to ensure that redundant data is appropriately distributed, subdividing drives like this would allow ZFS to offer considerably improved performance, and the resilver time of a logical drive would be reduced since it is smaller. If a drive head fails, then that logical drive could be marked permanently out of service, but the whole drive would not need to be immediately consigned to the dumpster.

Does anyone have thoughts on the viability of this approach? Can existing drives be effectively subdivided like this by simply updating drive firmware?

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
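To make the layouts Bob is contrasting concrete, here is a minimal C sketch of the conventional cylinder-major LBA mapping versus the proposed one-logical-drive-per-head (surface-major) mapping. The geometry constants are made up, and real drives use zoned recording (sectors per track varies with cylinder) plus defect remapping, so this is an idealization rather than how any actual firmware works.

/* Illustrative sketch only: maps an LBA to (cylinder, head, sector)
 * under two layouts. The constant geometry below is a deliberate
 * simplification; real drives vary sectors per track by zone. */
#include <stdio.h>

enum { HEADS = 8, CYLS = 100000, SPT = 1000 };  /* made-up geometry */

/* Conventional cylinder-major layout: LBAs fill a whole cylinder
 * (all heads) before the actuator steps inward. */
static void chs_cylinder_major(long lba, long *c, long *h, long *s)
{
    *c = lba / (HEADS * SPT);
    *h = (lba / SPT) % HEADS;
    *s = lba % SPT;
}

/* Proposed surface-major layout: LBAs fill one entire surface before
 * moving to the next head, so each head owns one contiguous LBA
 * range and could be exposed as its own logical drive. */
static void chs_surface_major(long lba, long *c, long *h, long *s)
{
    long per_surface = (long)CYLS * SPT;
    *h = lba / per_surface;
    *c = (lba % per_surface) / SPT;
    *s = lba % SPT;
}

int main(void)
{
    long c, h, s, lba = 123456789;
    chs_cylinder_major(lba, &c, &h, &s);
    printf("cylinder-major: lba %ld -> cyl %ld head %ld sec %ld\n",
           lba, c, h, s);
    chs_surface_major(lba, &c, &h, &s);
    printf("surface-major:  lba %ld -> cyl %ld head %ld sec %ld\n",
           lba, c, h, s);
    return 0;
}

Under the surface-major mapping each head owns one contiguous LBA range, which is what would let firmware present each head as a separate logical drive.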
On Fri, May 1 at 11:44, Bob Friesenhahn wrote:
> Hard drives are comprised of multiple platters, with typically an
> independently navigated head on each side.

This is a gap in your assumptions, I believe. The headstack is a single physical entity, so all heads move in unison to the same position on all surfaces at the same time. Additionally, hard drives typically have a single read/write channel, meaning only one head can be active at a time. With the position information embedded on the same surface that contains user data, nobody has come up with a practical design for doing multiple concurrent reads from different places. At least one vendor (Conner?) tried a 2-actuator disk drive, and it was a mechanical-resonance nightmare for the servo systems.

I think that what you're looking for, however, is already happening, with server farms moving to multiple 2.5" drives from the larger 3.5" drives. Even on SATA drives, with NCQ the rotational speed doesn't matter as much for overall throughput, so a growing number of server applications will be using traditional "laptop" form-factor devices to increase the spindle:capacity ratio without blowing out their space budget. SAS and SATA are both shipping greater and greater volumes of small-form-factor devices.

For the budget-minded, a 2U server with a bunch of mirrored-pair 2.5" laptop drives is a nice platform, since you can fit 8-12 spindles in that box. The storage per unit volume is basically identical; you just get 2-4x the spindle count.

--eric

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org
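A back-of-the-envelope sketch of Eric's spindle-count point. Both the drive counts and the per-drive IOPS figures below are assumptions picked for illustration, not measurements; the point is only that random IOPS scales roughly with spindle count at similar total capacity.

/* Hypothetical 2U configurations with made-up per-drive numbers. */
#include <stdio.h>

int main(void)
{
    int spindles_35 = 6,  iops_per_35 = 120;  /* guessed 7200rpm 3.5" */
    int spindles_25 = 12, iops_per_25 = 100;  /* guessed 5400rpm 2.5" */

    printf("3.5\" box: %d spindles, ~%d random IOPS\n",
           spindles_35, spindles_35 * iops_per_35);
    printf("2.5\" box: %d spindles, ~%d random IOPS\n",
           spindles_25, spindles_25 * iops_per_25);
    return 0;
}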
>>>>> "edm" == Eric D Mudama <edmudama at bounceswoosh.org> writes:>> Hard drives are comprised of multiple platters, with typically >> an independently navigated head on each side. edm> This is a gap in your assumptions I believe. edm> The headstack is a single physical entity, so all heads move edm> in unison to the same position on all surfaces at the same edm> time. yes but AIUI switching heads requires resettling into the new track. The cylinders are not really cylindrical, just because of wear or temperature or whatever, so when switching heads the ``channel'''' has to use data from the head as part of a servo loop to settle on the other surface''s track. I guess the rules do keep changing though. edm> I think that what you''re looking for, however, is already edm> happening, with server farms moving to multiple 2.5" drives yeah but you''re reading him wrong. He is saying a failed drive may still be useful if you just avoid the one failed head. The problem currently is the LBA''s are laced through each cylinder, which is worth doing so that things like short-stroking make sense to reduce head movement. If you re-swizzled the LBA''s so that instead they filled each side of each platter in turn, like a dual-layer DVD, it wouldn''t change sequential throughput at all, and would have the benefit that ZFS''s existing tendency to put redundant metadata copies far apart in LBA would end up getting them on different heads, which actually *is* helpful given known failure modes tend to be head crashing, head falling off, u.s.w. I think the idea is doomed firstly because these days when a single head goes bad, the drive firmware, host adapter, driver, and even the zfs maintenance commands, all the way up the storage stack to the sysadmin''s keyboard, all shit their pants and become useless. You have to find the bad drive, remove it, then move on. Secondly I''m not sure I buy the USENIX claim that you can limp along less one head. The last failed drive I took apart, was indeed failed on just one head, but it had scraped all the rust off the platter (down to glass! it was really glass!), and the inside of the thing was filled with microscopic grey facepaint. It had slathered the air filtering pillow and coated all kinds of other surfaces. so...I would expect the other recording surfaces were not doing too well either, but I could be wrong. It does match experience, though, of drives going from partly-failed to completely-failed in a day or a week. -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 304 bytes Desc: not available URL: <http://mail.opensolaris.org/pipermail/zfs-discuss/attachments/20090501/62113d35/attachment.bin>
On Fri, 1 May 2009, Eric D. Mudama wrote:
> On Fri, May 1 at 11:44, Bob Friesenhahn wrote:
>> Hard drives are comprised of multiple platters, with typically an
>> independently navigated head on each side.
>
> This is a gap in your assumptions I believe.
>
> The headstack is a single physical entity, so all heads move in unison
> to the same position on all surfaces at the same time.

Ahhh, I see. That would explain why the idea has not been explored already. :-)

> I think that what you're looking for, however, is already happening,
> with server farms moving to multiple 2.5" drives from the larger 3.5"
> drives. Even on SATA drives, with NCQ the rotational speed doesn't

Yes. I was hoping to hasten things along with a firmware/software update rather than a forklift replacement.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
On Fri, May 1 at 14:19, Miles Nordin wrote:
> Secondly, I'm not sure I buy the USENIX claim that you can limp along
> less one head. The last failed drive I took apart was indeed failed
> on just one head, but it had scraped all the rust off the platter
> (down to glass! it was really glass!), and the inside of the thing
> was filled with microscopic grey facepaint. It had slathered the
> air-filtering pillow and coated all kinds of other surfaces. So I
> would expect the other recording surfaces were not doing too well
> either, but I could be wrong. It does match experience, though, of
> drives going from partly-failed to completely-failed in a day or a
> week.

Your point here is 100% accurate. Any physical damage inside the drive, even if initially constrained to a single head, quickly becomes a huge problem for everything inside the drive. Once you're looking for physically isolated heads and platters, you might as well just buy multiple smaller drives.

--
Eric D. Mudama
edmudama at mail.bounceswoosh.org