Jonathan Loran
2010-Aug-02 21:49 UTC
[zfs-discuss] Using multiple logs on single SSD devices
Hi ZFS gurus. I have a question that I can't find a definitive answer for by searching the list. Perhaps I will find true enlightenment here, at least on this question :)

I have a T2000 running Sol 10u8, so it is running zpool v15. The system has four SAS bays: two hold the rpool drives, and the other two are to be used as log devices. This guy is a large NFS server. About four months ago, I did the analysis with zilstat.ksh and determined that a log device would help, which it certainly has, so I have one Intel X25-E in use in the third slot. But now we want to add another pool (for various reasons it is better to keep this new pool separate, but that's a subject for another email). I want to add a log device for the new pool, which is fine, since I can use the fourth and final SAS slot for that.

But here's what's keeping me up at night: we're running zpool v15, which as I understand it means if an X25-E log fails, the pool is toast. Obviously, the log devices are not mirrored. My bad :( I guess this begs the first question:

- If the machine is running, and the log device fails, AND the failure is detected as such, will the ZIL roll back into the main pool drives? If so, are we saved?

- Second question, how about this: partition the two X25-E drives in two, and then mirror each half of each drive as log devices for each pool. Am I missing something with this scheme? On boot, will the GUID for each pool get found by the system from the partitioned log drives?

Please give me your sage advice. Really appreciate it.

Jon

- Jonathan Loran
- IT Manager
- Space Sciences Laboratory, UC Berkeley
- (510) 643-5146   jloran at ssl.berkeley.edu
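P.S. To make the second question concrete, the layout I have in mind would look something like this, where c1t2d0 and c1t3d0 stand in for the two X25-Es (placeholder names, not the real ones) and each drive has first been split into two equal slices with format:

  zpool add pool1 log mirror c1t2d0s0 c1t3d0s0   # slice 0 of each SSD, mirrored, for the first pool
  zpool add pool2 log mirror c1t2d0s1 c1t3d0s1   # slice 1 of each SSD, mirrored, for the second pool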
Edward Ned Harvey
2010-Aug-03 03:18 UTC
[zfs-discuss] Using multiple logs on single SSD devices
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Jonathan Loran
>
> But here's what's keeping me up at night: we're running zpool v15,
> which as I understand it means if an X25-E log fails, the pool is toast.
> Obviously, the log devices are not mirrored. My bad :( I guess this
> begs the first question:
>
> - If the machine is running, and the log device fails, AND the failure
> is detected as such, will the ZIL roll back into the main pool drives?
> If so, are we saved?

Because you're at pool v15, it does not matter whether the log device fails while you're running, or while you're offline and trying to come online, or whatever. If an unmirrored log device fails and the pool version is less than 19, the pool is simply lost. There are supposedly techniques to recover, so it's not necessarily a "data unrecoverable by any means" situation, but you certainly couldn't recover without a server crash, or at least a shutdown, and it would certainly be a nightmare at best. The system will not "fall back" to a ZIL in the main pool; that was a feature created in v19.

> - Second question, how about this: partition the two X25-E drives in
> two, and then mirror each half of each drive as log devices for each
> pool. Am I missing something with this scheme? On boot, will the GUID
> for each pool get found by the system from the partitioned log drives?

I'm afraid it's too late for that, unless you're willing to destroy and recreate your pool. You cannot remove the existing log device. You cannot shrink it. You cannot replace it with a smaller one. The only things you can do right now are:

(a) Start mirroring that log device with another device of the same size or larger.

or

(b) Buy another SSD which is larger than the first. Create a slice on the 2nd which is equal to the size of the first. Mirror the first onto the slice of the 2nd. After the resilver, detach the first drive, and replace it with another one of the larger drives. Slice the 3rd drive just like the 2nd, and mirror the 2nd drive's slice onto it. Now you've got a mirrored and sliced log device, without any downtime, but you had to buy two 2x-larger drives in order to do it.

or

(c) Destroy and recreate your whole pool, but learn from your mistake. This time, slice each SSD, and mirror the slices to form the log device.

BTW, ask me how I know this in such detail? It's because I made the same mistake last year. There was one interesting possibility we considered, but didn't actually implement:

We are running a stripe of mirrors. We considered breaking the mirrors, creating a new pool out of the "other half" with the SSD properly sliced, and using "zfs send" to replicate all the snapshots over to the new pool, up to a very recent time.

Then we'd be able to make a very short service window instead of scheduling a long one: shut down briefly, send that one final snapshot to the new pool, destroy the old pool, rename the new pool to take the old name, and bring the system back up again. As soon as the system is up again, start mirroring and resilvering (er ... initial silvering), and of course, slice the SSD before attaching the mirror.

Naturally there is some risk in running un-mirrored long enough to send the snaps... and so forth. Anyway, just an option to consider.
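As a rough sketch of what (a) and (b) look like at the command line (pool and device names such as tank and c1t2d0 are placeholders, not taken from your actual system):

  # (a) attach a second device of equal or greater size to the existing log
  zpool attach tank c1t2d0 c1t3d0

  # (b) migrate the log onto slices of two larger SSDs, one attach/detach at a time
  zpool attach tank c1t2d0 c1t3d0s0     # slice s0 sized to match the original log
  # ... wait for the resilver to complete, then drop the whole-disk log ...
  zpool detach tank c1t2d0
  zpool attach tank c1t3d0s0 c1t4d0s0   # mirror the slice onto a matching slice of a 3rd SSD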
Roy Sigurd Karlsbakk
2010-Aug-03 16:29 UTC
[zfs-discuss] Using multiple logs on single SSD devices
> - Second question, how about this: partition the two X25E drives into
> two, and then mirror each half of each drive as log devices for each
> pool. Am I missing something with this scheme? On boot, will the GUID
> for each pool get found by the system from the partitioned log drives?

IIRC several posts in here, some by Cindy, have been about using devices shared among pools, and what's said is that this is not recommended because of potential deadlocks. If I were you, I'd get another couple of SSDs for the new pool.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases adequate and relevant synonyms exist in Norwegian.
Jonathan Loran
2010-Aug-03 18:08 UTC
[zfs-discuss] Using multiple logs on single SSD devices
On Aug 2, 2010, at 8:18 PM, Edward Ned Harvey wrote:

>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Jonathan Loran
>
> Because you're at pool v15, it does not matter whether the log device
> fails while you're running, or while you're offline and trying to come
> online, or whatever. If an unmirrored log device fails and the pool
> version is less than 19, the pool is simply lost. There are supposedly
> techniques to recover, so it's not necessarily a "data unrecoverable by
> any means" situation, but you certainly couldn't recover without a
> server crash, or at least a shutdown, and it would certainly be a
> nightmare at best. The system will not "fall back" to a ZIL in the main
> pool; that was a feature created in v19.

Yes, after sending my query yesterday I found the ZFS best practices guide, which I hadn't read for a long time; it has many updates with respect to SSD devices (many by you, Ed, no?). I also found the long thread on this list about SSD best practices, which somehow I missed in my first pass. After reading it, I became much more nervous. My previous assumption when I added the log was based on the IOP rate I saw to the ZIL and the number of IOPs an Intel X25-E could take, and it looked like the drive should last a few years at least. But of course, that assumes no other failure modes. Given the high price of failure, now that I know the system would suddenly go south, I realized that action needed to be taken ASAP to mirror the log.

> I'm afraid it's too late for that, unless you're willing to destroy and
> recreate your pool. You cannot remove the existing log device. You
> cannot shrink it. You cannot replace it with a smaller one. The only
> things you can do right now are:
>
> (a) Start mirroring that log device with another device of the same
> size or larger.
> or
> (b) Buy another SSD which is larger than the first. Create a slice on
> the 2nd which is equal to the size of the first. Mirror the first onto
> the slice of the 2nd. After the resilver, detach the first drive, and
> replace it with another one of the larger drives. Slice the 3rd drive
> just like the 2nd, and mirror the 2nd drive's slice onto it. Now you've
> got a mirrored and sliced log device, without any downtime, but you had
> to buy two 2x-larger drives in order to do it.
> or
> (c) Destroy and recreate your whole pool, but learn from your mistake.
> This time, slice each SSD, and mirror the slices to form the log
> device.
>
> BTW, ask me how I know this in such detail? It's because I made the
> same mistake last year. There was one interesting possibility we
> considered, but didn't actually implement:
>
> We are running a stripe of mirrors. We considered breaking the mirrors,
> creating a new pool out of the "other half" with the SSD properly
> sliced, and using "zfs send" to replicate all the snapshots over to the
> new pool, up to a very recent time.
>
> Then we'd be able to make a very short service window instead of
> scheduling a long one: shut down briefly, send that one final snapshot
> to the new pool, destroy the old pool, rename the new pool to take the
> old name, and bring the system back up again. As soon as the system is
> up again, start mirroring and resilvering (er ... initial silvering),
> and of course, slice the SSD before attaching the mirror.
>
> Naturally there is some risk in running un-mirrored long enough to send
> the snaps... and so forth. Anyway, just an option to consider.
Destroying this pool is very much off the table. It holds home directories for our whole lab, about 375 of them. If I take the system offline, no one works until it's back up; you could say this machine is mission critical. The host has been very reliable, everyone is now spoiled by how it never goes down, and I'm very proud of that fact. The only way I could recreate the pool would be through some clever means like the ones you give, or perhaps by using AVS to replicate one side of the mirror so that everything could be done through a quick reboot.

One other idea I had was using a sparse zvol for the log, but I think eventually the sparse volume would fill up beyond its physical capacity. On top of that, the log would then be a zvol from another zpool, which I think could cause a boot race condition.

I think the real solution to my immediate problem is this: bite the bullet, and add storage to the existing pool. It won't be as clean as I'd like, and it would disturb my nicely balanced mirror stripe with new, large, empty vdevs, which I fear could impact performance down the road when the original stripe fills up and all writes go to the new vdevs. Perhaps by the time that happens, the feature to rebalance the pool will be available, if that's even being worked on. Maybe that's wishful thinking. At any rate, if I don't have to add another pool, I can mirror the logs I have: problem solved.

Finally, I'm told by my SE that ZFS in Sol 10u9 should be well past snv_125 and have log device removal. It's possible that will be released at OpenWorld next month (?), and then I can upgrade this system to zpool v19 or later.

Tough choices. Thanks for the help.

Jon

- Jonathan Loran
- IT Manager
- Space Sciences Laboratory, UC Berkeley
- (510) 643-5146   jloran at ssl.berkeley.edu
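P.S. If the upgrade does land, the cleanup would look roughly like this (pool and device names are placeholders, and zpool upgrade is one-way, so only run it once the new bits are trusted):

  zpool upgrade -v            # confirm the installed ZFS bits support pool version 19 or later
  zpool upgrade tank          # upgrade the pool itself (irreversible)
  zpool remove tank c1t2d0    # log device removal is supported from pool v19 on
  # ... re-slice the SSDs with format, then re-add the log as mirrored slices ...
  zpool add tank log mirror c1t2d0s0 c1t3d0s0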
Richard Elling
2010-Aug-03 20:00 UTC
[zfs-discuss] Using multiple logs on single SSD devices
On Aug 3, 2010, at 9:29 AM, Roy Sigurd Karlsbakk wrote:

>> - Second question, how about this: partition the two X25E drives into
>> two, and then mirror each half of each drive as log devices for each
>> pool. Am I missing something with this scheme? On boot, will the GUID
>> for each pool get found by the system from the partitioned log drives?
>
> IIRC several posts in here, some by Cindy, have been about using devices
> shared among pools, and what's said is that this is not recommended
> because of potential deadlocks.

No, you misunderstand. The potential deadlock condition occurs when you use ZFS in a single system to act as both the file system and a device: for example, using a zvol on rpool as a ZIL for another pool. For the devices themselves, ZFS has absolutely no problem using block devices as presented by partitions or slices. This has been true for all file systems for all time.
 -- richard

--
Richard Elling
richard at nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com
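P.S. To make the distinction concrete in command terms (pool, device, and zvol names below are placeholders, not from anyone's actual configuration):

  # Fine: a slice of a dedicated SSD used as a log device for a data pool
  zpool add datapool log c1t3d0s1

  # The pattern being warned against: a zvol carved out of one pool on the
  # same host, used as a log device for another pool. This is the case where
  # ZFS acts as both the file system and the device, and it can deadlock.
  zfs create -V 4G rpool/slog
  zpool add datapool log /dev/zvol/dsk/rpool/slog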