----- "Edward Ned Harvey" <solaris2 at nedharvey.com> skrev:> > What build were you running? The should have been addressed by > > CR6844090 > > that went into build 117. > > I''m running solaris, but that''s irrelevant. The storagetek array > controller > itself reports the new disk as infinitesimally smaller than the one > which I > want to mirror. Even before the drive is given to the OS, that''s the > way it > is. Sun X4275 server. > > BTW, I''m still degraded. Haven''t found an answer yet, and am > considering > breaking all my mirrors, to create a new pool on the freed disks, and > using > partitions in those disks, for the sake of rebuilding my pool using > partitions on all disks. The aforementioned performance problem is > not as > scary to me as running in degraded redundancy.I would return the drive to get a bigger one before doing something as drastic as that. There might have been a hichup in the production line, and that''s not your fault. roy
> Oh, I managed to find a really good answer to this question. Several sources all say to do precisely the same procedure, and when I did it on a test system, it worked perfectly. Simple and easy to repeat. So I think this is the gospel method to create the slices, if you're going to create

Seems like a clumsy workaround for a hardware problem. It will also disable the drives' cache, which is not a good idea. Why not just get a new drive?

Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
Momentarily, I will begin scouring the omniscient interweb for information, but I'd like to know a little bit of what people would say here. The question is to slice, or not to slice, disks before using them in a zpool.

One reason to slice comes from recent personal experience. One disk of a mirror dies. Replaced under contract with an identical disk. Same model number, same firmware. Yet when it's plugged into the system, for an unknown reason, it appears 0.001 GB smaller than the old disk, and is therefore unable to attach and un-degrade the mirror. It seems logical this problem could have been avoided if the device added to the pool originally had been a slice somewhat smaller than the whole physical device. Say, a slice of 28G out of the 29G physical disk. Because later, when I get the infinitesimally smaller disk, I can always slice 28G out of it to use as the mirror device.

There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?

There is another question about performance. One of my colleagues said he saw some literature on the internet somewhere, saying ZFS behaves differently for slices than it does on physical devices, because it doesn't assume it has exclusive access to that physical device, and therefore caches or buffers differently ... or something like that.

Any other pros/cons people can think of?

And finally, if anyone has experience doing this, any process recommendations? That is ... My next task is to go read documentation again, to refresh my memory from years ago, about the difference between "format," "partition," "label," and "fdisk," because those terms don't have the same meaning that they do in other OSes ... And I don't know clearly right now, which one(s) I want to do, in order to create the large slice of my disks.
This might be unrelated, but along similar lines ... I've also heard that the risk for unexpected failure of your pool is higher if/when you reach 100% capacity. I've heard that you should always create a small ZFS filesystem within a pool, and give it some reserved space, along with the filesystem that you actually plan to use in your pool. Anyone care to offer any comments on that?

From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
Sent: Friday, April 02, 2010 5:23 PM
To: zfs-discuss at opensolaris.org
Subject: [zfs-discuss] To slice, or not to slice

Momentarily, I will begin scouring the omniscient interweb for information, but I'd like to know a little bit of what people would say here. The question is to slice, or not to slice, disks before using them in a zpool.

One reason to slice comes from recent personal experience. One disk of a mirror dies. Replaced under contract with an identical disk. Same model number, same firmware. Yet when it's plugged into the system, for an unknown reason, it appears 0.001 GB smaller than the old disk, and is therefore unable to attach and un-degrade the mirror. It seems logical this problem could have been avoided if the device added to the pool originally had been a slice somewhat smaller than the whole physical device. Say, a slice of 28G out of the 29G physical disk. Because later, when I get the infinitesimally smaller disk, I can always slice 28G out of it to use as the mirror device.

There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?

There is another question about performance. One of my colleagues said he saw some literature on the internet somewhere, saying ZFS behaves differently for slices than it does on physical devices, because it doesn't assume it has exclusive access to that physical device, and therefore caches or buffers differently ... or something like that.

Any other pros/cons people can think of?

And finally, if anyone has experience doing this, any process recommendations? That is ... My next task is to go read documentation again, to refresh my memory from years ago, about the difference between "format," "partition," "label," and "fdisk," because those terms don't have the same meaning that they do in other OSes ... And I don't know clearly right now, which one(s) I want to do, in order to create the large slice of my disks.
On 04/ 3/10 10:23 AM, Edward Ned Harvey wrote:
>
> Momentarily, I will begin scouring the omniscient interweb for information, but I'd like to know a little bit of what people would say here. The question is to slice, or not to slice, disks before using them in a zpool.
>

Not.

> One reason to slice comes from recent personal experience. One disk of a mirror dies. Replaced under contract with an identical disk. Same model number, same firmware. Yet when it's plugged into the system, for an unknown reason, it appears 0.001 GB smaller than the old disk, and is therefore unable to attach and un-degrade the mirror. It seems logical this problem could have been avoided if the device added to the pool originally had been a slice somewhat smaller than the whole physical device. Say, a slice of 28G out of the 29G physical disk. Because later, when I get the infinitesimally smaller disk, I can always slice 28G out of it to use as the mirror device.
>

What build were you running? That should have been addressed by CR 6844090, which went into build 117.

> There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?
>
> There is another question about performance. One of my colleagues said he saw some literature on the internet somewhere, saying ZFS behaves differently for slices than it does on physical devices, because it doesn't assume it has exclusive access to that physical device, and therefore caches or buffers differently ... or something like that.
>

It's well documented. ZFS won't attempt to enable the drive's cache unless it has the whole physical device. See

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools

--
Ian.
On Fri, Apr 2, 2010 at 2:23 PM, Edward Ned Harvey <guacamole at nedharvey.com> wrote:

> There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?

ZFS won't enable the disk's write cache when it's not working with whole disks, which may reduce performance. You can turn the cache on yourself, however. I don't remember the exact incantation to do so, but "format -e" springs to mind.

> And finally, if anyone has experience doing this, any process recommendations? That is ... My next task is to go read documentation again, to refresh my memory from years ago, about the difference between "format," "partition," "label," and "fdisk," because those terms don't have the same meaning that they do in other OSes ... And I don't know clearly right now, which one(s) I want to do, in order to create the large slice of my disks.

The whole partition vs. slice thing is a bit fuzzy to me, so take this with a grain of salt. You can create partitions using fdisk, or slices using format. The BIOS and other operating systems (Windows, Linux, etc.) will be able to recognize partitions, while they won't be able to make sense of slices. If you need to boot from the drive or share it with another OS, then partitions are the way to go. If it's exclusive to Solaris, then you can use slices. You can (but shouldn't) use slices and partitions from the same device (e.g. c5t0d0s0 and c5t0d0p0).

-B

--
Brandon High : bhigh at freaks.com
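For reference, a rough sketch of that incantation, recalled from the format utility's expert mode (the device name is made up and the menu prompts are from memory, so verify on your own system before relying on it):

# format -e c5t0d0
format> cache
cache> write_cache
write_cache> display
Write Cache is disabled
write_cache> enable
write_cache> display
Write Cache is enabled
write_cache> quit
cache> quit
format> quit

Note that whether the drive actually honors the change, and whether it persists across a power cycle, is up to the drive itself.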
On Fri, Apr 2, 2010 at 2:29 PM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:

> I've also heard that the risk for unexpected failure of your pool is higher if/when you reach 100% capacity. I've heard that you should always create a small ZFS filesystem within a pool, and give it some reserved space, along with the filesystem that you actually plan to use in your pool. Anyone care to offer any comments on that?

I think you can just create a dataset with a reservation to avoid the issue. As I understand it, zfs doesn't automatically set aside a few percent of reserved space like ufs does.

-B

--
Brandon High : bhigh at freaks.com
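For illustration, a minimal sketch of that idea (the pool and dataset names here are invented for the example; pick a reservation size to taste):

# zfs create -o reservation=5G tank/reserved
# zfs get reservation tank/reserved
NAME           PROPERTY     VALUE   SOURCE
tank/reserved  reservation  5G      local

The reserved dataset is never written to; if the pool ever fills, the reservation can be lowered or the dataset destroyed to give yourself room to delete snapshots and files again.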
On Apr 2, 2010, at 2:29 PM, Edward Ned Harvey wrote:

> I've also heard that the risk for unexpected failure of your pool is higher if/when you reach 100% capacity. I've heard that you should always create a small ZFS filesystem within a pool, and give it some reserved space, along with the filesystem that you actually plan to use in your pool. Anyone care to offer any comments on that?

Define "failure" in this context?

I am not aware of a data loss failure when near full. However, all file systems will experience performance degradation for write operations as they become full.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
> > One reason to slice comes from recent personal experience. One disk of a mirror dies. Replaced under contract with an identical disk. Same model number, same firmware. Yet when it's plugged into the system, for an unknown reason, it appears 0.001 GB smaller than the old disk, and is therefore unable to attach and un-degrade the mirror. It seems logical this problem could have been avoided if the device added to the pool originally had been a slice somewhat smaller than the whole physical device. Say, a slice of 28G out of the 29G physical disk. Because later, when I get the infinitesimally smaller disk, I can always slice 28G out of it to use as the mirror device.
>
> What build were you running? That should have been addressed by CR 6844090, which went into build 117.

I'm running Solaris, but that's irrelevant. The StorageTek array controller itself reports the new disk as infinitesimally smaller than the one which I want to mirror. Even before the drive is given to the OS, that's the way it is. Sun X4275 server.

BTW, I'm still degraded. Haven't found an answer yet, and am considering breaking all my mirrors, to create a new pool on the freed disks, and using partitions in those disks, for the sake of rebuilding my pool using partitions on all disks. The aforementioned performance problem is not as scary to me as running in degraded redundancy.

> It's well documented. ZFS won't attempt to enable the drive's cache unless it has the whole physical device. See
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#Storage_Pools

Nice. Thank you.
>> And finally, if anyone has experience doing this, any process recommendations? That is ... My next task is to go read documentation again, to refresh my memory from years ago, about the difference between "format," "partition," "label," and "fdisk," because those terms don't have the same meaning that they do in other OSes ... And I don't know clearly right now, which one(s) I want to do, in order to create the large slice of my disks.
>
> The whole partition vs. slice thing is a bit fuzzy to me, so take this with a grain of salt. You can create partitions using fdisk, or slices using format. The BIOS and other operating systems (Windows, Linux, etc.) will be able to recognize partitions, while they won't be able to make sense of slices. If you need to boot from the drive or share it with another OS, then partitions are the way to go. If it's exclusive to Solaris, then you can use slices. You can (but shouldn't) use slices and partitions from the same device (e.g. c5t0d0s0 and c5t0d0p0).

Oh, I managed to find a really good answer to this question. Several sources all say to do precisely the same procedure, and when I did it on a test system, it worked perfectly. Simple and easy to repeat. So I think this is the gospel method to create the slices, if you're going to create slices:

http://docs.sun.com/app/docs/doc/806-4073/6jd67r9hu
and
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Replacing.2FRelabeling_the_Root_Pool_Disk
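In outline, the procedure in those links boils down to giving the disk an SMI label and a single slice covering (almost) the whole disk. A hedged sketch of the session, with an invented device name and the interactive prompts abbreviated from memory (the linked docs are the authority here):

# format -e c1t1d0
format> label          (choose the SMI label if prompted)
format> partition
partition> 0           (edit slice 0; answer the tag, flag, starting-cylinder
                        and size prompts, sizing the slice a bit below the
                        raw capacity, e.g. 28gb on a 29 GB disk)
partition> label
partition> quit
format> quit

The pool device is then the slice rather than the whole disk, e.g. something like "zpool attach tank c1t0d0s0 c1t1d0s0" (hypothetical pool and device names).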
> On Apr 2, 2010, at 2:29 PM, Edward Ned Harvey wrote:
> > I've also heard that the risk for unexpected failure of your pool is higher if/when you reach 100% capacity. I've heard that you should always create a small ZFS filesystem within a pool, and give it some reserved space, along with the filesystem that you actually plan to use in your pool. Anyone care to offer any comments on that?
>
> Define "failure" in this context?
>
> I am not aware of a data loss failure when near full. However, all file systems will experience performance degradation for write operations as they become full.

To tell the truth, I'm not exactly sure, because I've never lost any ZFS pool or filesystem. I only have it deployed on 3 servers, and only one of those gets heavy use. It only filled up once, and it didn't have any problem. So I'm only trying to understand "the great beyond," that which I have never known myself. Learn from other peoples' experience, preventively. Yes, I do embrace a lot of voodoo and superstition in doing sysadmin, but that's just cuz stuff ain't perfect, and I've seen so many things happen that were supposedly not possible. (Not talking about ZFS in that regard... yet.) Well, unless you count the issue I'm having right now, with two identical disks appearing as different sizes... But I don't think that's a zfs problem.

I recall some discussion either here or on opensolaris-discuss or opensolaris-help, where at least one or a few people said they had some sort of problem or problems, and they were suspicious about the correlation between it happening and the disk being full. I also recall talking to some random guy at a conference who said something similar. But it's all vague. I really don't know, and I have nothing concrete. Hence the post asking for peoples' comments. Somebody might relate something they experienced that is less vague than what I know.
> I would return the drive to get a bigger one before doing something as drastic as that. There might have been a hiccup in the production line, and that's not your fault.

Yeah, but I already have 2 of the replacement disks, both doing the same thing. One has a firmware newer than my old disk (so originally I thought that was the cause, and requested another replacement disk). But then we got a replacement disk which is identical in every way to the failed disk ... but it still appears smaller for some reason.

So this happened on my SSD. What's to prevent it from happening on one of the spindle disks in the future? Nothing that I know of ... So far, the idea of slicing seems to be the only preventive or corrective measure. Hence, wondering what pros/cons people would describe, beyond what I've already thought up myself.
On Sat, 3 Apr 2010, Edward Ned Harvey wrote:

>> I would return the drive to get a bigger one before doing something as drastic as that. There might have been a hiccup in the production line, and that's not your fault.
>
> Yeah, but I already have 2 of the replacement disks, both doing the same thing. One has a firmware newer than my old disk (so originally I thought that was the cause, and requested another replacement disk). But then we got a replacement disk which is identical in every way to the failed disk ... but it still appears smaller for some reason.
>
> So this happened on my SSD. What's to prevent it from happening on one of the spindle disks in the future? Nothing that I know of ...

Just keep in mind that this has been fixed in OpenSolaris for some time, and will surely be fixed in Solaris 10, if not already. The annoying issue is that you probably need to add all of the vdev devices using an OS which already has the fix. I don't know if it can "repair" a slightly overly-large device.

Bob
--
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
On Fri, Apr 2, 2010 at 4:05 PM, Edward Ned Harvey <guacamole at nedharvey.com> wrote:

> Momentarily, I will begin scouring the omniscient interweb for information, but I'd like to know a little bit of what people would say here. The question is to slice, or not to slice, disks before using them in a zpool.
>
> One reason to slice comes from recent personal experience. One disk of a mirror dies. Replaced under contract with an identical disk. Same model number, same firmware. Yet when it's plugged into the system, for an unknown reason, it appears 0.001 GB smaller than the old disk, and is therefore unable to attach and un-degrade the mirror. It seems logical this problem could have been avoided if the device added to the pool originally had been a slice somewhat smaller than the whole physical device. Say, a slice of 28G out of the 29G physical disk. Because later, when I get the infinitesimally smaller disk, I can always slice 28G out of it to use as the mirror device.
>
> There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?
>
> There is another question about performance. One of my colleagues said he saw some literature on the internet somewhere, saying ZFS behaves differently for slices than it does on physical devices, because it doesn't assume it has exclusive access to that physical device, and therefore caches or buffers differently ... or something like that.
>
> Any other pros/cons people can think of?
>
> And finally, if anyone has experience doing this, any process recommendations? That is ... My next task is to go read documentation again, to refresh my memory from years ago, about the difference between "format," "partition," "label," and "fdisk," because those terms don't have the same meaning that they do in other OSes ... And I don't know clearly right now, which one(s) I want to do, in order to create the large slice of my disks.

Your experience is exactly why I suggested ZFS start doing some "right sizing," if you will: chop off a bit from the end of any disk so that we're guaranteed to be able to replace drives from different manufacturers. The excuse was "no reason to, Sun drives are always of identical size." If your drives did indeed come from Sun, their response is clearly not true. Regardless, I guess I still think it should be done. Figure out the greatest variation we've seen from drives that are supposedly of the exact same size, and chop it off the end of every disk. I'm betting it's no more than 1GB, and probably less than that. When we're talking about a 2TB drive, I'm willing to give up a gig to be guaranteed I won't have any issues when it comes time to swap it out.

--Tim
On 03/04/2010 19:24, Tim Cook wrote:
> Your experience is exactly why I suggested ZFS start doing some "right sizing," if you will: chop off a bit from the end of any disk so that we're guaranteed to be able to replace drives from different manufacturers. The excuse was "no reason to, Sun drives are always of identical size." If your drives did indeed come from Sun, their response is clearly not true. Regardless, I guess I still think it should be done. Figure out the greatest variation we've seen from drives that are supposedly of the exact same size, and chop it off the end of every disk. I'm betting it's no more than 1GB, and probably less than that. When we're talking about a 2TB drive, I'm willing to give up a gig to be guaranteed I won't have any issues when it comes time to swap it out.

That's what OpenSolaris has been doing, more or less, for some time now.

Look in the archives of this mailing list for more information.

--
Robert Milkowski
http://milek.blogspot.com
On Sat, Apr 3, 2010 at 6:53 PM, Robert Milkowski <milek at task.gda.pl> wrote:

>> Your experience is exactly why I suggested ZFS start doing some "right sizing," if you will: chop off a bit from the end of any disk so that we're guaranteed to be able to replace drives from different manufacturers. The excuse was "no reason to, Sun drives are always of identical size." If your drives did indeed come from Sun, their response is clearly not true. Regardless, I guess I still think it should be done. Figure out the greatest variation we've seen from drives that are supposedly of the exact same size, and chop it off the end of every disk. I'm betting it's no more than 1GB, and probably less than that. When we're talking about a 2TB drive, I'm willing to give up a gig to be guaranteed I won't have any issues when it comes time to swap it out.
>
> That's what OpenSolaris has been doing, more or less, for some time now.
>
> Look in the archives of this mailing list for more information.
> --
> Robert Milkowski
> http://milek.blogspot.com

Since when? It isn't doing it on any of my drives, build 134, and judging by the OP's issues, it isn't doing it for him either... I try to follow this list fairly closely, and I've never seen anyone at Sun/Oracle say they were going to start doing it after I was shot down the first time.
--Tim
On Sat, Apr 3, 2010 at 7:50 PM, Tim Cook <tim at cook.ms> wrote:
>
> On Sat, Apr 3, 2010 at 6:53 PM, Robert Milkowski <milek at task.gda.pl> wrote:
>
>>> Your experience is exactly why I suggested ZFS start doing some "right sizing," if you will: chop off a bit from the end of any disk so that we're guaranteed to be able to replace drives from different manufacturers. The excuse was "no reason to, Sun drives are always of identical size." If your drives did indeed come from Sun, their response is clearly not true. Regardless, I guess I still think it should be done. Figure out the greatest variation we've seen from drives that are supposedly of the exact same size, and chop it off the end of every disk. I'm betting it's no more than 1GB, and probably less than that. When we're talking about a 2TB drive, I'm willing to give up a gig to be guaranteed I won't have any issues when it comes time to swap it out.
>>
>> That's what OpenSolaris has been doing, more or less, for some time now.
>>
>> Look in the archives of this mailing list for more information.
>> --
>> Robert Milkowski
>> http://milek.blogspot.com
>
> Since when? It isn't doing it on any of my drives, build 134, and judging by the OP's issues, it isn't doing it for him either...
> I try to follow this list fairly closely, and I've never seen anyone at Sun/Oracle say they were going to start doing it after I was shot down the first time.
>
> --Tim

Oh... and after 15 minutes of searching for everything from 'right-sizing' to 'block reservation' to 'replacement disk smaller size fewer blocks' etc., I don't see a single thread on it.

--Tim
On Apr 3, 2010, at 5:56 PM, Tim Cook wrote:
>
> On Sat, Apr 3, 2010 at 7:50 PM, Tim Cook <tim at cook.ms> wrote:
>
>> Your experience is exactly why I suggested ZFS start doing some "right sizing," if you will: chop off a bit from the end of any disk so that we're guaranteed to be able to replace drives from different manufacturers. The excuse was "no reason to, Sun drives are always of identical size." If your drives did indeed come from Sun, their response is clearly not true. Regardless, I guess I still think it should be done. Figure out the greatest variation we've seen from drives that are supposedly of the exact same size, and chop it off the end of every disk. I'm betting it's no more than 1GB, and probably less than that. When we're talking about a 2TB drive, I'm willing to give up a gig to be guaranteed I won't have any issues when it comes time to swap it out.
>
>> That's what OpenSolaris has been doing, more or less, for some time now.
>>
>> Look in the archives of this mailing list for more information.
>> --
>> Robert Milkowski
>> http://milek.blogspot.com
>
> Since when? It isn't doing it on any of my drives, build 134, and judging by the OP's issues, it isn't doing it for him either... I try to follow this list fairly closely, and I've never seen anyone at Sun/Oracle say they were going to start doing it after I was shot down the first time.
>
> --Tim
>
> Oh... and after 15 minutes of searching for everything from 'right-sizing' to 'block reservation' to 'replacement disk smaller size fewer blocks' etc., I don't see a single thread on it.

CR 6844090, zfs should be able to mirror to a smaller disk
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
b117, June 2009
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
On Apr 2, 2010, at 2:05 PM, Edward Ned Harvey wrote:

> Momentarily, I will begin scouring the omniscient interweb for information, but I'd like to know a little bit of what people would say here. The question is to slice, or not to slice, disks before using them in a zpool.
>
> One reason to slice comes from recent personal experience. One disk of a mirror dies. Replaced under contract with an identical disk. Same model number, same firmware. Yet when it's plugged into the system, for an unknown reason, it appears 0.001 GB smaller than the old disk, and is therefore unable to attach and un-degrade the mirror. It seems logical this problem could have been avoided if the device added to the pool originally had been a slice somewhat smaller than the whole physical device. Say, a slice of 28G out of the 29G physical disk. Because later, when I get the infinitesimally smaller disk, I can always slice 28G out of it to use as the mirror device.

If the HBA is configured for RAID mode, then it will reserve some space on disk for its metadata. This occurs no matter what type of disk you attach.

> There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?

No.

> There is another question about performance. One of my colleagues said he saw some literature on the internet somewhere, saying ZFS behaves differently for slices than it does on physical devices, because it doesn't assume it has exclusive access to that physical device, and therefore caches or buffers differently ... or something like that.
>
> Any other pros/cons people can think of?

If the disk is only used for ZFS, then it is ok to enable volatile disk write caching if the disk also supports write cache flush requests.

If the disk is shared with UFS, then it is not ok to enable volatile disk write caching.
 -- richard

> And finally, if anyone has experience doing this, any process recommendations? That is ... My next task is to go read documentation again, to refresh my memory from years ago, about the difference between "format," "partition," "label," and "fdisk," because those terms don't have the same meaning that they do in other OSes ... And I don't know clearly right now, which one(s) I want to do, in order to create the large slice of my disks.

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
On Sat, Apr 3, 2010 at 9:52 PM, Richard Elling <richard.elling at gmail.com> wrote:

> CR 6844090, zfs should be able to mirror to a smaller disk
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
> b117, June 2009
> -- richard

Unless the bug description is incomplete, that's talking about adding a mirror to an existing drive, not about replacing a failed drive in an existing vdev that could be raid-z#. I'm almost positive I had an issue post-b117 with replacing a failed drive in a raid-z2 vdev. I'll have to see if I can dig up a system to test the theory on.

--Tim
On Apr 3, 2010, at 8:00 PM, Tim Cook wrote:

> Unless the bug description is incomplete, that's talking about adding a mirror to an existing drive, not about replacing a failed drive in an existing vdev that could be raid-z#. I'm almost positive I had an issue post-b117 with replacing a failed drive in a raid-z2 vdev.

It is the same code. That said, I have experimented with various cases, and I have not found prediction of tolerable size difference to be easy.

> I'll have to see if I can dig up a system to test the theory on.

Works fine.
# ramdiskadm -a rd1 100000k
/dev/ramdisk/rd1
# ramdiskadm -a rd2 100000k
/dev/ramdisk/rd2
# ramdiskadm -a rd3 100000k
/dev/ramdisk/rd3
# ramdiskadm -a rd4 99900k
/dev/ramdisk/rd4
# zpool create -o cachefile=none zwimming raidz /dev/ramdisk/rd1 /dev/ramdisk/rd2 /dev/ramdisk/rd3
# zpool status zwimming
  pool: zwimming
 state: ONLINE
 scrub: none requested
config:

        NAME                  STATE     READ WRITE CKSUM
        zwimming              ONLINE       0     0     0
          raidz1-0            ONLINE       0     0     0
            /dev/ramdisk/rd1  ONLINE       0     0     0
            /dev/ramdisk/rd2  ONLINE       0     0     0
            /dev/ramdisk/rd3  ONLINE       0     0     0

errors: No known data errors
# zpool replace zwimming /dev/ramdisk/rd3 /dev/ramdisk/rd4
# zpool status zwimming
  pool: zwimming
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Sat Apr  3 20:08:35 2010
config:

        NAME                  STATE     READ WRITE CKSUM
        zwimming              ONLINE       0     0     0
          raidz1-0            ONLINE       0     0     0
            /dev/ramdisk/rd1  ONLINE       0     0     0
            /dev/ramdisk/rd2  ONLINE       0     0     0
            /dev/ramdisk/rd4  ONLINE       0     0     0  45K resilvered

errors: No known data errors

 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
> Your experience is exactly why I suggested ZFS start doing some "right sizing," if you will: chop off a bit from the end of any disk so that we're guaranteed to be able to replace drives from different manufacturers. The excuse was "no reason to, Sun drives are always of identical size." If your drives did indeed come from Sun, their response is clearly not true. Regardless, I guess I still think it should be done. Figure out the greatest variation we've seen from drives that are supposedly of the exact same size, and chop it off the end of every disk. I'm betting it's no more than 1GB, and probably less than that. When we're talking about a 2TB drive, I'm willing to give up a gig to be guaranteed I won't have any issues when it comes time to swap it out.

My disks are Sun-branded Intel disks. Same model number. The first replacement disk had a newer firmware, so we jumped to the conclusion that was the cause of the problem, and caused Oracle plenty of trouble in locating an older-firmware drive in some warehouse somewhere. But the second replacement disk is truly identical to the original. Same firmware and everything. Only the serial number is different. Still the same problem behavior.

I have reason to believe that both the drive and the OS are correct. I have a suspicion that the HBA simply handled the creation of this volume somehow differently than how it handled the original. Don't know the answer for sure yet.

Either way, yes, I would love zpool to automatically waste a little space at the end of the drive, to avoid this sort of situation, whether it's caused by drive manufacturers, or the HBA, or any other factor.
> CR 6844090, zfs should be able to mirror to a smaller disk
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
> b117, June 2009

Awesome. Now if someone would only port that to Solaris, I'd be a happy man. ;-)
On Sun, Apr 4, 2010 at 9:46 PM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:

>> CR 6844090, zfs should be able to mirror to a smaller disk
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
>> b117, June 2009
>
> Awesome. Now if someone would only port that to Solaris, I'd be a happy man. ;-)

Have you tried pointing that bug out to the support engineers who have your case at Oracle? If the fixed code is already out there, it's just a matter of porting the code, right? :)

--Tim
>> There is some question about performance. Is there any additional overhead caused by using a slice instead of the whole physical device?
>
> No.
>
> If the disk is only used for ZFS, then it is ok to enable volatile disk write caching if the disk also supports write cache flush requests.
>
> If the disk is shared with UFS, then it is not ok to enable volatile disk write caching.

Thank you. If you don't know the answer to this off the top of your head, I'll go attempt the internet, but thought you might just know the answer in 2 seconds ...

Assuming the disk's write cache is disabled because of the slice (as documented in the Best Practices Guide), how do you enable it? I would only be using ZFS on the drive. The existence of a slice is purely to avoid future mirror problems and the like.
I haven't taken that approach, but I guess I'll give it a try.

From: Tim Cook [mailto:tim at cook.ms]
Sent: Sunday, April 04, 2010 11:00 PM
To: Edward Ned Harvey
Cc: Richard Elling; zfs-discuss at opensolaris.org
Subject: Re: [zfs-discuss] To slice, or not to slice

On Sun, Apr 4, 2010 at 9:46 PM, Edward Ned Harvey <solaris2 at nedharvey.com> wrote:

>> CR 6844090, zfs should be able to mirror to a smaller disk
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6844090
>> b117, June 2009
>
> Awesome. Now if someone would only port that to Solaris, I'd be a happy man. ;-)

Have you tried pointing that bug out to the support engineers who have your case at Oracle? If the fixed code is already out there, it's just a matter of porting the code, right? :)

--Tim
On Apr 4, 2010, at 8:11 PM, Edward Ned Harvey wrote:

> Thank you. If you don't know the answer to this off the top of your head, I'll go attempt the internet, but thought you might just know the answer in 2 seconds ...
>
> Assuming the disk's write cache is disabled because of the slice (as documented in the Best Practices Guide), how do you enable it? I would only be using ZFS on the drive. The existence of a slice is purely to avoid future mirror problems and the like.

This is a trick question -- some drives ignore efforts to disable the write cache :-P

Use "format -e" for access to the expert mode, where you can enable the write cache. As for performance benefits, YMMV.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com
> I have reason to believe that both the drive and the OS are correct. I have a suspicion that the HBA simply handled the creation of this volume somehow differently than how it handled the original. Don't know the answer for sure yet.

Ok, that's confirmed now. Apparently when the drives ship from the factory, they're pre-initialized for the HBA, so the HBA happily imports them and "creates simple volume" (aka JBOD) using the factory initialization. Unfortunately, the factory init includes HBA metadata at both the start and end of the drive ... so I lose 1MB. The fix to the problem is to initialize the disk again with the HBA, and then create a new simple volume.