Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?

 scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 2011
              c0t0d0  ONLINE       0     0     0  769G resilvered

and I told the client it would take 3 to 4 days!

:)

--
Ian.
> Has anyone seen a resilver longer than this for a 500G drive in a
> raidz2 vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20
> 19:57:37 2011
>              c0t0d0  ONLINE       0     0     0  769G resilvered
>
> and I told the client it would take 3 to 4 days!

It all depends on the number of drives in the VDEV(s), traffic patterns during resilver, VDEV fill, speed of the drives, etc. Still, close to 6 days is a lot. Can you detail your configuration?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
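The kind of detail being asked for here usually comes down to a handful of standard commands. A suggested checklist only; the pool name is a placeholder:

  # Information worth including when asking about resilver times.
  # <pool> is a placeholder for your own pool name.
  zpool status -v <pool>     # vdev layout, device states, resilver progress
  zpool list <pool>          # size, used space, capacity
  zpool get version <pool>   # pool version; also mention the OS build,
                             # since the resilver throttle varies by build
  iostat -xn 5               # per-disk service times while the resilver runs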
769G resilvered on a 500G drive? I'm guessing there was a whole bunch of activity (and probably snapshot creation) happening alongside the resilver.

On 20 March 2011 18:57, Ian Collins <ian at ianshome.com> wrote:
> Has anyone seen a resilver longer than this for a 500G drive in a raidz2
> vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20
> 19:57:37 2011
>              c0t0d0  ONLINE       0     0     0  769G resilvered
>
> and I told the client it would take 3 to 4 days!
>
> :)
>
> --
> Ian.
On Mar 20, 2011, at 4:33 AM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:

>> Has anyone seen a resilver longer than this for a 500G drive in a
>> raidz2 vdev?

Depending on the ZFS implementation, this is expected. Later builds have the resilver throttle.

>> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20
>> 19:57:37 2011
>>              c0t0d0  ONLINE       0     0     0  769G resilvered
>>
>> and I told the client it would take 3 to 4 days!
>
> It all depends on the number of drives in the VDEV(s), traffic patterns during resilver, VDEV fill, speed of the drives, etc. Still, close to 6 days is a lot. Can you detail your configuration?

How many times do we have to rehash this? The speed of resilver is dependent on the amount of data, the distribution of data on the resilvering device, speed of the resilvering device, and the throttle. It is NOT dependent on the number of drives in the vdev.
 -- richard
> > It all depends on the number of drives in the VDEV(s), traffic
> > patterns during resilver, VDEV fill, speed of the drives, etc. Still,
> > close to 6 days is a lot. Can you detail your configuration?
>
> How many times do we have to rehash this? The speed of resilver is
> dependent on the amount of data, the distribution of data on the
> resilvering device, speed of the resilvering device, and the throttle. It is NOT
> dependent on the number of drives in the vdev.

Thanks for clearing this up - I've been told large VDEVs lead to long resilver times, but then, I guess that was wrong. Btw, after replacing some 2TB drives with 3TB ones in three VDEVs that were 95% full at the time, resilver times dropped by 30%, so I guess very full VDEVs aren't much fun even on the resilver side.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:

>>> It all depends on the number of drives in the VDEV(s), traffic
>>> patterns during resilver, VDEV fill, speed of the drives, etc. Still,
>>> close to 6 days is a lot. Can you detail your configuration?
>>
>> How many times do we have to rehash this? The speed of resilver is
>> dependent on the amount of data, the distribution of data on the
>> resilvering device, speed of the resilvering device, and the throttle. It is NOT
>> dependent on the number of drives in the vdev.
>
> Thanks for clearing this up - I've been told large VDEVs lead to long resilver times, but then, I guess that was wrong.

There was a thread ("Suggested RaidZ configuration...") a little while back where the topic of IOps and resilver time came up:

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633

I think this message by Erik Trimble is a good summary:

> Scenario 1: I have 5 1TB disks in a raidz1, and I assume I have 128k slab sizes. Thus, I have 32k of data for each slab written to each disk. (4x32k data + 32k parity for a 128k slab size). So, each IOPS gets to reconstruct 32k of data on the failed drive. It thus takes about 1TB/32k = 31e6 IOPS to reconstruct the full 1TB drive.
>
> Scenario 2: I have 10 1TB drives in a raidz1, with the same 128k slab sizes. In this case, there's only about 14k of data on each drive for a slab. This means each IOPS to the failed drive only writes 14k. So, it takes 1TB/14k = 71e6 IOPS to complete.
>
> From this, it can be pretty easy to see that the number of required IOPS to the resilvered disk goes up linearly with the number of data drives in a vdev. Since you're always going to be IOPS bound by the single disk resilvering, you have a fixed limit.

http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/044660.html

Also, a post by Jeff Bonwick on resilvering:

http://blogs.sun.com/bonwick/entry/smokin_mirrors

Between Richard's and Erik's statements, I would say that while resilver time is not dependent on the "number of drives in the vdev", the pool configuration can affect the IOps rate, and /that/ can affect the time it takes to finish a resilver. Is that a decent summary?

I think maybe the "number of drives in the vdev" perhaps comes into play because when people have a lot of disks, they often put them into RAIDZ[123] configurations. So it's just a matter of confusing the (IOps limiting) configuration with the fact that one may have many disks.
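Erik's arithmetic in the quoted message is easy to reproduce. A rough sketch only (plain awk, nothing ZFS-specific; the 128k record size, the 1 TB of data, and the ~100 random-write IOPS figure are assumptions carried over from the quote, not measurements):

  # Back-of-the-envelope reproduction of the quoted arithmetic.
  awk 'BEGIN {
    tb = 1000^4                        # 1 TB of data on the failed disk (assumed)
    split("4 9", ndata, " ")           # data disks in a 5-wide and a 10-wide raidz1
    for (i = 1; i <= 2; i++) {
      n     = ndata[i]
      chunk = 128 * 1024 / n           # bytes of each 128k record landing on one disk
      ios   = tb / chunk               # writes needed on the resilvering disk
      printf "%d data disks: ~%.0fM IOs, ~%.0f hours if IOPS-bound at 100 IOPS\n",
             n, ios / 1e6, ios / 100 / 3600
    }
  }'

With these made-up drive numbers the wide raidz1 lands around 190 hours if every write costs a seek; the point is only that the per-disk chunk size, not the raw capacity, drives the I/O count.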
> I think maybe the "number of drives in the vdev" perhaps comes into
> play because when people have a lot of disks, they often put them
> into RAIDZ[123] configurations. So it's just a matter of confusing the
> (IOps limiting) configuration with the fact that one may have many
> disks.

My answer was not meant to be a generic one, but based on the original question, which was about a raidz2 VDEV. But then, thanks for the info.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
On Mar 20, 2011, at 12:48 PM, David Magda wrote:

> On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:
>
>>>>> It all depends on the number of drives in the VDEV(s), traffic
>>>>> patterns during resilver, VDEV fill, speed of the drives, etc. Still,
>>>>> close to 6 days is a lot. Can you detail your configuration?
>>>>
>>>> How many times do we have to rehash this? The speed of resilver is
>>>> dependent on the amount of data, the distribution of data on the
>>>> resilvering device, speed of the resilvering device, and the throttle. It is NOT
>>>> dependent on the number of drives in the vdev.
>>>
>>> Thanks for clearing this up - I've been told large VDEVs lead to long resilver times, but then, I guess that was wrong.
>
> There was a thread ("Suggested RaidZ configuration...") a little while back where the topic of IOps and resilver time came up:
>
> http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633
>
> I think this message by Erik Trimble is a good summary:

hmmm... I must've missed that one, otherwise I would have said...

>> Scenario 1: I have 5 1TB disks in a raidz1, and I assume I have 128k slab sizes. Thus, I have 32k of data for each slab written to each disk. (4x32k data + 32k parity for a 128k slab size). So, each IOPS gets to reconstruct 32k of data on the failed drive. It thus takes about 1TB/32k = 31e6 IOPS to reconstruct the full 1TB drive.

Here, the IOPS doesn't matter because the limit will be the media write speed of the resilvering disk -- bandwidth.

>> Scenario 2: I have 10 1TB drives in a raidz1, with the same 128k slab sizes. In this case, there's only about 14k of data on each drive for a slab. This means each IOPS to the failed drive only writes 14k. So, it takes 1TB/14k = 71e6 IOPS to complete.

Here, IOPS might matter, but I doubt it. Where we see IOPS matter is when the block sizes are small (eg. metadata). In some cases you can see widely varying resilver times when the data is large versus small. These changes follow the temporal distribution of the original data. For example, if a pool's life begins with someone loading their MP3 collection (large blocks, mostly sequential) and then working on source code (small blocks, more random, lots of creates/unlinks) then the resilver will be bandwidth bound as it resilvers the MP3s and then IOPS bound as it resilvers the source. Hence, the prediction of when resilver will finish is not very accurate.

>> From this, it can be pretty easy to see that the number of required IOPS to the resilvered disk goes up linearly with the number of data drives in a vdev. Since you're always going to be IOPS bound by the single disk resilvering, you have a fixed limit.

You will not always be IOPS bound by the resilvering disk. You will be speed bound by the resilvering disk, where speed is either write bandwidth or random write IOPS.
 -- richard
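A crude way to see the bandwidth-versus-IOPS distinction being made here, as a sketch only: the drive numbers below are assumptions for a generic 7200rpm disk, "run" means a contiguous stretch of data written in one go, and real resilver behaviour is more complicated than one seek per run.

  # Toy model: for each run size, the resilvering disk is limited by whichever
  # is slower -- media bandwidth or random IOPS. 100 MB/s and 100 IOPS assumed.
  awk 'BEGIN {
    bw   = 100 * 1024 * 1024                  # sustained write bandwidth, B/s (assumed)
    iops = 100                                # random write IOPS (assumed)
    split("4096 131072 1048576 8388608", sz, " ")
    for (i = 1; i <= 4; i++) {
      b = sz[i]
      t = (b / bw > 1 / iops) ? b / bw : 1 / iops     # seconds per I/O
      printf "%8d-byte runs: ~%6.1f MB/s effective (%s-bound)\n",
             b, b / t / 1048576, (b / bw >= 1 / iops) ? "bandwidth" : "IOPS"
    }
  }'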
On 03/20/11 08:57 PM, Ian Collins wrote:
> Has anyone seen a resilver longer than this for a 500G drive in a
> raidz2 vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20
> 19:57:37 2011
>              c0t0d0  ONLINE       0     0     0  769G resilvered
>
I didn't intend to start an argument, I was just very surprised the resilver took so long.

This box is a backup staging server (Solaris 10u8), so it does receive a lot of data. However it has lost a number of drives in the past and the resilver took around 100 hours, hence my surprise.

The drive is part of an 8 drive raidz2 vdev, not overly full:

  raidz2    3.40T   227G

--
Ian.
On Mar 20, 2011, at 18:02, Ian Collins wrote:

> I didn't intend to start an argument, I was just very surprised the resilver took so long.

ZFS is a relatively young file system, and it does a lot of things differently than what has been done in the past. Personally I think arguments / debates / discussions like this thread assist people in understanding how things work and help bring out any misconceptions that they may have, which can then be corrected.
On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:

> On 03/20/11 08:57 PM, Ian Collins wrote:
>> Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?
>>
>> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 2011
>>              c0t0d0  ONLINE       0     0     0  769G resilvered
>>
> I didn't intend to start an argument, I was just very surprised the resilver took so long.

I'd describe the thread as critical analysis, not argument. There are many facets of ZFS resilver and scrub that many people have never experienced, so it makes sense to explore the issue.

Expect ZFS resilvers to take longer in the future for HDDs.
Expect ZFS resilvers to remain quite fast for SSDs.
Why? Because HDDs are getting bigger, but not faster, while SSDs are getting bigger and faster.

I've done a number of studies of this and have a lot of data to describe what happens. I also work through performance analysis of resilver cases for my ZFS tutorials.

> This box is a backup staging server (Solaris 10u8), so it does receive a lot of data. However it has lost a number of drives in the past and the resilver took around 100 hours, hence my surprise.

We've thought about how to provide some sort of feedback on the progress of resilvers. It is relatively simple to know what has already been resilvered and how much throttling is currently active. But that info does not make future predictions more accurate.
 -- richard
On 03/21/11 12:20 PM, Richard Elling wrote:
> On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:
>
>> On 03/20/11 08:57 PM, Ian Collins wrote:
>>> Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?
>>>
>>> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 2011
>>>              c0t0d0  ONLINE       0     0     0  769G resilvered
>>>
>> I didn't intend to start an argument, I was just very surprised the resilver took so long.
>
> I'd describe the thread as critical analysis, not argument. There are many facets of ZFS
> resilver and scrub that many people have never experienced, so it makes sense to
> explore the issue.
>
> Expect ZFS resilvers to take longer in the future for HDDs.
> Expect ZFS resilvers to remain quite fast for SSDs.
> Why? Because HDDs are getting bigger, but not faster, while SSDs are getting bigger and faster.
>
> I've done a number of studies of this and have a lot of data to describe what happens. I also
> work through performance analysis of resilver cases for my ZFS tutorials.
>
Does the throttling improve receive latency? The 30+ second latency I see on this system during a resilver renders it pretty useless as a staging server (lots of small snapshots).

--
Ian.
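For what it's worth, on builds that do have the throttle, it is exposed as kernel tunables rather than a pool property. A hedged sketch only: the tunable names below come from later OpenSolaris/illumos sources, may not exist on Solaris 10u8, and should be verified on your own build before anything is written.

  # Inspect the scrub/resilver throttle, if your build has it.
  echo "zfs_resilver_delay/D"       | mdb -k
  echo "zfs_resilver_min_time_ms/D" | mdb -k
  echo "zfs_scrub_delay/D"          | mdb -k

  # Example only, commented out on purpose: a larger delay makes the resilver
  # yield more to application I/O (slower resilver, lower latency impact).
  # The value is in clock ticks; confirm the tunable exists first.
  # echo "zfs_resilver_delay/W0t4" | mdb -kw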
On 3/20/2011 2:23 PM, Richard Elling wrote:
> On Mar 20, 2011, at 12:48 PM, David Magda wrote:
>> On Mar 20, 2011, at 14:24, Roy Sigurd Karlsbakk wrote:
>>
>>>>> It all depends on the number of drives in the VDEV(s), traffic
>>>>> patterns during resilver, VDEV fill, speed of the drives, etc. Still,
>>>>> close to 6 days is a lot. Can you detail your configuration?
>>>> How many times do we have to rehash this? The speed of resilver is
>>>> dependent on the amount of data, the distribution of data on the
>>>> resilvering device, speed of the resilvering device, and the throttle. It is NOT
>>>> dependent on the number of drives in the vdev.
>>> Thanks for clearing this up - I've been told large VDEVs lead to long resilver times, but then, I guess that was wrong.
>> There was a thread ("Suggested RaidZ configuration...") a little while back where the topic of IOps and resilver time came up:
>>
>> http://mail.opensolaris.org/pipermail/zfs-discuss/2010-September/thread.html#44633
>>
>> I think this message by Erik Trimble is a good summary:
> hmmm... I must've missed that one, otherwise I would have said...
>
>>> Scenario 1: I have 5 1TB disks in a raidz1, and I assume I have 128k slab sizes. Thus, I have 32k of data for each slab written to each disk. (4x32k data + 32k parity for a 128k slab size). So, each IOPS gets to reconstruct 32k of data on the failed drive. It thus takes about 1TB/32k = 31e6 IOPS to reconstruct the full 1TB drive.
> Here, the IOPS doesn't matter because the limit will be the media write
> speed of the resilvering disk -- bandwidth.
>
>>> Scenario 2: I have 10 1TB drives in a raidz1, with the same 128k slab sizes. In this case, there's only about 14k of data on each drive for a slab. This means each IOPS to the failed drive only writes 14k. So, it takes 1TB/14k = 71e6 IOPS to complete.
> Here, IOPS might matter, but I doubt it. Where we see IOPS matter is when the block
> sizes are small (eg. metadata). In some cases you can see widely varying resilver times when
> the data is large versus small. These changes follow the temporal distribution of the original
> data. For example, if a pool's life begins with someone loading their MP3 collection (large
> blocks, mostly sequential) and then working on source code (small blocks, more random, lots
> of creates/unlinks) then the resilver will be bandwidth bound as it resilvers the MP3s and then
> IOPS bound as it resilvers the source. Hence, the prediction of when resilver will finish is not
> very accurate.
>
>>> From this, it can be pretty easy to see that the number of required IOPS to the resilvered disk goes up linearly with the number of data drives in a vdev. Since you're always going to be IOPS bound by the single disk resilvering, you have a fixed limit.
> You will not always be IOPS bound by the resilvering disk. You will be speed bound
> by the resilvering disk, where speed is either write bandwidth or random write IOPS.
> -- richard
>
Really? Can you really be bandwidth limited on a (typical) RAIDZ resilver?

I can see where you might be on a mirror, with large slabs and essentially sequential read/write - that is, since the drivers can queue up several read/write requests at a time, you have the potential to be reading/writing several (let's say 4) 128k slabs per single IOPS. That means you read/write at 512k per IOPS for a mirror (best case scenario). For a 7200RPM drive, that's 100 IOPS x .5MB/IOPS = 50MB/s, which is lower than the maximum throughput of a modern SATA drive.
For one of the 15k SAS drives able to do 300 IOPS, you get 150MB/s, which indeed exceeds a SAS drive's write bandwidth.

For RAIDZn configs, however, you're going to be limited on the size of an individual read/write. As pointed out before, the max size of an individual portion of a slab is 128k/X, where X = number of data drives in the RAIDZn. So, for a typical 4-data-drive RAIDZn, even in the best case scenario where I can queue multiple slab requests (say 4) into a single IOPS, that means I'm likely to top out at about 128k of data to write to the resilvered drive per IOPS. Which leads to 12MB/s for the 7200RPM drive, and 36MB/s for the 15k drive, both well under their respective bandwidth capability.

Even with large slab sizes, I really can't see any place where a RAIDZ resilver isn't going to be IOPS bound when using HDs as backing store. Mirrors are more likely, but still, even in that case, I think you're going to hit the IOPS barrier far more often than the bandwidth barrier.

Now, with SSDs as backing store, yes, you become bandwidth limited, because the IOPS values of SSDs are at least an order of magnitude greater than HDs, though both have the same max bandwidth characteristics.

Now, the *total* time it takes to resilver either a mirror or RAIDZ is indeed primarily dependent on the number of allocated slabs in the vdev, and the level of fragmentation of slabs. That essentially defines the total amount of work that needs to be done. The above discussion compares resilver times based on IDENTICAL data - that is, I'm comparing how a RAIDZ and a mirror resilver a given data pattern.

So, if you want to come up with how much time it will take a resilver to complete, you have to worry about four things (a rough numeric sketch follows after this message):

(1) How many total slabs do I have to resilver? (Total data size is irrelevant; it's the number of slabs required to store that amount of data.)

(2) How fragmented are my files? (Sequentially written, never re-written, few deletes will be much faster than heavily modified and deleted pools - essentially, how much seeking is my drive going to have to do?)

(3) Do I have a mirror or RAIDZ config (and how many data drives in the RAIDZ)?

(4) What are the IOPS/bandwidth characteristics of the backing store I use in #3?

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
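To make that four-item list concrete, here is a toy calculation in the same spirit. Every input is an assumption to be replaced with your own numbers; it ignores the throttle and competing I/O entirely, and ZFS itself makes no such prediction.

  # Toy estimator tying the four factors together. Not a ZFS command.
  awk 'BEGIN {
    slabs    = 30e6          # (1) allocated slabs to resilver (assumed)
    seq_frac = 0.3           # (2) fraction contiguous enough to stream (assumed)
    ndata    = 6             # (3) data disks in the vdev (8-wide raidz2)
    iops     = 100           # (4) random write IOPS of the new disk (assumed)
    bw       = 100e6         # (4) write bandwidth of the new disk, B/s (assumed)
    chunk  = 128 * 1024 / ndata                 # bytes per slab on this disk
    t_seq  = slabs * seq_frac * chunk / bw      # streamed portion
    t_rand = slabs * (1 - seq_frac) / iops      # seek-bound portion
    printf "rough estimate: %.0f hours (%.0f of them seek-bound)\n",
           (t_seq + t_rand) / 3600, t_rand / 3600
  }'

With these made-up inputs nearly all of the estimated time comes from the seek-bound portion, which is the IOPS argument in miniature.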
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Richard Elling
>
> How many times do we have to rehash this? The speed of resilver is
> dependent on the amount of data, the distribution of data on the resilvering
> device, speed of the resilvering device, and the throttle. It is NOT dependent
> on the number of drives in the vdev.

What the heck? Yes it is. Indirectly.

When you say it depends on the amount of data, speed of the resilvering device, etc, what you really mean (correctly) is that it depends on the total number of used blocks that must be resilvered on the resilvering device, multiplied by the access time for the resilvering device. And of course, throttling and usage during resilver can have a big impact. And various other factors. But the controllable big factor is the number of blocks used in the degraded vdev.

So here is how the number of devices in the vdev matters:

If you have your whole pool made of one vdev, then every block in the pool will be on the resilvering device. You must spend time resilvering every single block in the whole pool.

If you have the same amount of data on a pool broken into N smaller vdevs, then approximately speaking, 1/N of the blocks in the pool must be resilvered on the resilvering vdev. And therefore the resilver goes approximately N times faster.

So if you assume the size of the pool or the number of total disks is a given, determined by outside constraints and design requirements, and then you face the decision of how to architect the vdevs in your pool, then yes, the number of devices in a vdev does dramatically impact the resilver time. Only because the number of blocks written in each vdev depends on these decisions you made earlier.
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>
> it depends on the total number of used blocks that must
> be resilvered on the resilvering device, multiplied by the access time for
> the resilvering device.

It is a safe assumption, if you've got a lot of devices in a vdev, that you've probably got a lot of data in the vdev. And therefore the resilver time for that vdev will be large.

If you break your pool up into a bunch of mirrors, then the most data you'll have in any one vdev is 1 disk's worth of data.

If you have a vdev whose usable capacity is M times a single disk, chances are the amount of data you have in the vdev is L times larger than the amount of data you would have had in each vdev if you were using mirrors. (I'm intentionally leaving the relationship between M and L vague, but both are assumed to be > 1 and approaching the number of devices in the vdev minus parity drives.) Therefore the resilver time for that vdev will be roughly L times the resilver time of a mirror.
On Sun, Mar 20, 2011 at 7:20 PM, Richard Elling <richard.elling at gmail.com> wrote:
> On Mar 20, 2011, at 3:02 PM, Ian Collins wrote:
>
>> On 03/20/11 08:57 PM, Ian Collins wrote:
>>> Has anyone seen a resilver longer than this for a 500G drive in a raidz2 vdev?
>>>
>>> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37 2011
>>>              c0t0d0  ONLINE       0     0     0  769G resilvered
>>>
>> I didn't intend to start an argument, I was just very surprised the resilver took so long.
>
> I'd describe the thread as critical analysis, not argument. There are many facets of ZFS
> resilver and scrub that many people have never experienced, so it makes sense to
> explore the issue.
>
> Expect ZFS resilvers to take longer in the future for HDDs.
> Expect ZFS resilvers to remain quite fast for SSDs.
> Why? Because HDDs are getting bigger, but not faster, while SSDs are getting bigger and faster.
>

Is resilver time related to the amount of data (TBs) or the number of objects (file + directory counts)? I have seen zpools with lots of data in very few files resilver quickly, while smaller pools with lots of tiny files take much longer (no hard data here, just recollection of how long things took).

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On Sun, Mar 20, 2011 at 12:57 AM, Ian Collins <ian at ianshome.com> wrote:
> Has anyone seen a resilver longer than this for a 500G drive in a raidz2
> vdev?
>
> scrub: resilver completed after 169h25m with 0 errors on Sun Mar 20 19:57:37
> 2011
>              c0t0d0  ONLINE       0     0     0  769G resilvered
>
> and I told the client it would take 3 to 4 days!

Our main backups storage server has 3x 8-drive raidz2 vdevs. Was replacing the 500 GB drives in one vdev with 1 TB drives. The last 2 drives took just under 300 hours each. :( The first couple of drives took approx 150 hours each, and then it just started taking longer and longer for each drive.

--
Freddie Cash
fjwcash at gmail.com
> The 30+ second latency I see on this system during a resilver renders
> it pretty useless as a staging server (lots of small snapshots).

I've seen similar numbers on a system during resilver, without L2ARC/SLOG. Adding L2ARC/SLOG made the system work quite well during resilver/scrub, but without them, it wasn't very useful.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
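For readers who have not set these up, a hedged example of what adding L2ARC and a separate log looks like. The pool and device names are placeholders, not from this thread; use SSDs, and note that a slog mainly helps synchronous writes while the L2ARC helps repeated random reads.

  # Add a mirrored separate intent log and an L2ARC cache device.
  zpool add tank log mirror c5t0d0 c5t1d0    # slog, mirrored for safety
  zpool add tank cache c5t2d0                # L2ARC cache device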
> Our main backups storage server has 3x 8-drive raidz2 vdevs. Was
> replacing the 500 GB drives in one vdev with 1 TB drives. The last 2
> drives took just under 300 hours each. :( The first couple of drives
> took approx 150 hours each, and then it just started taking longer and
> longer for each drive.

That's strange indeed. I just replaced 21 drives (seven 2TB drives in each of three raidz2 VDEVs) with 3TB ones, and resilver times were quite stable, until the last replace, which was a bit faster. Have you checked 'iostat -en'? If one (or more) of the drives is having I/O errors, that may slow down the whole pool.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/
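In the same vein, a quick checklist for spotting a drive that is dragging a resilver down. These are standard Solaris commands; what counts as "slow" is a judgment call.

  # Look for one device with error counts or service times far above its peers.
  iostat -en         # soft/hard/transport error counters per device
  iostat -xn 5       # watch asvc_t; one outlier disk can stall the vdev
  fmdump -e | tail   # recent fault management ereports, if fmd is running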
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Paul Kraus
>
> Is resilver time related to the amount of data (TBs) or the number
> of objects (file + directory counts)? I have seen zpools with lots of
> data in very few files resilver quickly while smaller pools with lots
> of tiny files take much longer (no hard data here, just recollection
> of how long things took).

In some cases, it could be dependent on the total amount of data (TB) and be limited by sequential drive throughput. In that case, it will always be fast. In other cases, it could be dependent on a lot of small blocks scattered randomly about. In that case, it will be limited by the random access time of the devices, and it's certain to be painfully slow.

But in this conversation, we're trying to make a generalization. So let's define "typical," discuss how each of the above cases is possible, and reach a generalization.

Note: There is another common usage scenario, the home video server or large static sequential file store, which would have precisely the opposite usage characteristics. But for me, that's not typical, so when I'm the person writing, here is what I'm defining as "typical"...

Typical: You have a nontrivial pool, with volatile data. Autosnapshots are on, which means snapshots are frequently created & destroyed. Some files & directories are deleted, created, and/or modified or appended to, in essentially random order.

It is in the nature of COW (and therefore ZFS) to only write new copies of the changed blocks, while leaving old blocks in place, hence files become progressively more fragmented as long as they are modified in the middles and ends (rather than deleted & recreated entirely). It is also in the nature of ZFS to aggregate small writes into larger sequential blocks: a bunch of small random writes are aggregated into a single larger sequential write. Eventually those changes are changed or deleted, and snapshots destroyed, leaving a "hole" in the middle of what was formerly an aggregated sequential write, so ZFS becomes progressively more fragmented there too. All of the above is normal for any snapshot-capable filesystem. (Different implementations reach the same result.)

Here is the part which is both a ZFS strength and weakness: Upon scrub or resilver, ZFS will only scrub or resilver the used blocks. It will not do the unused space. If you have a really small percentage of pool utilization, or highly sequential data, this is a strength. Because you get to skip over all the unused portions of disk, it will complete faster than resilvering or scrubbing the whole disk sequentially.

Unfortunately, in my "typical" usage scenario, a system has been in volatile production for an extended time, so there is significant usage in the pool, which is highly fragmented. Unfortunately, in ZFS resilver (and I think scrub too) the order of resilvering blocks is NOT based on disk order, which means you don't get to simply perform a bunch of sequential disk reads and skip over all the unused sectors. Instead, your heads need to thrash around, randomly seeking small blocks all over the place, in essentially random order.

So the answer to your question, assuming my "typical" usage and assuming hard drives (not SSDs etc), is: Resilver is dependent on neither the total quantity of data, nor the total number of files/directories.
It is dependent on the number of used blocks in the vdev, dependent on precisely how fragmented and how randomly those blocks are scattered throughout the vdev, and limited by the random access time of the vdev.

YMMV, but here is one of my experiences: In a given pool that I admin, if I needed to resilver a whole disk including unused space, the sequential IO of the disk would be the limiting factor, and the time would be approx 2 hours. Instead, I am using ZFS, this system is in "typical" production usage, and I am using mirrors. Hence, this is the best case scenario for a "typical" ZFS server with volatile data. My resilver took 12 hours. If I had used raidz2 with 8-2=6 disks, then it would have taken 3 days.

So the conclusion to draw is: Yes, there are situations where ZFS resilver is a strength, and limited by serial throughput. But for what I call "typical" usage patterns, it's a weakness, and it's dramatically much worse than resilvering the whole disk sequentially.
On 03/22/11 10:39 AM, Edward Ned Harvey wrote:
> So the conclusion to draw is:
> Yes, there are situations where ZFS resilver is a strength, and limited by
> serial throughput. But for what I call "typical" usage patterns, it's a
> weakness, and it's dramatically much worse than resilvering the whole disk
> sequentially.
>
That's probably correct. It certainly helps explain my recent experience.

The total data in the pool has remained fairly constant over the past 6 months, but as the pool is on a staging server, it aggregates all of the churn from the servers that send data to it.

So given that the hardware, the use, and the total data haven't changed since the last resilver, the significant increase in resilver time must be down to increased data fragmentation.

--
Ian.
On Mar 21, 2011, at 5:09 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Richard Elling
>>
>> How many times do we have to rehash this? The speed of resilver is
>> dependent on the amount of data, the distribution of data on the resilvering
>> device, speed of the resilvering device, and the throttle. It is NOT dependent
>> on the number of drives in the vdev.
>
> What the heck? Yes it is. Indirectly. When you say it depends on the
> amount of data, speed of the resilvering device, etc, what you really mean
> (correctly) is that it depends on the total number of used blocks that must
> be resilvered on the resilvering device, multiplied by the access time for
> the resilvering device. And of course, throttling and usage during resilver
> can have a big impact. And various other factors. But the controllable big
> factor is the number of blocks used in the degraded vdev.

There is no direct correlation between the number of blocks and resilver time.

> So here is how the number of devices in the vdev matters:
>
> If you have your whole pool made of one vdev, then every block in the pool
> will be on the resilvering device. You must spend time resilvering every
> single block in the whole pool.
>
> If you have the same amount of data on a pool broken into N smaller vdevs,
> then approximately speaking, 1/N of the blocks in the pool must be
> resilvered on the resilvering vdev. And therefore the resilver goes
> approximately N times faster.

Nope. The resilver time is dependent on the speed of the resilvering disk.

> So if you assume the size of the pool or the number of total disks is a
> given, determined by outside constraints and design requirements, and then
> you face the decision of how to architect the vdevs in your pool, then
> yes, the number of devices in a vdev does dramatically impact the resilver
> time. Only because the number of blocks written in each vdev depends on
> these decisions you made earlier.

I do not think it is wise to set the vdev configuration based on a model for resilver time. Choose the configuration to get the best data protection.
 -- richard
On Mar 21, 2011, at 5:32 AM, Edward Ned Harvey wrote:

>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>> bounces at opensolaris.org] On Behalf Of Edward Ned Harvey
>>
>> it depends on the total number of used blocks that must
>> be resilvered on the resilvering device, multiplied by the access time for
>> the resilvering device.
>
> It is a safe assumption, if you've got a lot of devices in a vdev, that
> you've probably got a lot of data in the vdev. And therefore the resilver
> time for that vdev will be large.

Several studies have shown no correlation between the size of disks and the amount of data used. Or, to look at it another way, boot disks grow faster than OSes.

> If you break your pool up into a bunch of mirrors, then the most data you'll
> have in any one vdev is 1 disk's worth of data.

Fancy that, if you use raidz, the most data you will have to resilver is 1 disk's worth of data. In the raidz case, the utilization of the resilvering disk is 100% and the utilization of the other disks is approximately (100% / N).

> If you have a vdev whose usable capacity is M times a single disk, chances
> are the amount of data you have in the vdev is L times larger than the
> amount of data you would have had in each vdev if you were using mirrors.
> (I'm intentionally leaving the relationship between M and L vague, but both
> are assumed to be > 1 and approaching the number of devices in the vdev
> minus parity drives.) Therefore the resilver time for that vdev will be
> roughly L times the resilver time of a mirror.

For ZFS, usable capacity has no correlation to resilver time.
 -- richard
On 3/21/2011 3:25 PM, Richard Elling wrote:
> On Mar 21, 2011, at 5:09 AM, Edward Ned Harvey wrote:
>
>>> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
>>> bounces at opensolaris.org] On Behalf Of Richard Elling
>>>
>>> How many times do we have to rehash this? The speed of resilver is
>>> dependent on the amount of data, the distribution of data on the resilvering
>>> device, speed of the resilvering device, and the throttle. It is NOT dependent
>>> on the number of drives in the vdev.
>> What the heck? Yes it is. Indirectly. When you say it depends on the
>> amount of data, speed of the resilvering device, etc, what you really mean
>> (correctly) is that it depends on the total number of used blocks that must
>> be resilvered on the resilvering device, multiplied by the access time for
>> the resilvering device. And of course, throttling and usage during resilver
>> can have a big impact. And various other factors. But the controllable big
>> factor is the number of blocks used in the degraded vdev.
> There is no direct correlation between the number of blocks and resilver time.
>
Just to be clear here, remember block != slab. Slab is the allocation unit often seen through the "recordsize" attribute. The number of data *slabs* directly correlates to resilver time.

>> So here is how the number of devices in the vdev matters:
>>
>> If you have your whole pool made of one vdev, then every block in the pool
>> will be on the resilvering device. You must spend time resilvering every
>> single block in the whole pool.
>>
>> If you have the same amount of data on a pool broken into N smaller vdevs,
>> then approximately speaking, 1/N of the blocks in the pool must be
>> resilvered on the resilvering vdev. And therefore the resilver goes
>> approximately N times faster.
> Nope. The resilver time is dependent on the speed of the resilvering disk.

Well, unless my previous posts are completely wrong, I can't see how resilver time is primarily bounded by speed (i.e. bandwidth/throughput) of the HD for the vast majority of use cases. The IOPS and raw speed of the underlying backing store help define how fast the workload (i.e. total used slabs) gets processed. The layout of the vdev, and the on-disk data distribution, will define the total IOPS required to resilver the slab workload. Most data distribution/vdev layout combinations will result in an IOPS-bound resilver disk, not a bandwidth-saturated resilver disk.

>> So if you assume the size of the pool or the number of total disks is a
>> given, determined by outside constraints and design requirements, and then
>> you face the decision of how to architect the vdevs in your pool, then
>> yes, the number of devices in a vdev does dramatically impact the resilver
>> time. Only because the number of blocks written in each vdev depends on
>> these decisions you made earlier.
> I do not think it is wise to set the vdev configuration based on a model for
> resilver time. Choose the configuration to get the best data protection.
> -- richard

Depends on the needs of the end-user. I can certainly see places where it would be better to build a pool out of RAIDZ2 devices rather than RAIDZ3 devices. And, of course, the converse. Resilver times should be a consideration in building your pool, just like performance and disk costs are. How much you value it, of course, is up to the end-user.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
> From: Richard Elling [mailto:richard.elling at gmail.com]
>
> There is no direct correlation between the number of blocks and resilver
> time.

Incorrect.

Although there are possibly some cases where you could be bandwidth limited, it's certainly not true in general.

If Richard were correct, then a resilver would never take longer than resilvering an entire disk (including unused space) sequentially. The time to resilver an entire disk sequentially is easily calculated, if you know the sustained sequential speed of the disk and the size of the disk. In my case, I have a 1TB mirror, where each disk can sustain 1Gbit/sec. Which means, according to Richard, my max resilver time would be 133min. In reality, my system resilvered in 12 hours while otherwise idle. This can only be explained one way: as Erik says, the order in which my disks resilvered is not disk ordered. My disks' resilver time was random access time limited. Not bandwidth limited.
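The 133-minute figure checks out, and the gap to 12 hours is consistent with a seek-bound workload. A quick sketch (the 8 ms per-I/O cost is an assumed average seek-plus-rotate figure for a 7200rpm drive, not a measurement from this system):

  # Sequential lower bound versus what a 12-hour resilver implies.
  awk 'BEGIN {
    seq_s = 1e12 * 8 / 1e9              # 1 TB at 1 Gbit/s, fully sequential
    printf "sequential rewrite: ~%.0f minutes\n", seq_s / 60
    actual = 12 * 3600                  # observed resilver time, seconds
    printf "12 hours at ~8 ms per random I/O: ~%.1f million I/Os\n",
           actual / 0.008 / 1e6
  }'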
[richard tries pushing the rope one more time]

On Mar 21, 2011, at 8:40 PM, Edward Ned Harvey wrote:

>> From: Richard Elling [mailto:richard.elling at gmail.com]
>>
>> There is no direct correlation between the number of blocks and resilver
>> time.
>
> Incorrect.
>
> Although there are possibly some cases where you could be bandwidth limited,
> it's certainly not true in general.
>
> If Richard were correct, then a resilver would never take longer than
> resilvering an entire disk (including unused space) sequentially.

I can prove this to be true for a device that does not suffer from a seek penalty.

> The time
> to resilver an entire disk sequentially is easily calculated, if you know
> the sustained sequential speed of the disk and the size of the disk. In my
> case, I have a 1TB mirror, where each disk can sustain 1Gbit/sec. Which
> means, according to Richard, my max resilver time would be 133min. In
> reality, my system resilvered in 12 hours while otherwise idle.

Bummer, your disk must have some sort of seek penalty... perhaps 8.2 ms?

> This can
> only be explained one way: as Erik says, the order in which my disks
> resilvered is not disk ordered. My disks' resilver time was random access
> time limited. Not bandwidth limited.

I have data that proves the resilver time depends on the data layout, and that layout changes as your usage of the pool changes. Like most things in ZFS, it is dynamic. The data proves the resilver time is not correlated to the number of disks in a vdev. The data shows that the resilver time is dependent on the speed of the resilvering disk. I am glad that your experience confirms this. But why does it need to be rehashed every few months on the alias?
 -- richard
On Mon, Mar 21, 2011 at 3:45 PM, Roy Sigurd Karlsbakk <roy at karlsbakk.net> wrote:
> > Our main backups storage server has 3x 8-drive raidz2 vdevs. Was
> > replacing the 500 GB drives in one vdev with 1 TB drives. The last 2
> > drives took just under 300 hours each. :( The first couple of drives
> > took approx 150 hours each, and then it just started taking longer and
> > longer for each drive.
>
> That's strange indeed. I just replaced 21 drives (seven 2TB drives in each of three raidz2 VDEVs) with 3TB ones, and resilver times were quite stable, until the last replace, which was a bit faster. Have you checked 'iostat -en'? If one (or more) of the drives is having I/O errors, that may slow down the whole pool.

We have production servers with 9 vdevs (mirrored) doing `zfs send` daily to backup servers with 7 vdevs (each a 3-disk raidz1). Some backup servers that receive datasets with lots of small files (email/web) keep getting worse resilver times.

# zpool status
  pool: backup
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: resilver in progress for 646h13m, 100.00% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        backup        DEGRADED     0     0     0
          raidz1-0    ONLINE       0     0     0
            c4t2d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
            c4t4d0    ONLINE       0     0     0
          raidz1-1    ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
          raidz1-2    DEGRADED     0     0     0
            c4t8d0    ONLINE       0     0     0
            spare-1   DEGRADED     0     0  216M
              c4t9d0  REMOVED      0     0     0
              c4t1d0  ONLINE       0     0     0  874G resilvered
            c4t10d0   ONLINE       0     0     0
          raidz1-3    ONLINE       0     0     0
            c4t11d0   ONLINE       0     0     0
            c4t12d0   ONLINE       0     0     0
            c4t13d0   ONLINE       0     0     0
          raidz1-4    ONLINE       0     0     0
            c4t14d0   ONLINE       0     0     0
            c4t15d0   ONLINE       0     0     0
            c4t16d0   ONLINE       0     0     0
          raidz1-5    ONLINE       0     0     0
            c4t17d0   ONLINE       0     0     0
            c4t18d0   ONLINE       0     0     0
            c4t19d0   ONLINE       0     0     0
          raidz1-6    ONLINE       0     0     0
            c4t20d0   ONLINE       0     0     0
            c4t21d0   ONLINE       0     0     0
            c4t22d0   ONLINE       0     0     0
        spares
          c4t1d0      INUSE     currently in use

# zpool list backup
NAME     SIZE   USED  AVAIL    CAP  HEALTH    ALTROOT
backup  19.0T  18.7T   315G    98%  DEGRADED  -

Even though the pool is at 98% utilization, it's usually not a problem if the production server is sending datasets which hold VM machines. Here we seem to be clearly maxing out on IOPS of the disks in the raidz1-2 vdev. It seems logical to go back to mirrors for this kind of workload (lots of small files, nothing sequential).

What I cannot explain is why c4t1d0 is doing lots of reads, besides the expected reads.
It seems to be holding back the resilver, while I would expect only c4t9d0 and c4t10d0 to be reading. I do not understand the ZFS internals that are making this happen. Can anyone explain that?

The server is doing nothing but the resilver (not even receiving new zfs sends). By the way, since this is OpenSolaris 2009.06, there is a nasty bug where if I enable fmd, it'll record billions of checksum errors until the disk is full (so I've had to disable it while resilvering is happening).

# iostat -Xn 1 | egrep '(c4t(8|10|1)d0|r/s)'
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   35.2   14.9  907.9  135.8  0.0  0.4    0.1    8.6   1  12 c4t1d0
   44.7    4.0  997.6   78.3  0.0  0.3    0.1    5.8   1  10 c4t8d0
   44.8    4.0  997.6   78.3  0.0  0.3    0.1    5.8   1  10 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   98.6   46.9 2628.2   52.7  0.0  1.3    0.2    8.6   2  39 c4t1d0
  146.5    0.0 2739.2    0.0  0.0  0.8    0.1    5.1   2  25 c4t8d0
  144.5    0.0 2805.9    0.0  0.0  0.7    0.1    5.1   2  26 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  108.6   45.7 2809.1   50.7  0.0  1.1    0.1    6.9   2  35 c4t1d0
  146.2    0.0 2624.2    0.0  0.0  0.3    0.1    2.3   1  18 c4t8d0
  149.2    0.0 2737.0    0.0  0.0  0.3    0.1    2.3   1  16 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  113.0   23.0 3226.9   28.0  0.0  1.2    0.1    8.9   2  40 c4t1d0
  159.0    0.0 3286.9    0.0  0.0  0.6    0.1    3.9   2  24 c4t8d0
  176.0    0.0 3545.9    0.0  0.0  0.5    0.1    3.0   2  26 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  147.4   34.4 3888.9   52.1  0.0  1.5    0.2    8.3   3  43 c4t1d0
  181.7    0.0 3515.1    0.0  0.0  0.6    0.1    3.1   2  24 c4t8d0
  193.5    0.0 3489.9    0.0  0.0  0.6    0.2    3.3   4  22 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  151.2   33.9 3792.7   42.7  0.0  1.5    0.1    7.9   1  36 c4t1d0
  197.5    0.0 3856.9    0.0  0.0  0.4    0.1    2.3   2  19 c4t8d0
  164.6    0.0 3928.1    0.0  0.0  0.7    0.1    4.2   1  24 c4t10d0
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  171.0   90.0 4426.3  121.5  0.0  1.3    0.1    4.9   3  51 c4t1d0
  184.0    0.0 4426.8    0.0  0.0  0.7    0.1    4.0   2  30 c4t8d0
  195.0    0.0 4430.3    0.0  0.0  0.7    0.1    3.7   2  32 c4t10d0
^C

Anyone else with over 600 hours of resilver time? :-)

Thank you,

Giovanni Tirloni (gtirloni at sysdroid.com)
> From: zfs-discuss-bounces at opensolaris.org [mailto:zfs-discuss-
> bounces at opensolaris.org] On Behalf Of Giovanni Tirloni
>
> We have production servers with 9 vdevs (mirrored) doing `zfs send`
> daily to backup servers with 7 vdevs (each a 3-disk raidz1). Some
> backup servers that receive datasets with lots of small files
> (email/web) keep getting worse resilver times.

It's not because of the file size, but the block size. Er, slab size.

When you have a backup server which does nothing but zfs receive, that's probably your best case scenario, because the data is as nonvolatile as possible. But indeed, because all the sends are incremental, fragmentation will accumulate. If you want to eliminate it once, but not once and for all, then you'll have to occasionally do a full receive instead of an incremental. If you're more than 50% utilized on the receiving pool, you'll have to zfs destroy or zpool destroy everything on the receiving end prior to doing the full receive. (zpool destroy is instant, while zfs destroy will take a long time.) So you pay something in terms of increased temporary risk. Or add enough disks to do a full receive without destroying first, in which case you pay something in terms of additional hardware.

The advantage of such a thing is: during the moment when you destroy everything, the resilver will be instantaneously completed. ;-)

> # zpool list backup
> NAME     SIZE   USED  AVAIL    CAP  HEALTH    ALTROOT
> backup  19.0T  18.7T   315G    98%  DEGRADED  -

Woah, boy. That sure won't help matters. If you are 98% full, that means there's very little empty space. And therefore, everything you receive is going to be forcibly fragmented badly, as the receiving system searches and scavenges for empty blocks to store the received information. I am guessing you didn't build the pool initially 98% full. I am guessing you slowly built up to this condition over time. Which is worse.

> What I cannot explain is why c4t1d0 is doing lots of reads, besides
> the expected reads.

I don't know the answer to that. Anybody? Perhaps the resilvering device is automatically verifying data written to it, like a scrub, while resilver is in progress?
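For concreteness, a sketch of what that occasional full cycle could look like. The pool, dataset, and host names are placeholders rather than Giovanni's actual layout, the device list is abridged, and the destroy step is the risky part, so treat this as an outline and check which send/receive flags your release supports:

  # Occasional full replication instead of incrementals (outline only).
  zpool destroy backup                                   # instant; drops the old copy
  zpool create backup raidz1 c4t2d0 c4t3d0 c4t4d0        # recreate the layout (abridged)
  zfs snapshot -r tank/data@full-2011-03                 # new baseline on the source
  zfs send -R tank/data@full-2011-03 | ssh backuphost zfs receive -Fd backup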
On Thu, Mar 24, 2011 at 7:07 AM, Edward Ned Harvey <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:

> When you have a backup server which does nothing but zfs receive, that's
> probably your best case scenario, because the data is as nonvolatile as
> possible. But indeed, because all the sends are incremental, fragmentation
> will accumulate. If you want to eliminate it once, but not once and for
> all, then you'll have to occasionally do a full receive instead of an
> incremental. If you're more than 50% utilized on the receiving pool, you'll
> have to zfs destroy or zpool destroy everything on the receiving end prior
> to doing the full receive. (zpool destroy is instant, while zfs destroy
> will take a long time.) So you pay something in terms of increased
> temporary risk. Or add enough disks to do a full receive without destroying
> first, in which case you pay something in terms of additional hardware.

Unfortunately, capacity is not the limiting factor in some cases. In my case we do not have the bandwidth to do a FULL send/recv; it would take weeks. Part of the reason we adopted zfs send/recv for our offsite copies is that once the initial FULL is done (and we start that when the zpools are almost empty), we can keep up with incrementals. The data is mostly static and not growing that fast.

The combination of zfs snapshots and the offsite copy provides all the backup capability we need. The snapshots provide the day-to-day operational backups for the occasional "we corrupted that file and need to get it back from last Wednesday", and the offsite copy provides the DR protection against complete loss of the data. We can point the users at the backup copy across the WAN, and while not as fast, they at least have access to the data. Then we can do a reverse replication (including a local FULL and then ship the storage to the other location).

Ahhh, for an online defragment function.

--
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
On 03/29/11 02:52 AM, Paul Kraus wrote:
> On Thu, Mar 24, 2011 at 7:07 AM, Edward Ned Harvey
> <opensolarisisdeadlongliveopensolaris at nedharvey.com> wrote:
>
>> When you have a backup server which does nothing but zfs receive, that's
>> probably your best case scenario, because the data is as nonvolatile as
>> possible. But indeed, because all the sends are incremental, fragmentation
>> will accumulate. If you want to eliminate it once, but not once and for
>> all, then you'll have to occasionally do a full receive instead of an
>> incremental. If you're more than 50% utilized on the receiving pool, you'll
>> have to zfs destroy or zpool destroy everything on the receiving end prior
>> to doing the full receive. (zpool destroy is instant, while zfs destroy
>> will take a long time.) So you pay something in terms of increased
>> temporary risk. Or add enough disks to do a full receive without destroying
>> first, in which case you pay something in terms of additional hardware.
>
> Unfortunately, capacity is not the limiting factor in some cases.
> In my case we do not have the bandwidth to do a FULL send/recv; it
> would take weeks.

I'm not 100% sure how the data would end up (whether de-fragmentation would be achieved), but could you do a send/receive to a new filesystem on the same host?

--
Ian.
> > In my case we do not have the bandwidth to do a FULL send/recv; it
> > would take weeks.
>
> I'm not 100% sure how the data would end up (whether de-fragmentation
> would be achieved), but could you do a send/receive to a new
> filesystem on the same host?

You could, but doing so on the same pool won't lead to defragmentation. With a new pool, though, it'll do the exact same as sending over the network.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
roy at karlsbakk.net
http://blogg.karlsbakk.net/