Hi,
I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1  
volumes (of 7 x 2TB disks each):
# zpool iostat -v | grep -v c4
                  capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
backup        35.2T  15.3T    602    272  15.3M  11.1M
   raidz1      11.6T  1.06T    138     49  2.99M  2.33M
   raidz1      11.8T   845G    163     54  3.82M  2.57M
   raidz1      6.00T  6.62T    161     84  4.50M  3.16M
   raidz1      5.88T  6.75T    139     83  4.01M  3.09M
------------  -----  -----  -----  -----  -----  -----
Originally there were only the first two raidz1 volumes, and the two  
from the bottom were added later.
You can notice that by the amount of used / free space. The first two  
volumes have ~11TB used and ~1TB free, while the other two have around  
~6TB used and ~6TB free.
I have hundreds of zfs'es storing backups from several servers. Each
ZFS has about 7 snapshots of older backups.
I have the impression I'm getting degradation in performance due to
the limited space in the first two volumes, especially the second,
which has only 845GB free.
Is there any way to re-stripe the pool, so I can take advantage of all
spindles across the raidz1 volumes? Right now it looks like the newer
volumes are doing the heavy lifting while the other two just hold old data.
Thanks,
Eduardo Bragatto
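
As a rough aid to reading the table in the post above, the fill level of each raidz1 set can be worked out from its used and avail columns. This is only a minimal sketch: the helper names `to_tib` and `fill_percent` and the "set N" labels are made up for illustration, and it assumes the T and G suffixes are binary units.

```python
# Used/avail figures copied from the zpool iostat output in the original post.
# The "set N" labels are illustrative; the output itself only says "raidz1".
VDEVS = [
    ("set 1", "11.6T", "1.06T"),
    ("set 2", "11.8T", "845G"),
    ("set 3", "6.00T", "6.62T"),
    ("set 4", "5.88T", "6.75T"),
]

# Assumption: suffixes are binary units; everything is converted to TiB.
UNITS = {"G": 1 / 1024, "T": 1.0}


def to_tib(text):
    """Convert a string such as '845G' or '1.06T' to a float in TiB."""
    return float(text[:-1]) * UNITS[text[-1]]


def fill_percent(used, avail):
    """Percentage of a raidz1 set's space that is already allocated."""
    u, a = to_tib(used), to_tib(avail)
    return 100.0 * u / (u + a)


for name, used, avail in VDEVS:
    print(f"{name}: {fill_percent(used, avail):.1f}% full")
```

With these figures the first two sets come out at roughly 92% and 93% full, and the two newer sets at a little under 50%.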

Short answer: No.

Long answer: Not without rewriting the previously written data. Data is being striped over all of the top level VDEVs, or at least it should be. But there is no way, at least not built into ZFS, to re-allocate the storage to perform I/O balancing. You would basically have to do this manually.

Either way, I'm guessing this isn't the answer you wanted but hey, you get what you get.

On Tue, Aug 3, 2010 at 13:52, Eduardo Bragatto <eduardo at bragatto.com> wrote:
> Is there any way to re-stripe the pool, so I can take advantage of all
> spindles across the raidz1 volumes? Right now it looks like the newer
> volumes are doing the heavy lifting while the other two just hold old data.
>
> Thanks,
> Eduardo Bragatto
> _______________________________________________
> zfs-discuss mailing list
> zfs-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
"You can choose your friends, you can choose the deals." - Equity Private
"If Linux is faster, it's a Solaris bug." - Phil Harman
Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva

On Aug 3, 2010, at 10:08 PM, Khyron wrote:
> Long answer: Not without rewriting the previously written data. Data is being striped over all of the top level VDEVs, or at least it should be. But there is no way, at least not built into ZFS, to re-allocate the storage to perform I/O balancing. You would basically have to do this manually.
>
> Either way, I'm guessing this isn't the answer you wanted but hey, you get what you get.

Actually, that was the answer I was expecting, yes. The real question, then, is: what data should I rewrite? I want to rewrite data that's written on the nearly full volumes so it gets spread to the volumes with more space available.

Should I simply do a "zfs send | zfs receive" on all ZFSes I have? (we are talking about 400 ZFSes with about 7 snapshots each, here)... Or is there a way to rearrange specifically the data from the nearly full volumes?

Thanks,
Eduardo Bragatto

On Aug 3, 2010, at 10:52 AM, Eduardo Bragatto wrote:
> I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1 volumes (of 7 x 2TB disks each):
>
> # zpool iostat -v | grep -v c4

Unfortunately, zpool iostat is completely useless at describing performance. The only thing it can do is show device bandwidth, and everyone here knows that bandwidth is not performance, right? Nod along, thank you.

> Originally there were only the first two raidz1 volumes, and the two from the bottom were added later.
>
> You can notice that by the amount of used / free space. The first two volumes have ~11TB used and ~1TB free, while the other two have around ~6TB used and ~6TB free.

Yes, and you also notice that the writes are biased towards the raidz1 sets that are less full. This is exactly what you want :-) Eventually, when the less empty sets become more empty, the writes will rebalance. OTOH, reads will come from whence they were written.

> I have the impression I'm getting degradation in performance due to the limited space in the first two volumes, especially the second, which has only 845GB free.

Impressions work well for dating, but not so well for performance. Does your application run faster or slower?

> Is there any way to re-stripe the pool, so I can take advantage of all spindles across the raidz1 volumes? Right now it looks like the newer volumes are doing the heavy lifting while the other two just hold old data.

Yes, of course. But it requires copying the data, which probably isn't feasible.
 -- richard

-- 
Richard Elling
richard at nexenta.com +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com

On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:
> Unfortunately, zpool iostat is completely useless at describing performance. The only thing it can do is show device bandwidth, and everyone here knows that bandwidth is not performance, right? Nod along, thank you.

I totally understand that, I only used the output to show the space utilization per raidz1 volume.

> Yes, and you also notice that the writes are biased towards the raidz1 sets that are less full. This is exactly what you want :-) Eventually, when the less empty sets become more empty, the writes will rebalance.

Actually, if we are going to consider the values from zpool iostat, they are just slightly biased towards the volumes I would want. For example, in my first post, the volume with the least free space had 845GB free. That same volume now has 833GB. I really would like to just stop writing to that volume at this point, as I've experienced very bad performance in the past when a volume gets nearly full.

As a reference, here's the information I posted less than 12 hours ago:

# zpool iostat -v | grep -v c4
                  capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
backup        35.2T  15.3T    602    272  15.3M  11.1M
   raidz1      11.6T  1.06T    138     49  2.99M  2.33M
   raidz1      11.8T   845G    163     54  3.82M  2.57M
   raidz1      6.00T  6.62T    161     84  4.50M  3.16M
   raidz1      5.88T  6.75T    139     83  4.01M  3.09M
------------  -----  -----  -----  -----  -----  -----

And here's the info from the same system, as I write now:

# zpool iostat -v | grep -v c4
                  capacity     operations    bandwidth
pool           used  avail   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
backup        35.3T  15.2T    541    208  9.90M  6.45M
   raidz1      11.6T  1.06T    116     38  2.16M  1.41M
   raidz1      11.8T   833G    122     39  2.28M  1.49M
   raidz1      6.02T  6.61T    152     64  2.72M  1.78M
   raidz1      5.89T  6.73T    149     66  2.73M  1.77M
------------  -----  -----  -----  -----  -----  -----

As you can see, the second raidz1 volume is not being spared and has been providing almost as much space as the others (and even more compared to the first volume).

>> I have the impression I'm getting degradation in performance due to the limited space in the first two volumes, especially the second, which has only 845GB free.
>
> Impressions work well for dating, but not so well for performance. Does your application run faster or slower?

You're a funny guy. :)

Let me re-phrase it: I'm sure I'm getting degradation in performance, as my applications are waiting more on I/O now than they used to (based on CPU utilization graphs I have). The impression part is that the reason is the limited space in those two volumes. As I said, I have already experienced bad performance on zfs systems running nearly out of space before.

>> Is there any way to re-stripe the pool, so I can take advantage of all spindles across the raidz1 volumes? Right now it looks like the newer volumes are doing the heavy lifting while the other two just hold old data.
>
> Yes, of course. But it requires copying the data, which probably isn't feasible.

I'm willing to copy data around to get this accomplished, I'm really just looking for the best method. I have more than 10TB free, so I have some space to play with if I have to duplicate some data and erase the old copy, for example.

Thanks,
Eduardo Bragatto

I notice you use the word "volume" which really isn't accurate or appropriate here.

If all of these VDEVs are part of the same pool, which as I recall you said they are, then writes are striped across all of them (with bias for the more empty aka less full VDEVs).

You probably want to "zfs send" the oldest dataset (ZFS terminology for a file system) into a new dataset. That oldest dataset was created when there were only 2 top level VDEVs, most likely. If you have multiple datasets created when you had only 2 VDEVs, then send/receive them both (in serial fashion, one after the other). If you have room for the snapshots too, then send all of it and then delete the source dataset when done. I think this will achieve what you want.

You may want to get a bit more specific and choose from the oldest datasets THEN find the smallest of those oldest datasets and send/receive it first. That way, the send/receive completes in less time, and when you delete the source dataset, you've now created more free space on the entire pool but without the risk of a single dataset exceeding your 10 TiB of workspace.

ZFS's copy-on-write nature really wants no less than 20% free because you never update data in place; a new copy is always written to disk.

You might want to consider turning on compression on your new datasets too, especially if you have free CPU cycles to spare. I don't know how compressible your data is, but if it's fairly compressible, say lots of text, then you might get some added benefit when you copy the old data into the new datasets. Saving more space, then deleting the source dataset, should help your pool have more free space, and thus influence your writes for better I/O balancing when you do the next (and the next) dataset copies.

HTH.

On Tue, Aug 3, 2010 at 22:48, Eduardo Bragatto <eduardo at bragatto.com> wrote:
> Should I simply do a "zfs send | zfs receive" on all ZFSes I have? (we are
> talking about 400 ZFSes with about 7 snapshots each, here)... Or is there a
> way to rearrange specifically the data from the nearly full volumes?

-- 
Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
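
The selection rule suggested in the message above (only datasets older than the new VDEVs, smallest first, none larger than the free workspace) can be sketched as follows. This is an illustration under assumptions, not a tested procedure: the function name `plan_copies`, the dataset names, dates and sizes, and the date the VDEVs were added are all invented, and in practice the creation time and used space would have to come from the pool itself (for example via "zfs get creation").

```python
from datetime import date

TIB = 1024 ** 4


def plan_copies(datasets, vdevs_added, workspace_bytes):
    """Return dataset names to send/receive, smallest first.

    datasets: iterable of (name, creation_date, used_bytes) tuples.
    Only datasets created before vdevs_added that also fit into the
    free workspace are kept as candidates.
    """
    candidates = [
        d for d in datasets
        if d[1] < vdevs_added and d[2] <= workspace_bytes
    ]
    # Smallest first; ties are broken by age (older first).
    candidates.sort(key=lambda d: (d[2], d[1]))
    return [d[0] for d in candidates]


# Hypothetical datasets: names, dates and sizes are invented for illustration.
datasets = [
    ("backup/web01", date(2009, 3, 1), 2 * TIB),
    ("backup/db01", date(2009, 5, 10), 12 * TIB),   # larger than the workspace
    ("backup/mail01", date(2009, 4, 2), TIB // 2),
    ("backup/new01", date(2010, 6, 1), TIB // 4),   # created after the new VDEVs
]

order = plan_copies(
    datasets,
    vdevs_added=date(2010, 1, 1),   # assumed date, not stated in the thread
    workspace_bytes=10 * TIB,
)
print(order)
```

Each dataset in the resulting list would then be copied and its source deleted before moving on to the next one, so the free space grows as the plan proceeds.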

On Aug 3, 2010, at 8:55 PM, Eduardo Bragatto wrote:
> Actually, if we are going to consider the values from zpool iostat, they are just slightly biased towards the volumes I would want. For example, in my first post, the volume with the least free space had 845GB free. That same volume now has 833GB. I really would like to just stop writing to that volume at this point, as I've experienced very bad performance in the past when a volume gets nearly full.

The tipping point for the change in the first fit/best fit allocation algorithm is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or zpool version, I'll assume you are on something modern.

NB, "zdb -m" will show the pool's metaslab allocations. If there are no 100% free metaslabs, then it is a clue that the allocator might be working extra hard.

> As you can see, the second raidz1 volume is not being spared and has been providing almost as much space as the others (and even more compared to the first volume).

Yes, perhaps 1.5-2x data written to the less full raidz1 sets. The exact amount of data is not shown, because zpool iostat doesn't show how much data is written, it shows the bandwidth.

> Let me re-phrase it: I'm sure I'm getting degradation in performance, as my applications are waiting more on I/O now than they used to (based on CPU utilization graphs I have). The impression part is that the reason is the limited space in those two volumes. As I said, I have already experienced bad performance on zfs systems running nearly out of space before.

OK, so how long are they waiting? Try "iostat -zxCn" and look at the asvc_t column. This will show how the disk is performing, though it won't show the performance delivered by the file system to the application. To measure the latter, try "fsstat zfs" (assuming you are on a Solaris distro).

Also, if these are HDDs, the media bandwidth decreases and seeks increase as they fill. ZFS tries to favor the outer cylinders (lower numbered metaslabs) to take this into account.

> I'm willing to copy data around to get this accomplished, I'm really just looking for the best method. I have more than 10TB free, so I have some space to play with if I have to duplicate some data and erase the old copy, for example.

zfs send/receive is usually the best method.
 -- richard
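
For what it is worth, the write bias visible in the second zpool iostat sample posted in this thread can be put into numbers. The sketch below only compares the printed write columns of that single sample, so, as pointed out above, it says nothing about the total amount of data written. The grouping into `FULLER` and `LESS_FULL` and the helper name `ratio` are illustrative.

```python
# Write columns from the second zpool iostat sample in the thread:
# (write operations, write bandwidth as printed, "M" suffix dropped).
FULLER = [(38, 1.41), (39, 1.49)]      # the two original raidz1 sets
LESS_FULL = [(64, 1.78), (66, 1.77)]   # the two raidz1 sets added later


def ratio(group_a, group_b, column):
    """Sum one column over each group and return sum(group_a) / sum(group_b)."""
    total_a = sum(row[column] for row in group_a)
    total_b = sum(row[column] for row in group_b)
    return total_a / total_b


ops_ratio = ratio(LESS_FULL, FULLER, 0)
bw_ratio = ratio(LESS_FULL, FULLER, 1)
print(f"write ops ratio: {ops_ratio:.2f}x, write bandwidth ratio: {bw_ratio:.2f}x")
```

On that sample the newer sets show roughly 1.7 times the write operations and roughly 1.2 times the write bandwidth of the older sets.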

On Tue, 3 Aug 2010, Eduardo Bragatto wrote:
> You're a funny guy. :)
>
> Let me re-phrase it: I'm sure I'm getting degradation in performance, as my
> applications are waiting more on I/O now than they used to (based on CPU
> utilization graphs I have). The impression part is that the reason is the
> limited space in those two volumes. As I said, I have already experienced bad
> performance on zfs systems running nearly out of space before.

Assuming that your impressions are correct, are you sure that your new disk drives are similar to the older ones? Are they an identical model? Design trade-offs are now often resulting in larger capacity drives with reduced performance.

Bob
-- 
Bob Friesenhahn
bfriesen at simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/

On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:
> The tipping point for the change in the first fit/best fit allocation algorithm is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or zpool version, I'll assume you are on something modern.

I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

> NB, "zdb -m" will show the pool's metaslab allocations. If there are no 100% free metaslabs, then it is a clue that the allocator might be working extra hard.

On the first two VDEVs there are no allocations 100% free (most are nearly full)... The two newer ones, however, do have several allocations of 128GB each, 100% free.

If I understand correctly in that scenario the allocator will work extra, is that correct?

> OK, so how long are they waiting? Try "iostat -zxCn" and look at the asvc_t column. This will show how the disk is performing, though it won't show the performance delivered by the file system to the application. To measure the latter, try "fsstat zfs" (assuming you are on a Solaris distro)

Checking with iostat, I noticed the average wait time to be between 40ms and 50ms for all disks. Which doesn't seem too bad.

And this is the output of fsstat:

# fsstat zfs
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.26M 1.34M 3.22M  161M 13.4M  1.36G  9.6M 10.5M  899G 22.0M  625G zfs

However I did have CPU spikes at 100% where the kernel was taking all cpu time.

I have reduced my zfs_arc_max parameter as it seemed the applications were struggling for RAM and things are looking better now.

Thanks for your time,
Eduardo Bragatto.

On Aug 4, 2010, at 12:20 AM, Khyron wrote:
> I notice you use the word "volume" which really isn't accurate or appropriate here.

Yeah, it didn't seem right to me, but I wasn't sure about the nomenclature, thanks for clarifying.

> You may want to get a bit more specific and choose from the oldest datasets THEN find the smallest of those oldest datasets and send/receive it first. That way, the send/receive completes in less time, and when you delete the source dataset, you've now created more free space on the entire pool but without the risk of a single dataset exceeding your 10 TiB of workspace.

That makes sense, I'll try send/receiving a few of those datasets and see how it goes. I believe I can find the ones that were created before the two new VDEVs were added, by comparing the creation time from "zfs get creation".

> ZFS's copy-on-write nature really wants no less than 20% free because you never update data in place; a new copy is always written to disk.

Right, and my problem is that I have two VDEVs with less than 10% free at this point, although the other two have around 50% free each.

> You might want to consider turning on compression on your new datasets too, especially if you have free CPU cycles to spare. I don't know how compressible your data is, but if it's fairly compressible, say lots of text, then you might get some added benefit when you copy the old data into the new datasets. Saving more space, then deleting the source dataset, should help your pool have more free space, and thus influence your writes for better I/O balancing when you do the next (and the next) dataset copies.

Unfortunately the data taking most of the space is already compressed, so while I would gain some space from the many text files that I also have, those are not the majority of my content, and the effort would probably not justify the small gain.

Thanks
Eduardo Bragatto

On Wed, 4 Aug 2010, Eduardo Bragatto wrote:
> Checking with iostat, I noticed the average wait time to be between 40ms and
> 50ms for all disks. Which doesn't seem too bad.

Actually, this is quite high. I would not expect such long wait times except for when under extreme load such as a benchmark. If the wait times are this long under normal use, then there is something wrong.

> However I did have CPU spikes at 100% where the kernel was taking all cpu
> time.
>
> I have reduced my zfs_arc_max parameter as it seemed the applications were
> struggling for RAM and things are looking better now

Odd. What type of applications are you running on this system? Are applications running on the server competing with client accesses?

Bob

On Aug 4, 2010, at 11:18 AM, Bob Friesenhahn wrote:
> Assuming that your impressions are correct, are you sure that your new disk drives are similar to the older ones? Are they an identical model? Design trade-offs are now often resulting in larger capacity drives with reduced performance.

Yes, the disks are the same, no problems there.

On Aug 4, 2010, at 2:11 PM, Bob Friesenhahn wrote:
> On Wed, 4 Aug 2010, Eduardo Bragatto wrote:
>> Checking with iostat, I noticed the average wait time to be between 40ms and 50ms for all disks. Which doesn't seem too bad.
>
> Actually, this is quite high. I would not expect such long wait times except for when under extreme load such as a benchmark. If the wait times are this long under normal use, then there is something wrong.

That's a backup server, I usually have 10 rsync instances running simultaneously so there's a lot of random disk access going on. I think that explains the high average time. Also, I recently enabled graphing of the IOPS per disk (reading it using net-snmp) and I see most disks are operating near their limit, except for some disks from the older VDEVs, which is what I'm trying to address here.

>> However I did have CPU spikes at 100% where the kernel was taking all cpu time.
>>
>> I have reduced my zfs_arc_max parameter as it seemed the applications were struggling for RAM and things are looking better now
>
> Odd. What type of applications are you running on this system? Are applications running on the server competing with client accesses?

I noticed some of those rsync processes were using almost 1GB of RAM each and the server has only 8GB. I started seeing the server swapping a bit during the cpu spikes at 100%, so I figured it would be better to cap ARC and leave some room for the rsync processes.

I will also start using rsync v3 to reduce the memory footprint, so I might be able to give back some RAM to ARC, and I'm thinking of maybe going to 16GB RAM, as the pool is quite large and I'm sure more ARC wouldn't hurt.

Thanks,
Eduardo Bragatto.

On Wed, 4 Aug 2010, Eduardo Bragatto wrote:
> I will also start using rsync v3 to reduce the memory footprint, so I might
> be able to give back some RAM to ARC, and I'm thinking of maybe going to 16GB
> RAM, as the pool is quite large and I'm sure more ARC wouldn't hurt.

It is definitely a wise idea to use rsync v3. Previous versions had to recurse the whole tree on both sides (storing what was learned in memory) before doing anything.

Bob

On Aug 4, 2010, at 9:03 AM, Eduardo Bragatto wrote:
> On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:
>> The tipping point for the change in the first fit/best fit allocation algorithm is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or zpool version, I'll assume you are on something modern.
>
> I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

Then the first fit/best fit threshold is 96%.

>> NB, "zdb -m" will show the pool's metaslab allocations. If there are no 100% free metaslabs, then it is a clue that the allocator might be working extra hard.
>
> On the first two VDEVs there are no allocations 100% free (most are nearly full)... The two newer ones, however, do have several allocations of 128GB each, 100% free.
>
> If I understand correctly in that scenario the allocator will work extra, is that correct?

Yes, and this can be measured, but...

>> OK, so how long are they waiting? Try "iostat -zxCn" and look at the asvc_t column. This will show how the disk is performing, though it won't show the performance delivered by the file system to the application. To measure the latter, try "fsstat zfs" (assuming you are on a Solaris distro)
>
> Checking with iostat, I noticed the average wait time to be between 40ms and 50ms for all disks. Which doesn't seem too bad.

... actually, that is pretty bad. Look for an average around 10 ms and peaks around 20ms. Solve this problem first -- the system can do a huge amount of allocations for any algorithm in 1ms.

> And this is the output of fsstat:
>
> # fsstat zfs

Unfortunately, the first line is useless, it is the summary since boot. Try adding a sample interval to see how things are moving now.

> However I did have CPU spikes at 100% where the kernel was taking all cpu time.

Again, this can be analyzed using baseline performance analysis techniques. The "prstat" command should show how CPU is being used. I'm not running Solaris 10 10/09, but IIRC, it has the ZFS enhancement where CPU time is attributed to the pool, as seen in prstat.
 -- richard
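
The rule of thumb given above (an average asvc_t around 10 ms, peaks around 20 ms) can be checked mechanically against captured iostat output. The sketch below is only an illustration: the function name `slow_devices` is made up, the sample rows and device names are invented rather than taken from the system in this thread, and the column layout is assumed to be the one printed by Solaris "iostat -xn".

```python
def slow_devices(iostat_text, limit_ms=20.0):
    """Return (device, asvc_t) pairs whose asvc_t exceeds limit_ms.

    Expects a header line containing 'asvc_t' and 'device', followed by
    data rows with the same number of whitespace-separated columns.
    """
    rows = [line.split() for line in iostat_text.strip().splitlines()]
    header = next(cols for cols in rows if "asvc_t" in cols)
    svc, dev = header.index("asvc_t"), header.index("device")
    result = []
    for cols in rows:
        # Skip (repeated) header lines and banner lines of a different width.
        if "asvc_t" in cols or len(cols) != len(header):
            continue
        value = float(cols[svc])
        if value > limit_ms:
            result.append((cols[dev], value))
    return result


# Hypothetical sample; the numbers and device names are invented.
SAMPLE = """
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   21.0    9.0  410.0  330.0  0.0  1.3    0.0   44.7   0  78 c1t0d0
   19.5    8.2  395.1  310.4  0.0  0.3    0.0    9.8   0  21 c2t0d0
"""

print(slow_devices(SAMPLE))
```

The 20 ms default mirrors the peak figure quoted above; passing a lower `limit_ms` lists disks that are merely above the suggested average.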