Hi,

We have a ZFS file system configured on a Sun Fire 280R with a 10T Raidweb array.

bash-3.00# zpool list
NAME     SIZE    USED   AVAIL    CAP  HEALTH   ALTROOT
filer   9.44T   6.97T   2.47T    73%  ONLINE   -

bash-3.00# zpool status
  pool: backup
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        filer       ONLINE       0     0     0
          c1t2d1    ONLINE       0     0     0
          c1t2d2    ONLINE       0     0     0
          c1t2d3    ONLINE       0     0     0
          c1t2d4    ONLINE       0     0     0
          c1t2d5    ONLINE       0     0     0

The file system is shared via NFS. Of late we have seen that access to the file system slows down considerably. Running commands like find and du on the ZFS file system did slow it down, but the intermittent slowdowns cannot be explained.

Is there a way to trace the I/O on the ZFS pool so that we can identify the heavy reads/writes responsible for the slowness?

Thanks,
--Walter
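One general way to get a first look at which files see heavy reads and writes is a DTrace aggregation at the ZFS layer. The script below is a rough sketch, not from this thread, and assumes the stock Solaris fbt provider; since I/O arriving over NFS is issued by kernel nfsd threads, aggregating by file name is usually more informative than aggregating by process.

#!/usr/sbin/dtrace -s
/* Rough sketch: bytes requested per file through zfs_read()/zfs_write(). */
fbt::zfs_read:entry,
fbt::zfs_write:entry
/((vnode_t *)arg0)->v_path != NULL/
{
        @bytes[probefunc, stringof(((vnode_t *)arg0)->v_path)] =
            sum(((uio_t *)arg1)->uio_resid);
}

tick-10s
{
        trunc(@bytes, 20);      /* keep only the 20 busiest files */
        printa(@bytes);
        trunc(@bytes);          /* reset for the next interval */
}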
Hi,

I think your problem is filesystem fragmentation. When available space drops below about 40%, ZFS might have problems finding free blocks. Use this script to check it:

#!/usr/sbin/dtrace -s

/* remember the requested allocation size */
fbt::space_map_alloc:entry
{
        self->s = arg1;
}

/* allocation succeeded -- forget it */
fbt::space_map_alloc:return
/arg1 != -1/
{
        self->s = 0;
}

/* allocation failed -- record the size that could not be satisfied */
fbt::space_map_alloc:return
/self->s && (arg1 == -1)/
{
        @s = quantize(self->s);
        self->s = 0;
}

tick-10s
{
        printa(@s);
}

Run the script for a few minutes.

You might also have problems with space map size. This script will show you the size of the space map on disk:

#!/bin/sh
# sum the on-disk space map object sizes for every metaslab in each active pool

echo '::spa' | mdb -k | grep ACTIVE \
    | while read pool_ptr state pool_name
do
        echo "checking pool map size [B]: $pool_name"

        echo "${pool_ptr}::walk metaslab|::print -d struct metaslab ms_smo.smo_objsize" \
            | mdb -k \
            | nawk '{sub("^0t","",$3);sum+=$3}END{print sum}'
done

In memory the space map takes about five times more. Not all of the space map is loaded into memory all the time, but during a snapshot remove, for example, the whole space map might be loaded, so check that you have enough RAM available on the machine. Check ::kmastat in mdb. The space map uses kmem_alloc_40 (on Thumpers this is a real problem).

Workaround:

1. First you can change the pool recordsize:
   zfs set recordsize=64K POOL
   Maybe you will have to use 32K or even 16K.

2. You will have to disable the ZIL, because the ZIL always takes 128kB blocks.

3. Try disabling the cache and tuning the vdev cache. Check:
   http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Lukas Karwacki

On 7 Nov 2007 at 01:49, Walter Faleiro wrote:
> Is there a way to trace the I/O on the zfs so that we can list out
> heavy read/writes to the file system to be responsible for the
> slowness.
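A minimal sketch of those workaround steps, assuming the pool here is named filer. The zil_disable tunable is the mechanism the Evil Tuning Guide of that era describes; disabling the ZIL sacrifices synchronous-write guarantees, so treat this as an experiment rather than a fix.

# 1. Smaller recordsize (only affects files written after the change):
zfs set recordsize=64K filer
#    try 32K or even 16K if the allocation histogram still shows failures

# 2. Disable the ZIL via /etc/system (takes effect after a reboot):
#        set zfs:zil_disable = 1

# 3. vdev cache / prefetch tuning is covered in the ZFS Evil Tuning Guide:
#        http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide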
Hi Lukasz,

The output of the first script gives:

bash-3.00# ./test.sh
dtrace: script './test.sh' matched 4 probes
CPU     ID                    FUNCTION:NAME
  0  42681                        :tick-10s
  0  42681                        :tick-10s
  0  42681                        :tick-10s
  0  42681                        :tick-10s
  0  42681                        :tick-10s
  0  42681                        :tick-10s
  0  42681                        :tick-10s

and it goes on.

The second script gives:

checking pool map size [B]: filer
mdb: failed to dereference symbol: unknown symbol name
423917216903435

Regards,
--Walter
How is the performance on the ZFS file system directly, without NFS? I have experienced big problems running NFS on large volumes (independent of the underlying fs).
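A rough way to compare the two is a simple dd run in both places; the paths below are only examples, and the ARC caches aggressively, so use fresh data or a file larger than RAM for read tests.

# On the server, writing straight into the pool:
dd if=/dev/zero of=/filer/ddtest bs=128k count=8192       # ~1 GB
# On an NFS client, writing to the same filesystem over NFS:
dd if=/dev/zero of=/mnt/filer/ddtest bs=128k count=8192
# If local throughput is fine but NFS is slow, look at the network/NFS layer
# (and synchronous-write behaviour); if both are slow, look at the pool itself.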
On 8 Nov 2007 at 07:58, Walter Faleiro wrote:
> The output of the first script gives
> bash-3.00# ./test.sh
> dtrace: script './test.sh' matched 4 probes
> CPU     ID                    FUNCTION:NAME
>   0  42681                        :tick-10s
> [...]
> and it goes on.

It means that you have free blocks :) , or you do not have any I/O writes.

Run:
#zpool iostat 1
and
#iostat -zxc 1

> The second script gives:
> checking pool map size [B]: filer
> mdb: failed to dereference symbol: unknown symbol name
> 423917216903435

Which Solaris version do you use? Maybe you should patch the kernel.

You can also check if there are problems with the ZFS sync phase. Run

#dtrace -n fbt::txg_wait_open:entry'{ stack(); ustack(); }'

and wait 10 minutes.

Also give more information about the pool:

#zfs get all filer

I assume 'filer' is your pool name.

Regards,
Lukas
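A related sketch, under the assumption that the fbt provider exposes spa_sync() entry and return on this kernel, times the sync phase directly; long or erratic sync times would match the intermittent stalls described in the original post.

#!/usr/sbin/dtrace -s
/* Rough sketch: distribution of spa_sync() durations, in milliseconds. */
fbt::spa_sync:entry
{
        self->ts = timestamp;
}

fbt::spa_sync:return
/self->ts/
{
        @ms["spa_sync duration (ms)"] = quantize((timestamp - self->ts) / 1000000);
        self->ts = 0;
}

tick-60s
{
        printa(@ms);
}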
Hi Lukas,

The system that we use for ZFS is Solaris 10 on SPARC, Update 3. I assume all the scripts you gave have to be run on the NFS/ZFS server and not on any client.

Thanks,
--Walter

On Nov 8, 2007 2:34 AM, Łukasz K <kangurek_pl at wp.pl> wrote:
> It means that you have free blocks :) , or you do not have any I/O writes.
> [...]
> Which Solaris version do you use?
> Maybe you should patch the kernel.