Group,

We are running a benchmark with 4000 users
simulating a hospital management system
running on Solaris 10 6/06 on USIV+ based
SunFire 6900 with 6540 storage array.

Are there any tools for measuring internal
ZFS activity to help us understand what is going
on during slowdowns?

We have 192GB of RAM and while ZFS runs
well most of the time, there are times where
the system time jumps up to 25-40%
as measured by vmstat and iostat. These
times coincide with slowdowns in file access
as measured by a side program that simply
reads a random block in a file... these response
times can exceed 1 second.

Any pointers greatly appreciated!

Tom
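A probe of the kind described above (read one random block, time the read) might look roughly like the following. This is only a minimal sketch, not the side program actually used in the benchmark: the 8 KiB block size, the 1-second "slow" threshold, and the names `read_random_block` and `probe` are all assumptions made for illustration.

```python
import os
import random
import time

BLOCK_SIZE = 8192  # assumed block size; the original post does not state one


def read_random_block(path, block_size=BLOCK_SIZE, rng=random):
    """Read one randomly chosen block from `path`; return elapsed seconds."""
    size = os.path.getsize(path)
    # Number of whole blocks in the file (at least one, so small files work).
    n_blocks = max(1, size // block_size)
    offset = rng.randrange(n_blocks) * block_size
    start = time.perf_counter()
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread reads at an explicit offset without a separate seek.
        os.pread(fd, block_size, offset)
    finally:
        os.close(fd)
    return time.perf_counter() - start


def probe(path, samples=10, interval=0.0, slow_threshold=1.0):
    """Take `samples` timed reads.

    Returns a list of (wall_clock_time, latency_seconds, is_slow) tuples.
    """
    results = []
    for _ in range(samples):
        latency = read_random_block(path)
        results.append((time.time(), latency, latency >= slow_threshold))
        time.sleep(interval)
    return results
```

Recording the wall-clock time with each sample makes it possible to line the slow reads up against vmstat and iostat output taken over the same period. Reads that are served from memory will of course return quickly, so the test file needs to be large relative to what is cached.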
Tom Duell wrote On 12/12/06 17:11,:
> Group,
>
> We are running a benchmark with 4000 users
> simulating a hospital management system
> running on Solaris 10 6/06 on USIV+ based
> SunFire 6900 with 6540 storage array.
>
> Are there any tools for measuring internal
> ZFS activity to help us understand what is going
> on during slowdowns?

dtrace can be used in numerous ways to examine
every part of ZFS and Solaris. lockstat(1M) (which actually
uses dtrace underneath) can also be used to see the CPU activity
(try lockstat -kgIW -D 20 sleep 10).

You can also use iostat (e.g. iostat -xnpcz) to look at disk activity.

> We have 192GB of RAM and while ZFS runs
> well most of the time, there are times where
> the system time jumps up to 25-40%
> as measured by vmstat and iostat. These
> times coincide with slowdowns in file access
> as measured by a side program that simply
> reads a random block in a file... these response
> times can exceed 1 second.

ZFS commits transaction groups every 5 seconds.
I suspect this flurry of activity is due to that.
Committing can indeed take longer than a second.

You might be able to show this by changing it with:

# echo txg_time/W 10 | mdb -kw

then the activity should be longer but less frequent.
I don't, however, recommend you keep it at that value.

> Any pointers greatly appreciated!
>
> Tom
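One way to test the suspicion above from the probe's own log is to check whether the slow reads come in bursts spaced roughly one commit interval apart. The following is a rough sketch under stated assumptions: it expects a list of timestamps (in seconds) at which slow reads were observed, and the names `burst_starts` and `looks_periodic`, the 1-second burst gap, and the 1-second tolerance are hypothetical choices, not anything taken from ZFS itself.

```python
from statistics import median

TXG_INTERVAL = 5.0  # seconds; the commit interval quoted in the reply above


def burst_starts(slow_times, max_gap=1.0):
    """Collapse slow-sample timestamps into bursts; return each burst's start."""
    starts = []
    previous = None
    for t in sorted(slow_times):
        # A new burst begins when the gap since the last slow sample exceeds max_gap.
        if previous is None or t - previous > max_gap:
            starts.append(t)
        previous = t
    return starts


def looks_periodic(slow_times, period=TXG_INTERVAL, tolerance=1.0, max_gap=1.0):
    """True if the median spacing between bursts is within `tolerance` of `period`."""
    starts = burst_starts(slow_times, max_gap)
    if len(starts) < 3:
        return False  # too few bursts to say anything
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return abs(median(gaps) - period) <= tolerance
```

If the commit interval is changed for the experiment, the new interval in seconds would be passed as `period`; the bursts should then move to the new spacing if transaction group commits really are the cause. A positive result is only suggestive, since anything else that happens to run on a similar schedule would look the same.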
Thanks, Neil, for the assistance.

Tom

Neil Perrin wrote On 12/12/06 19:59,:
>Tom Duell wrote On 12/12/06 17:11,:
>
>>Group,
>>
>>We are running a benchmark with 4000 users
>>simulating a hospital management system
>>running on Solaris 10 6/06 on USIV+ based
>>SunFire 6900 with 6540 storage array.
>>
>>Are there any tools for measuring internal
>>ZFS activity to help us understand what is going
>>on during slowdowns?
>
>dtrace can be used in numerous ways to examine
>every part of ZFS and Solaris. lockstat(1M) (which actually
>uses dtrace underneath) can also be used to see the cpu activity
>(try lockstat -kgIW -D 20 sleep 10).
>
>You can also use iostat (eg iostat -xnpcz) to look at disk activity.

Yes, we are doing this and the disks are performing extremely well.

>>We have 192GB of RAM and while ZFS runs
>>well most of the time, there are times where
>>the system time jumps up to 25-40%
>>as measured by vmstat and iostat. These
>>times coincide with slowdowns in file access
>>as measured by a side program that simply
>>reads a random block in a file... these response
>>times can exceed 1 second.
>
>ZFS commits transaction groups every 5 seconds.
>I suspect this flurry of activity is due to that.
>Committing can indeed take longer than a second.
>
>You might be able to show this by changing it with:
>
># echo txg_time/W 10 | mdb -kw
>
>then the activity should be longer but less frequent.
>I don't however recommend you keep it at that value.

Thanks, we may try that to see what effects it might have.
The latency issue might improve with this RFE:

6471212 need reserved I/O scheduler slots to improve I/O latency of critical ops

-r

Tom Duell writes:
> Group,
>
> We are running a benchmark with 4000 users
> simulating a hospital management system
> running on Solaris 10 6/06 on USIV+ based
> SunFire 6900 with 6540 storage array.
>
> Are there any tools for measuring internal
> ZFS activity to help us understand what is going
> on during slowdowns?
>
> We have 192GB of RAM and while ZFS runs
> well most of the time, there are times where
> the system time jumps up to 25-40%
> as measured by vmstat and iostat. These
> times coincide with slowdowns in file access
> as measured by a side program that simply
> reads a random block in a file... these response
> times can exceed 1 second.
>
> Any pointers greatly appreciated!
>
> Tom
Hello Neil,

Wednesday, December 13, 2006, 1:59:15 AM, you wrote:

NP> Tom Duell wrote On 12/12/06 17:11,:
>> Group,
>>
>> We are running a benchmark with 4000 users
>> simulating a hospital management system
>> running on Solaris 10 6/06 on USIV+ based
>> SunFire 6900 with 6540 storage array.
>>
>> Are there any tools for measuring internal
>> ZFS activity to help us understand what is going
>> on during slowdowns?

NP> dtrace can be used in numerous ways to examine
NP> every part of ZFS and Solaris. lockstat(1M) (which actually
NP> uses dtrace underneath) can also be used to see the cpu activity
NP> (try lockstat -kgIW -D 20 sleep 10).

NP> You can also use iostat (eg iostat -xnpcz) to look at disk activity.

It's bad that the IO provider doesn't work with ZFS :(

-- 
Best regards,
Robert
mailto:rmilkowski at task.gda.pl
http://milek.blogspot.com