I'm using Neelakanth's arcstat tool to troubleshoot performance problems with a ZFS filer we have, sharing home directories to a CentOS frontend Samba box.

Output shows an arc target size of 1G, which I find odd, since I haven't tuned the arc, and the system has 4G of RAM. prstat -a tells me that userland processes are only using about 200-300 MB of RAM, and even if Solaris is eating 1 GB, that still leaves quite a lot of RAM not being used by the arc.

I would believe that this was due to low workload, but I see that 'arcsz' matches 'c', which makes me think the system is hitting a bottleneck/wall of some kind.

Any thoughts on further troubleshooting appreciated.

Blake
Blake Irvin wrote:
> I'm using Neelakanth's arcstat tool to troubleshoot performance problems with a ZFS filer we have, sharing home directories to a CentOS frontend Samba box.
>
> Output shows an arc target size of 1G, which I find odd, since I haven't tuned the arc, and the system has 4G of RAM. prstat -a tells me that userland processes are only using about 200-300 MB of RAM, and even if Solaris is eating 1 GB, that still leaves quite a lot of RAM not being used by the arc.
>
> I would believe that this was due to low workload, but I see that 'arcsz' matches 'c', which makes me think the system is hitting a bottleneck/wall of some kind.
>
> Any thoughts on further troubleshooting appreciated.

It doesn't sound like you have a memory shortfall. Please start with the ZFS best practices guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Many of the recommendations for NFS will also apply to other file sharing protocols, such as CIFS.
 -- richard
I think I need to clarify a bit.

I'm wondering why the arc size is staying so low when I have 10 NFS clients and about 75 SMB clients accessing the store via resharing (on one of the 10 Linux NFS clients) of the zfs/nfs export. Or is it normal for the arc target and arc size to match? Of note, I didn't see these performance issues until the box had been up for about a week, probably enough time for the weekly (roughly) Windows reboots and profile syncs across multiple clients to force the arc to fill.

I have read through and followed the advice in the tuning guide, but I still see Windows users with roaming profiles getting very slow profile syncs. This makes me think that zfs isn't handling the random i/o generated by a profile sync very well. Well, at least that's what I'm thinking when I see an arc size of 1G, there is at least another free gig of memory, and the clients are syncing more than a gig of data fairly often.

I will return to studying the tuning guide, though, to make sure I've not missed some key bit. It's not unlikely that I'm missing something fundamental about how zfs should behave in this scenario.

cheers,
Blake
Blake Irvin wrote:
> I think I need to clarify a bit.
>
> I'm wondering why the arc size is staying so low when I have 10 NFS
> clients and about 75 SMB clients accessing the store via resharing (on
> one of the 10 Linux NFS clients) of the zfs/nfs export. Or is it
> normal for the arc target and arc size to match? Of note, I didn't see
> these performance issues until the box had been up for about a week,
> probably enough time for the weekly (roughly) Windows reboots and
> profile syncs across multiple clients to force the arc to fill.

In any case, the ARC size is not an indicator of a memory shortfall. The next time it happens, look at the scan rate in vmstat for an indication of memory shortfall. Then proceed to debug accordingly. An excellent book on this topic is the Solaris Performance and Tools companion to Solaris Internals.

> I have read through and followed the advice in the tuning guide, but
> still see Windows users with roaming profiles getting very slow
> profile syncs. This makes me think that zfs isn't handling the random
> i/o generated by a profile sync very well. Well, at least that's what
> I'm thinking when I see an arc size of 1G, there is at least another
> free gig of memory, and the clients syncing more than a gig of data
> fairly often.

By default, the ARC leaves 1 GByte of memory free. This may or may not be appropriate for your system, which is why there are some tuning suggestions in various places. There is also an issue with the decision to cache versus flush for writes, and the interaction with write throttles. Roch did a nice writeup on changes in this area. You may be running into this, but IMHO it shouldn't appear to be a memory shortfall. Check Roch's blog to see if the symptoms are similar.
http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle
 -- richard
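As a way to cross-check what arcstat reports against the raw counters, the ARC kstats can also be read directly from Perl. A minimal sketch, assuming the stock Sun::Solaris::Kstat module is present; the script and its output format are illustrative only, not part of arcstat.pl:

    #!/usr/perl5/bin/perl
    # Minimal sketch: dump the raw ARC size, target and limits from the
    # zfs:0:arcstats kstat, with no scaling or rounding in between.
    use strict;
    use warnings;
    use Sun::Solaris::Kstat;

    my $ks  = Sun::Solaris::Kstat->new();
    my $arc = $ks->{zfs}{0}{arcstats}
        or die "no zfs:0:arcstats kstat on this system\n";

    for my $stat (qw(size c c_min c_max)) {
        printf "%-5s %14d bytes (%8.1f MB)\n",
            $stat, $arc->{$stat}, $arc->{$stat} / (1024 * 1024);
    }

If size is pinned at c while c sits well below c_max, the target really has been reduced; if the two only look equal because of coarse rounding in the display, the raw numbers will show the difference.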
For posterity, I'd like to point out the following:

neel's original arcstat.pl uses a crude scaling routine that results in a large loss of precision as numbers cross from Kilobytes to Megabytes to Gigabytes. The 1G arc size reported in the case described here could actually be anywhere between 1,000MB and 1,999MB. Use 'kstat zfs::arcstats' to read the arc size directly from the kstats (for comparison).

I've updated arcstat.pl with a better scaling routine that returns more appropriate results (similar to df -h human-readable output). I've also added support for L2ARC stats. The updated version can be found here:

http://github.com/mharsch/arcstat
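The idea behind df -h-style scaling is roughly the following. This is a sketch only (the helper name is made up), not the actual code from the mharsch/arcstat repository:

    # Sketch only -- not the code from the mharsch/arcstat repository.
    # Scale a raw counter the way df -h does: divide by 1024 until the
    # value fits, then print with a decimal place so that everything
    # between 1.0G and 2.0G no longer collapses to "1G".
    sub prettynum {    # hypothetical helper name
        my ($num) = @_;
        my @unit = ('', 'K', 'M', 'G', 'T');
        my $i = 0;
        while ($num >= 1024 && $i < $#unit) {
            $num /= 1024;
            $i++;
        }
        return $i == 0 ? sprintf("%d",     $num)
                       : sprintf("%.1f%s", $num, $unit[$i]);
    }

    # e.g. prettynum(1_550_000_000) returns "1.4G" instead of "1G"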
Hello Mike,
thank you for your update.

root at S0011 # ./arcstat.pl 3
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis    mm%  arcsz     c
11:23:38  197K  7.8K      3  5.7K    3  2.1K    4  6.1K      5   511M  1.5G
11:23:41    70     0      0     0    0     0    0     0      0   511M  1.5G
11:23:44    76     0      0     0    0     0    0     0      0   511M  1.5G
11:23:47    76     0      0     0    0     0    0     0  *1.4210854715202e-14*   511M  1.5G
11:23:50    71     0      0     0    0     0    0     0      0   511M  1.5G
11:23:53    74     0      0     0    0     0    0     0  *1.4210854715202e-14*   511M  1.5G
11:23:56    74     0      0     0    0     0    0     0      0   511M  1.5G
11:23:59    79     0      0     0    0     0    0     0      0   511M  1.5G
11:24:02    76     0      0     0    0     0    0     0      0   511M  1.5G
11:24:05    74     0      0     0    0     0    0     0  *1.4210854715202e-14*   511M  1.5G
11:24:08    93     0  1.4210854715202e-14     0  1.4210854715202e-14     0    0     0      0   511M  1.5G
11:24:11    75     0      0     0    0     0  1.4210854715202e-14     0      0   511M  1.5G
11:24:14    77     0      0     0    0     0    0     0  1.4210854715202e-14   511M  1.5G

It would be nice if the highlighted (*) values were also "human" readable.

thank you
Christian

> For posterity, I'd like to point out the following:
>
> neel's original arcstat.pl uses a crude scaling routine that results in a large loss of precision as numbers cross from Kilobytes to Megabytes to Gigabytes. The 1G arc size reported in the case described here could actually be anywhere between 1,000MB and 1,999MB. Use 'kstat zfs::arcstats' to read the arc size directly from the kstats (for comparison).
>
> I've updated arcstat.pl with a better scaling routine that returns more appropriate results (similar to df -h human-readable output). I've also added support for L2ARC stats. The updated version can be found here:
>
> http://github.com/mharsch/arcstat
przemolicc at poczta.fm, 2010-Oct-01 10:18 UTC, Re: [zfs-discuss] making sense of arcstat.pl output
Hello,

I have the following message:

# ./arcstat.pl 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
Use of uninitialized value in division (/) at ./arcstat.pl line 262.
Use of uninitialized value in division (/) at ./arcstat.pl line 263.
Use of uninitialized value in division (/) at ./arcstat.pl line 268.
12:17:34  6.0G  492M      8  220M    4  271M   31  402M    9   1.8G  1.8G
Use of uninitialized value in division (/) at ./arcstat.pl line 262.
Use of uninitialized value in division (/) at ./arcstat.pl line 263.
Use of uninitialized value in division (/) at ./arcstat.pl line 268.
12:17:35    40     3      7     3    7     0    0     1    5   1.8G  1.8G
Use of uninitialized value in division (/) at ./arcstat.pl line 262.
Use of uninitialized value in division (/) at ./arcstat.pl line 263.
Use of uninitialized value in division (/) at ./arcstat.pl line 268.
12:17:36     7     1     14     1   14     0    0     1   33   1.8G  1.8G

Regards
Przemek

On Fri, Oct 01, 2010 at 11:29:34AM +0200, Christian Meier wrote:
> Hello Mike,
> thank you for your update.
> [...]
> It would be nice if the highlighted (*) values were also "human" readable.
>
> thank you
> Christian

Regards
Przemyslaw Bak (przemol)
--
http://przemol.blogspot.com/
Hello Christian,

Thanks for bringing this to my attention. I believe I've fixed the rounding error in the latest version.

http://github.com/mharsch/arcstat
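Values such as 1.4210854715202e-14 are floating-point residue from the per-interval rate calculations. A guard of roughly the following shape is the usual way to keep such residue out of the output; this is a sketch under that assumption (hypothetical helper name), not the actual change in the repository:

    # Sketch only -- illustrative of the fix, not the committed code.
    # Percentages are computed from two interval deltas; rounding to a
    # whole number keeps tiny floating-point residue from printing as
    # 1.4210854715202e-14 instead of 0.
    sub rate_pct {    # hypothetical helper name
        my ($part, $whole) = @_;
        return 0 unless $whole;          # nothing read this interval
        my $pct = 100 * $part / $whole;
        return int($pct + 0.5);          # round half up for positive values
    }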
przemol,

Thanks for the feedback. I had incorrectly assumed that any machine running the script would have L2ARC implemented (which is not the case with Solaris 10). I've added a check for this that allows the script to work on non-L2ARC machines as long as you don't specify L2ARC stats on the command line.

http://github.com/mharsch/arcstat
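The "uninitialized value in division" warnings come from dividing by kstat fields that simply do not exist on kernels without L2ARC counters. A check of roughly this shape matches the fix described above; the snippet is a sketch with illustrative names (l2_size is one of the L2ARC counters on systems that export them), not the code from the repository:

    # Sketch only: skip the L2ARC columns when the kernel does not
    # export L2ARC counters (e.g. the Solaris 10 case reported above).
    use strict;
    use warnings;
    use Sun::Solaris::Kstat;

    my $ks       = Sun::Solaris::Kstat->new();
    my $arcstats = $ks->{zfs}{0}{arcstats} || {};
    my $have_l2  = exists $arcstats->{l2_size};

    warn "L2ARC kstats not found; l2* columns disabled\n" unless $have_l2;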