For those who didn't follow down the thread this afternoon, I have posted
a tool called zilstat which will help you answer the question of whether
a separate log might help your workload. Details start here:
http://richardelling.blogspot.com/2009/01/zilstat.html

Enjoy!
 -- richard
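For readers who want to try it right away, the invocations used later in
this thread look like the following. The -M and -t flags are taken from
Carsten's post further down; treat this as a sketch and see the blog post
above for the authoritative option list.

    # default output: one line per sample, ZIL traffic in bytes
    ./zilstat.ksh

    # -M reports in MBytes, -t adds a timestamp, sampling every 15 seconds
    ./zilstat.ksh -M -t 15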
I'm already using it. This could be really useful for my Windows
roaming-profile application of ZFS/NFS/SMB.

On Fri, Jan 30, 2009 at 9:35 PM, Richard Elling
<richard.elling at gmail.com> wrote:
> For those who didn't follow down the thread this afternoon,
> I have posted a tool called zilstat which will help you to answer
> the question of whether a separate log might help your
> workload. Details start here:
> http://richardelling.blogspot.com/2009/01/zilstat.html
>
> Enjoy!
> -- richard
The zilstat tool is very helpful, thanks!

I tried it on an X4500 NFS server while extracting a 14MB tar archive,
both via an NFS client and locally on the X4500 itself. Over NFS, the
extract took ~2 minutes and showed peaks of 4MB/sec buffer-bytes going
through the ZIL.

When run locally on the X4500, the extract took about 1 second, with
zilstat showing all zeroes. I wonder if this is a case where the ZIL
bypass kicks in for >32K writes in the local tar extraction. Does
zilstat's underlying DTrace include these bypassed writes in the totals
it displays?

If it's possible to get stats on this bypassed data, I'd like to see it
as another column (or set of columns) in the zilstat output.

Regards,

Marion
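A quick way to check whether the local tar run issues any synchronous
writes at all is a DTrace one-liner counting fsync() calls per process
while the extract runs (a sketch only; fsync is just one of several sync
paths, and O_DSYNC writes would not show up here):

    # count fsync() calls by process name during the extract
    dtrace -n 'syscall::fsync:entry { @[execname] = count(); }'

If tar shows no fsync activity, an all-zero zilstat display is
consistent: with nothing written synchronously, there is nothing for the
ZIL to commit.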
Interesting, but what does it mean :)

The x4500 for mail (NFS vers=3 on ufs on zpool with quotas):

# ./zilstat.ksh
   N-Bytes  N-Bytes/s  N-Max-Bytes/s    B-Bytes  B-Bytes/s  B-Max-Bytes/s
    376720     376720         376720    1286144    1286144        1286144
    419608     419608         419608    1368064    1368064        1368064
    555256     555256         555256    1732608    1732608        1732608
    538808     538808         538808    1679360    1679360        1679360
    626048     626048         626048    1773568    1773568        1773568
    753824     753824         753824    2105344    2105344        2105344
    652632     652632         652632    1716224    1716224        1716224

Fairly constant between 1-2MB/s. That doesn't sound too bad, though.

It's only got 400 nfsd threads at the moment, but peaks at 1024.
Incidentally, what is the highest recommended nfsd_threads for an x4500
anyway?

Lund

Marion Hakanson wrote:
> The zilstat tool is very helpful, thanks!
>
> I tried it on an X4500 NFS server while extracting a 14MB tar archive,
> both via an NFS client and locally on the X4500 itself. Over NFS, the
> extract took ~2 minutes and showed peaks of 4MB/sec buffer-bytes going
> through the ZIL.
>
> When run locally on the X4500, the extract took about 1 second, with
> zilstat showing all zeroes. I wonder if this is a case where the ZIL
> bypass kicks in for >32K writes in the local tar extraction. Does
> zilstat's underlying DTrace include these bypassed writes in the
> totals it displays?
>
> If it's possible to get stats on this bypassed data, I'd like to see
> it as another column (or set of columns) in the zilstat output.
>
> Regards,
>
> Marion

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
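For reference, on Solaris 10-era systems the nfsd thread ceiling is
normally set in /etc/default/nfs (a sketch; check your release's
documentation for the exact parameter name and restart procedure):

    # /etc/default/nfs
    NFSD_SERVERS=1024

    # restart the NFS server so the new value takes effect
    svcadm restart network/nfs/server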
Jorgen Lundman wrote:
> Interesting, but what does it mean :)
>
> The x4500 for mail (NFS vers=3 on ufs on zpool with quotas):
>
> # ./zilstat.ksh
>    N-Bytes  N-Bytes/s  N-Max-Bytes/s    B-Bytes  B-Bytes/s  B-Max-Bytes/s
>     376720     376720         376720    1286144    1286144        1286144
>     419608     419608         419608    1368064    1368064        1368064
>     555256     555256         555256    1732608    1732608        1732608
>     538808     538808         538808    1679360    1679360        1679360
>     626048     626048         626048    1773568    1773568        1773568
>     753824     753824         753824    2105344    2105344        2105344
>     652632     652632         652632    1716224    1716224        1716224
>
> Fairly constant between 1-2MB/s. That doesn't sound too bad, though.

I think your workload would benefit from a fast, separate log device.

> It's only got 400 nfsd threads at the moment, but peaks at 1024.
> Incidentally, what is the highest recommended nfsd_threads for an
> x4500 anyway?

Highest recommended is what you need to get the job done. For the most
part, the defaults work well, but you can experiment with them and see
if you can get better results.

I've got some ideas about how to implement some more features for
zilstat, but might not be able to get to them over the next few days,
so there is still time to accept recommendations :-)
 -- richard
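For anyone following along, attaching a separate log device uses the
standard zpool syntax; the pool and device names below are placeholders:

    # add a single dedicated log device to pool "tank"
    zpool add tank log c5t0d0

    # or, more safely, a mirrored pair of log devices
    zpool add tank log mirror c5t0d0 c6t0d0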
Marion Hakanson wrote:
> The zilstat tool is very helpful, thanks!
>
> I tried it on an X4500 NFS server while extracting a 14MB tar archive,
> both via an NFS client and locally on the X4500 itself. Over NFS, the
> extract took ~2 minutes and showed peaks of 4MB/sec buffer-bytes going
> through the ZIL.
>
> When run locally on the X4500, the extract took about 1 second, with
> zilstat showing all zeroes. I wonder if this is a case where the ZIL
> bypass kicks in for >32K writes in the local tar extraction. Does
> zilstat's underlying DTrace include these bypassed writes in the
> totals it displays?

This is what I would expect. What you are seeing is the effect of the
NFS protocol and how the server commits data to disk on behalf of the
client -- by using sync writes.

> If it's possible to get stats on this bypassed data, I'd like to see
> it as another column (or set of columns) in the zilstat output.

Yes. I've got a few more columns in mind, too. Does anyone still use
a VT100? :-)
 -- richard
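The server-side sync commits can be watched directly by counting
zil_commit() calls while an NFS client writes (hedged: fbt probes are an
unstable interface, so this depends on your kernel build):

    # print per-second counts of ZIL commits, by calling process
    dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); }
               tick-1s { printa(@); trunc(@); }'

Over NFS the nfsd threads should show up driving commits; during a
purely local, non-sync tar extract the aggregation should stay empty.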
Richard Elling wrote:
>> # ./zilstat.ksh
>>    N-Bytes  N-Bytes/s  N-Max-Bytes/s    B-Bytes  B-Bytes/s  B-Max-Bytes/s
>>     376720     376720         376720    1286144    1286144        1286144
>>     419608     419608         419608    1368064    1368064        1368064
>>     555256     555256         555256    1732608    1732608        1732608
>>     538808     538808         538808    1679360    1679360        1679360
>>     626048     626048         626048    1773568    1773568        1773568
>>     753824     753824         753824    2105344    2105344        2105344
>>     652632     652632         652632    1716224    1716224        1716224
>>
>> Fairly constant between 1-2MB/s. That doesn't sound too bad, though.
>
> I think your workload would benefit from a fast, separate log device.

Interesting. Today is the first I've heard about it. One of the x4500s
is really, really slow; something like 15 seconds to do an unlink. But I
assumed it was because the ufs inside the zvol is _really_ bloated.
Maybe we need to experiment with it on the test x4500.

> Highest recommended is what you need to get the job done.
> For the most part, the defaults work well, but you can experiment
> with them and see if you can get better results.

It came shipped with 16, and I'm sorry, but 16 didn't cut it at all :)
We set it to 1024, as that was the highest number I found via Google.

Lund

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
Hi Richard,

Richard Elling schrieb:
> Yes. I've got a few more columns in mind, too. Does anyone still use
> a VT100? :-)

Only when using ILOM ;)

(Anyone using a 72-char/line MUA, sorry to them; the following lines
are longer.)

Thanks for the great tool. It showed something very interesting
yesterday:

s06: TIME                 N-MBytes N-MBytes/s N-Max-Rate B-MBytes B-MBytes/s B-Max-Rate
s06: 2009 Feb  4 14:37:11        5          0          0       10          0          1
s06: 2009 Feb  4 14:37:26        6          0          1       12          0          1
s06: 2009 Feb  4 14:37:41        4          0          0       10          0          1
s06: 2009 Feb  4 14:37:56        5          0          1       11          0          1
s06: 2009 Feb  4 14:38:11        6          0          1       11          0          2
s06: 2009 Feb  4 14:38:26        7          0          1       13          0          2
s06: 2009 Feb  4 14:38:41       10          0          2       17          1          3
s06: 2009 Feb  4 14:38:56        4          0          0        9          0          1
s06: 2009 Feb  4 14:39:11        5          0          1       11          0          1
s06: 2009 Feb  4 14:39:26        7          0          0       13          0          1
s06: 2009 Feb  4 14:39:41        7          0          2       13          0          3
s06: 2009 Feb  4 14:39:56        6          0          1       11          0          2
s06: 2009 Feb  4 14:40:11        6          0          1       12          0          1
s06: 2009 Feb  4 14:40:26        6          0          0       13          0          1
s06: 2009 Feb  4 14:40:41        5          0          0       10          0          1
s06: 2009 Feb  4 14:40:56        6          0          1       12          0          1
s06: 2009 Feb  4 14:41:11        4          0          0        9          0          1
[..]

So far, the box was almost idle. A little bit later:

s06: 2009 Feb  4 14:53:41        2          0          0        5          0          0
s06: 2009 Feb  4 14:53:56        1          0          0        3          0          0
s06: 2009 Feb  4 14:54:11        1          0          0        4          0          0
s06: 2009 Feb  4 14:54:26        1          0          0        3          0          0
s06: 2009 Feb  4 14:54:41        2          0          0        5          0          0
s06: 2009 Feb  4 14:54:56      604         40        171      702         46        198
s06: 2009 Feb  4 14:55:11      816         54        130      939         62        154
s06: 2009 Feb  4 14:55:26        2          0          0        4          0          0
s06: 2009 Feb  4 14:55:41        2          0          0        4          0          0
s06: 2009 Feb  4 14:55:56        1          0          0        3          0          0
s06: 2009 Feb  4 14:56:11        3          0          0        6          0          1
s06: 2009 Feb  4 14:56:26        1          0          0        3          0          0
[...]
s06: 2009 Feb  4 16:13:11        1          0          0        3          0          0
s06: 2009 Feb  4 16:13:26        2          0          0        5          0          0
s06: 2009 Feb  4 16:13:41      389         25         97      477         31        119
s06: 2009 Feb  4 16:13:56      505         33        193      599         39        218
s06: 2009 Feb  4 16:14:11        2          0          0        4          0          0
s06: 2009 Feb  4 16:14:26        3          0          0        5          0          1
s06: 2009 Feb  4 16:14:41        1          0          0        3          0          0
s06: 2009 Feb  4 16:14:56        2          0          0        6          0          1
s06: 2009 Feb  4 16:15:11        4          0          2       10          0          4
s06: 2009 Feb  4 16:15:26        0          0          0        1          0          0
s06: 2009 Feb  4 16:15:41      128          8         94      168         11        123
s06: 2009 Feb  4 16:15:56     1081         72        212     1305         87        279
s06: 2009 Feb  4 16:16:11      262         17         99      317         21        122
s06: 2009 Feb  4 16:16:26        0          0          0        0          0          0

That's just showing a few bursts. Given that this is the output of
'zilstat.ksh -M -t 15', I guess we should really look into a fast
device for it, right?

Do you have any hint as to which numbers are reasonable on an X4500,
and which are approaching serious problems?

Cheers
Carsten