Kevin Maguire
2009-Jan-29 12:13 UTC
[zfs-discuss] strange performance drop of solaris 10/zfs
Hi

We have been using a Solaris 10 system (Sun-Fire-V245) for a while as our primary file server. This is based on Solaris 10 06/06, plus patches up to approximately May 2007. It is a production machine, and until about a week ago it had few problems.

Attached to the V245 is a SCSI RAID array, which presents one LUN to the OS. On this LUN is a zpool (tank), and within that 300+ ZFS file systems (one per user for automounted home directories). The system is connected to our LAN via gigabit Ethernet; most of our NFS clients have just 100FD network connections.

In recent days the performance of the file server seems to have gone off a cliff, and I don't know how to troubleshoot what might be wrong. Typical "zpool iostat 120" output is shown below. If I run "truss -D df" I see each call to statvfs64("/tank/bla") takes 2-3 seconds (a small reproduction sketch follows the iostat output below). The RAID itself is healthy, and all disks are reporting as OK.

I have tried to establish if some client or clients are thrashing the server via nfslogd, but without seeing anything obvious. Is there some kind of per-zfs-filesystem iostat?

End users are reporting that just saving small files can take 5-30 seconds. prstat/top shows no process using significant CPU load. The system has 8GB of RAM, and vmstat shows nothing interesting.

I have another V245, with the same SCSI/RAID/zfs setup and a similar (though somewhat lighter) load of data and users, where this problem is NOT apparent.

Suggestions?
Kevin

Thu Jan 29 11:32:29 CET 2009
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        2.09T   640G     10     66   825K  1.89M
tank        2.09T   640G     39      5  4.80M   126K
tank        2.09T   640G     38      8  4.73M   191K
tank        2.09T   640G     40      5  4.79M   126K
tank        2.09T   640G     39      5  4.73M   170K
tank        2.09T   640G     40      3  4.88M  43.8K
tank        2.09T   640G     40      3  4.87M  54.7K
tank        2.09T   640G     39      4  4.81M   111K
tank        2.09T   640G     39      9  4.78M   134K
tank        2.09T   640G     37      5  4.61M   313K
tank        2.09T   640G     39      3  4.89M  32.8K
tank        2.09T   640G     35      7  4.31M   629K
tank        2.09T   640G     28     13  3.47M  1.43M
tank        2.09T   640G      5     51   433K  4.27M
tank        2.09T   640G      6     51   450K  4.23M
tank        2.09T   639G      5     52   543K  4.23M
tank        2.09T   640G     26     57  3.00M  1.15M
tank        2.09T   640G     39      6  4.82M   107K
tank        2.09T   640G     39      3  4.80M   119K
tank        2.09T   640G     38      8  4.64M   295K
tank        2.09T   640G     40      7  4.82M   102K
tank        2.09T   640G     43      5  4.79M   103K
tank        2.09T   640G     39      4  4.73M   193K
tank        2.09T   640G     39      5  4.87M  62.1K
tank        2.09T   640G     40      3  4.88M  49.3K
tank        2.09T   640G     40      3  4.80M   122K
tank        2.09T   640G     42      4  4.83M  82.0K
tank        2.09T   640G     40      3  4.89M  42.0K
...
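For reference, the truss measurement above can be reproduced roughly like this (a minimal sketch; /tmp/df.truss is just a scratch output file, and -D prefixes each traced call with the time elapsed since the previous one, so the slow statvfs64() calls stand out):

$ truss -D -o /tmp/df.truss df > /dev/null
$ grep statvfs64 /tmp/df.truss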
Mike Gerdts
2009-Jan-29 14:01 UTC
[zfs-discuss] strange performance drop of solaris 10/zfs
On Thu, Jan 29, 2009 at 6:13 AM, Kevin Maguire <k.c.f.maguire at gmail.com> wrote:
> I have tried to establish if some client or clients are thrashing the
> server via nfslogd, but without seeing anything obvious. Is there
> some kind of per-zfs-filesystem iostat?

The following should work in bash or ksh, so long as the list of zfs mount points does not overflow the maximum command line length.

$ fsstat $(zfs list -H -o mountpoint | nawk '$1 !~ /^(\/|-|legacy)$/') 5

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
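If the expanded mount point list does overflow the command line limit, one rough workaround (a sketch, untested on this exact Solaris 10 level; it assumes no spaces in mount point names) is to hand the mount points to fsstat in batches via xargs:

$ zfs list -H -o mountpoint | nawk '$1 !~ /^(\/|-|legacy)$/' | \
      xargs -n 50 sh -c 'fsstat "$@" 5 1' sh    # one 5-second sample per batch of 50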
Kevin,

Looking at the stats I think the tank pool is about 80% full. At this point you are possibly hitting this bug:
6596237 - "Stop looking and start ganging"

Also, there is another ZIL-related bug which worsens the case by fragmenting the space:
6683293 - concurrent O_DSYNC writes to a fileset can be much improved over NFS

You could compare the disk usage of the other machine that you have. Also, it would be useful to know what patch levels you are running. We do have IDRs for bug 6596237, and the other bug has been fixed in the official patches.

Hope that helps.

Thanks and regards,
Sanjeev.

On Thu, Jan 29, 2009 at 01:13:29PM +0100, Kevin Maguire wrote:
> In recent days performance of the file server seems to have gone off a
> cliff. I don't know how to troubleshoot what might be wrong? Typical
> "zpool iostat 120" output is shown below.
> [...]
> Thu Jan 29 11:32:29 CET 2009
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> tank        2.09T   640G     10     66   825K  1.89M
> [...]
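For the comparison, running something along these lines on both V245s (a minimal sketch; which specific patch IDs to look for depends on the kernel/ZFS patches involved) would show pool fullness and patch level side by side:

# pool capacity and usage
zpool list tank
zfs get used,available tank

# OS release and installed patch list
cat /etc/release
uname -a
showrev -p > /var/tmp/patchlist.txt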
So... granted, tank is about 77% full (not to split hairs ;^), but in this case 23% is 640GB of free space. I mean, it's not like 15 years ago when a file system was 2GB total and 23% free meant a measly 460MB to allocate from. 640GB is a lot of space, and our largest writes are less than 5MB. I would hope we're not tripping over 6596237 yet. But I guess it's worth looking at.

We should be able to figure this out with DTrace. Two DTrace scripts from CR 6495013 should help us determine if this is the case (note that 6495013 was fixed in s10u4 and nv61).

-------------------- metaslab.d --------------------------------------
#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        self->loopcount = 0;
}

fbt::avl_walk:entry
/self->in_metaslab/
{
        self->loopcount++;
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
        @loops["Loops count"] = quantize(self->loopcount);
        self->loopcount = 0;
}

----- metaslab_size.d -------------------------
#!/usr/sbin/dtrace -s

#pragma D option quiet

BEGIN
{
        self->in_metaslab = 0;
}

fbt::metaslab_ff_alloc:entry
/self->in_metaslab == 0/
{
        self->in_metaslab = 1;
        @sizes["metaslab sizes"] = quantize(arg1);
}

fbt::metaslab_ff_alloc:return
/self->in_metaslab/
{
        self->in_metaslab = 0;
}

Sanjeev wrote:
> Kevin,
>
> Looking at the stats I think the tank pool is about 80% full.
> And at this point you are possibly hitting the bug:
> 6596237 - "Stop looking and start ganging"
>
> Also, there is another ZIL related bug which worsens the case
> by fragmenting the space:
> 6683293 concurrent O_DSYNC writes to a fileset can be much improved over NFS
> [...]
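A plausible way to run these (assuming the fbt provider exposes metaslab_ff_alloc on this kernel, which can vary with patch level) is to let each script run for a minute or two while the pool is under its usual NFS load, then interrupt it to print the quantize() histograms. Consistently large avl_walk loop counts per allocation would suggest the allocator is searching hard for free space, i.e. the near-full/fragmentation behaviour described above.

# run under load, then press Ctrl-C to print the histograms
dtrace -s metaslab.d
dtrace -s metaslab_size.d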