I have been running a pair of X4540's for almost 2 years now, the
usual spec (Quad core, 64GB RAM, 48x 1TB).
I have a pair of mirrored drives for rpool, and a raidz set with 5-6
disks in each vdev for the rest of the disks.
I am running snv_132 on both systems.

I noticed an oddity on one particular system: when running a scrub
or a zfs list -t snapshot, the results take forever. Mind you, these
are identical systems in hardware and software. The primary system
replicates all data sets to the secondary nightly, so there isn't
much of a discrepancy in space used.

Primary system:
# time zfs list -t snapshot | wc -l
979

real    1m23.995s
user    0m0.360s
sys     0m4.911s

Secondary system:
# time zfs list -t snapshot | wc -l
979

real    0m1.534s
user    0m0.223s
sys     0m0.663s

At the time of running both of those, no other activity was
happening, load average of .05 or so. Subsequent runs on the primary
take just as long; no matter how many times I run it, it takes about
1 minute and 25 seconds each time, with very little drift (+/- 1
second, if that).

Both systems are at about 77% used space on the storage pool, with no
other distinguishing factors that I can discern. Upon a reboot,
performance is respectable for a little while, but within days it
sinks back to those levels. I suspect a memory leak, but both systems
run the same software versions and packages, so I can't envision
that.

Would anyone have any ideas what may cause this?

--
Brent Jones
brent at servuhome.net
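For reference, this is a minimal sketch of the memory checks I can
compare between the two boxes (standard OpenSolaris observability
tools, run as root; the kstat names assume the usual zfs:0:arcstats
layout):

# echo "::memstat" | mdb -k                       (kernel vs. ZFS file data vs. free page breakdown)
# kstat -p zfs:0:arcstats:size zfs:0:arcstats:c   (current ARC size and ARC target)
# echo "::kmastat" | mdb -k | less                (per-cache kernel memory usage, to spot steady growth)

Comparing these right after a reboot and again once zfs list has
slowed down should show whether kernel memory really is being eaten
over time.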
Giovanni Tirloni
2010-Jul-13 19:03 UTC
[zfs-discuss] [osol-help] ZFS list snapshots incurs large delay
On Tue, Jul 13, 2010 at 2:44 PM, Brent Jones <brent at servuhome.net> wrote:
> I have been running a pair of X4540's for almost 2 years now, the
> usual spec (Quad core, 64GB RAM, 48x 1TB).
> I have a pair of mirrored drives for rpool, and a raidz set with 5-6
> disks in each vdev for the rest of the disks.
> I am running snv_132 on both systems.
>
> I noticed an oddity on one particular system: when running a scrub
> or a zfs list -t snapshot, the results take forever. Mind you, these
> are identical systems in hardware and software. The primary system
> replicates all data sets to the secondary nightly, so there isn't
> much of a discrepancy in space used.
>
> Primary system:
> # time zfs list -t snapshot | wc -l
> 979
>
> real    1m23.995s
> user    0m0.360s
> sys     0m4.911s
>
> Secondary system:
> # time zfs list -t snapshot | wc -l
> 979
>
> real    0m1.534s
> user    0m0.223s
> sys     0m0.663s
>
> At the time of running both of those, no other activity was
> happening, load average of .05 or so. Subsequent runs on the primary
> take just as long; no matter how many times I run it, it takes about
> 1 minute and 25 seconds each time, with very little drift (+/- 1
> second, if that).
>
> Both systems are at about 77% used space on the storage pool, with no
> other distinguishing factors that I can discern. Upon a reboot,
> performance is respectable for a little while, but within days it
> sinks back to those levels. I suspect a memory leak, but both systems
> run the same software versions and packages, so I can't envision
> that.
>
> Would anyone have any ideas what may cause this?

It could be a disk failing and dragging I/O down with it.

Try to check for high asvc_t with `iostat -xCn 1` and errors in `iostat -En`.

Any timeouts or retries in /var/adm/messages?

--
Giovanni Tirloni
gtirloni at sysdroid.com
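Concretely, something along these lines should show a lagging disk or
climbing error counters (a sketch; run on the slow box while the scrub
or zfs list is going):

# iostat -xCn 1 10                          (watch for one device whose asvc_t sits far above its peers)
# iostat -En | egrep -i "errors|illegal"    (soft/hard/transport error counters per device)
# egrep -i "timeout|retry" /var/adm/messages

A single slow-but-not-yet-failed disk will usually stand out in the
asvc_t and %b columns long before it logs a hard error.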
Brent Jones
2010-Jul-13 19:15 UTC
[zfs-discuss] [osol-help] ZFS list snapshots incurs large delay
>
> It could be a disk failing and dragging I/O down with it.
>
> Try to check for high asvc_t with `iostat -xCn 1` and errors in `iostat -En`.
>
> Any timeouts or retries in /var/adm/messages?
>
> --
> Giovanni Tirloni
> gtirloni at sysdroid.com
>

I checked for high service times during a scrub, and all disks are
pretty equal. During a scrub, each disk peaks at about 350 reads/sec,
with an asvc time of up to 30 during those read spikes (I assume that
means 30ms, which isn't terrible for a highly loaded SATA disk).
No errors are reported by smartctl, iostat, or /var/adm/messages.

I opened a case on Sunsolve, but I fear that since I am running a dev
build I will be out of luck. I cannot run 2009.06 due to CIFS
segfaults and problems with zfs send/recv hanging pools
(well-documented issues). I'd run Solaris proper, but not having
in-kernel CIFS or COMSTAR would be a major setback for me.

--
Brent Jones
brent at servuhome.net
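Since the disks look clean, it may be worth profiling where the time
actually goes on the slow box. A rough sketch (standard tools in
snv_132; run as root while the system is otherwise idle):

# truss -c zfs list -t snapshot > /dev/null     (per-syscall counts and times for the zfs command itself)
# lockstat -kIW -D 20 sleep 60                  (kernel profile; run the zfs list in another window during the 60s)

If most of the wall-clock time shows up in ioctl calls on the zfs side
and in ZFS/ARC routines in the kernel profile, that would point at
snapshot metadata being re-read from disk (or an ARC that has shrunk)
rather than at the disks themselves.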