I'm seeing some odd behaviour with ZFS and a reasonably heavy workload.

I'm currently on contract to BBC R&D to build what is effectively a
network-based personal video recorder. To that end, I have a rather large
collection of discs, arranged very poorly as it's something of a hack at
present, and a T1000 capturing the data.
The data is about 20 separate streams of multicast content (anyone in the
UK multicast peering with the BBC should be able to pick it up; mail me
for details if you're interested) ranging from a couple of hundred Kb/s
for the radio stations, via 6Mb/s or so for the standard definition
channels, up to 17Mb/s for the HD stuff. This little lot totals about
75Mb/s into the machine, which equates to about 9MB/s to the media. Not
exactly a lot.
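As a sanity check on that arithmetic (the channel mix below is an
illustrative guess, not our exact line-up), it's just:

    # Back-of-envelope check on the aggregate rate; the channel mix
    # here is a guess for illustration, not our actual line-up.
    streams = [
        ("radio", 10, 0.3),   # ~300Kb/s each
        ("sd",     9, 6.0),   # ~6Mb/s each
        ("hd",     1, 17.0),  # ~17Mb/s
    ]
    total_mbit = sum(count * rate for _, count, rate in streams)
    print("%.0f Mb/s aggregate, %.1f MB/s to the media"
          % (total_mbit, total_mbit / 8))
    # -> 74 Mb/s aggregate, 9.2 MB/s to the media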
We're using Sun's Solaris 10 rather than OpenSolaris for historical
reasons. Specifically:
SunOS cr0.kw.bbc.co.uk 5.10 Generic_118833-17 sun4v sparc SUNW,Sun-Fire-T1000
My problem is this: with 20 processes recording data to a single pool,
everything is fine. Copying that data off again, via a separate network
interface connected to a separate switch (so we can rule out the network
hardware being the problem), it appears that the read operations cripple
the writes, causing the inbound buffers to fill and packets to be dropped.
This makes for unwatchable telly.
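For what it's worth, the obvious blunt instrument on the receive side is
to ask for a much deeper socket buffer, so the recorders can ride out a
short stall in the writer. A minimal sketch in Python (the group address
and port are made up for illustration, and the kernel may well clamp
whatever you ask SO_RCVBUF for):

    import socket
    import struct

    GROUP, PORT = "239.0.0.1", 5000   # made-up group/port for illustration

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # Ask for a deep receive buffer so a brief stall in the writer
    # doesn't overflow the socket and drop packets.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8 * 1024 * 1024)

    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        datagram = sock.recv(2048)   # TS-over-UDP datagrams are small
        # ... hand off to the recording pipeline ...

That only buys time, of course; it doesn't fix whatever is starving the
writes.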
The data being compressed video, there isn't much point in attempting to
compress it further at the filesystem level, so we're not.
My pool is, granted, badly configured. The layout below is by no means
final, and is something of an accident: it's like this because we needed
to get more drives onto the thing (it's a staging post; the fileservers
are elsewhere) before we had the content stores up and running.
bash-3.00# zpool status -v
  pool: content
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        content      ONLINE       0     0     0
          c1t0d0     ONLINE       0     0     0
          c1t0d1     ONLINE       0     0     0
          c1t0d2     ONLINE       0     0     0
          c1t0d3     ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t0d1   ONLINE       0     0     0
            c2t0d2   ONLINE       0     0     0
            c2t0d3   ONLINE       0     0     0
            c2t0d4   ONLINE       0     0     0
            c2t0d5   ONLINE       0     0     0
            c2t0d6   ONLINE       0     0     0
            c2t0d7   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d8   ONLINE       0     0     0
            c2t0d9   ONLINE       0     0     0
            c2t0d10  ONLINE       0     0     0
            c2t0d11  ONLINE       0     0     0
            c2t0d12  ONLINE       0     0     0
            c2t0d13  ONLINE       0     0     0
            c2t0d14  ONLINE       0     0     0
            c2t0d15  ONLINE       0     0     0

errors: No known data errors
bash-3.00#
where each of the drives is a 500GB SATA disc in a SATA <-> SCSI RAID
device set to JBOD. iostat -x shows:
                    extended device statistics
device     r/s    w/s    kr/s    kw/s  wait  actv  svc_t  %w  %b
md0        0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
md1        0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
md10       0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
md11       0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
sd1        0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
sd2        0.0   25.6     0.0  1052.7   0.0   0.1    4.7   0   3
sd3        0.2   24.4    12.8   923.7   0.0   0.1    4.9   0   3
sd4        0.0   13.0     0.0   880.2   0.0   0.1    8.3   0   3
sd17       0.8   10.4    51.1  1303.1   0.0   0.1   10.7   0   4
sd30      79.6    6.8  2669.4   103.4   0.0   2.0   23.7   0  32
sd34      90.4    6.8  2564.3   103.0   0.0   1.9   19.8   0  30
sd36      77.1    6.8  2675.7   103.8   0.0   1.9   23.0   0  30
sd39      96.0    6.8  2632.2   103.0   0.0   2.1   20.3   0  31
sd42      76.9    6.8  2666.9   103.4   0.0   2.0   23.5   0  31
sd46      95.6    6.8  2608.7   102.6   0.0   2.0   19.8   0  30
sd48      77.7    6.8  2648.5   103.4   0.0   2.0   23.6   0  31
sd51      94.0    6.8  2588.9   102.6   0.0   1.9   19.0   0  30
sd53       0.0    4.0     0.0   143.0   0.0   0.1   13.7   0   2
sd56       0.0    4.2     0.0   142.0   0.0   0.1   15.4   0   2
sd58       0.0    4.0     0.0   143.0   0.0   0.1   13.4   0   2
sd60       0.0    4.2     0.0   142.1   0.0   0.1   14.8   0   2
sd62       0.0    4.0     0.0   143.1   0.0   0.1   12.7   0   2
sd65       0.0    4.2     0.0   142.1   0.0   0.1   14.4   0   2
sd67       0.0    3.8     0.0   143.0   0.0   0.1   14.3   0   2
sd70       0.0    4.2     0.0   142.0   0.0   0.1   15.2   0   2
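The imbalance is easier to spot with a trivial wrapper that flags the
busy discs; just a sketch (field positions assume the iostat output
above, and the 25 %b threshold is arbitrary):

    import subprocess

    # Tail `iostat -x 5` and flag any disc that looks busy; column
    # positions match the output format above.
    proc = subprocess.Popen(["iostat", "-x", "5"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        fields = line.split()
        if len(fields) == 10 and fields[0].startswith("sd"):
            device, svc_t, busy = fields[0], float(fields[7]), int(fields[9])
            if busy > 25:
                print("%s: svc_t=%sms, %%b=%d" % (device, svc_t, busy))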
Each recording process takes about 0.4% CPU according to prstat. This is
an 8-core T1000. The reading process (Perl) chews 3.5% or so, which I
make to be one thread plus a bit that can be parallelised automatically.
The reader appears to be CPU-bound, which also concerns me; when we
unbind it, I'm expecting this problem to get worse.
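One thing we could try, if all else fails, is pacing the copy rather
than letting the reader run flat out. A rough sketch of the sort of
throttle I have in mind (the 30MB/s cap is plucked from the air):

    import time

    CHUNK = 1 << 20              # copy in 1MB chunks
    RATE = 30 * 1024 * 1024      # ~30MB/s cap; an arbitrary number

    def paced_copy(src_path, dst_path):
        # Copy a file, sleeping whenever we get ahead of RATE bytes/sec,
        # so reads never hog the discs for long stretches.
        start, sent = time.time(), 0
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                dst.write(chunk)
                sent += len(chunk)
                ahead = sent / float(RATE) - (time.time() - start)
                if ahead > 0:
                    time.sleep(ahead)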
My questions are:

* Is this what I should expect?
* Why? I'd've thought the extensive caching the filesystem does would
sort this out for me.
* Is there any way around it that doesn't involve editing the code?
Thank you for your time.
--
Dickon Hood
Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible. We apologise for the
inconvenience in the meantime.
No virus was found in this outgoing message as I didn't bother looking.