I'm seeing some odd behaviour with ZFS and a reasonably heavy workload.

I'm currently on contract to BBC R&D to build what is effectively a
network-based personal video recorder. To that end, I have a rather large
collection of discs, arranged very poorly as it's something of a hack at
present, and a T1000 capturing the data.
The data is about 20 separate streams of multicast content (anyone in the
UK multicast peering with the BBC should be able to pick it up; mail me
for details if you're interested) ranging from a couple of hundred Kb/s
for the radio stations, via 6Mb/s or so for the standard definition
channels, up to 17Mb/s for the HD stuff. This little lot totals about
75Mb/s into the machine, which equates to about 9MB/s to the media. Not
exactly a lot.
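As a sanity check on that arithmetic (the channel mix below is an
illustrative guess, not our exact line-up), it's just:

    # Back-of-envelope check on the aggregate rate; the channel mix
    # here is a guess for illustration, not our actual line-up.
    streams = [
        ("radio", 10, 0.3),   # ~300Kb/s each
        ("sd",     9, 6.0),   # ~6Mb/s each
        ("hd",     1, 17.0),  # ~17Mb/s
    ]
    total_mbit = sum(count * rate for _, count, rate in streams)
    print("%.0f Mb/s aggregate, %.1f MB/s to the media"
          % (total_mbit, total_mbit / 8))
    # -> 74 Mb/s aggregate, 9.2 MB/s to the media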
We're using Sun's Solaris 10 rather than OpenSolaris for historical
reasons. Specifically:
SunOS cr0.kw.bbc.co.uk 5.10 Generic_118833-17 sun4v sparc SUNW,Sun-Fire-T1000
My problem is this: with 20 processes recording data to a single pool,
everything is fine. Copying that data off again, via a separate network
interface connected to a separate switch (so we can rule out the network
hardware being the problem), it appears that the read operations cripple
the writes, causing the inbound buffers to fill and packets to be dropped.
This makes for unwatchable telly.
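For what it's worth, the obvious blunt instrument on the receive side is
to ask for a much deeper socket buffer, so the recorders can ride out a
short stall in the writer. A minimal sketch in Python (the group address
and port are made up for illustration, and the kernel may well clamp
whatever you ask SO_RCVBUF for):

    import socket
    import struct

    GROUP, PORT = "239.0.0.1", 5000   # made-up group/port for illustration

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # Ask for a deep receive buffer so a brief stall in the writer
    # doesn't overflow the socket and drop packets.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 8 * 1024 * 1024)

    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

    while True:
        datagram = sock.recv(2048)   # TS-over-UDP datagrams are small
        # ... hand off to the recording pipeline ...

That only buys time, of course; it doesn't fix whatever is starving the
writes.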
The data being compressed video, there isn't much point in attempting to
compress it further at the filesystem level, so we're not.
My pool is, granted, badly configured. The layout below is by no means
final, and is something of an accident: it's like this because we needed
to get more drives onto the thing (it's a staging post; the fileservers
are elsewhere) before we had the content stores up and running.
bash-3.00# zpool status -v
  pool: content
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        content      ONLINE       0     0     0
          c1t0d0     ONLINE       0     0     0
          c1t0d1     ONLINE       0     0     0
          c1t0d2     ONLINE       0     0     0
          c1t0d3     ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t0d1   ONLINE       0     0     0
            c2t0d2   ONLINE       0     0     0
            c2t0d3   ONLINE       0     0     0
            c2t0d4   ONLINE       0     0     0
            c2t0d5   ONLINE       0     0     0
            c2t0d6   ONLINE       0     0     0
            c2t0d7   ONLINE       0     0     0
          raidz      ONLINE       0     0     0
            c2t0d8   ONLINE       0     0     0
            c2t0d9   ONLINE       0     0     0
            c2t0d10  ONLINE       0     0     0
            c2t0d11  ONLINE       0     0     0
            c2t0d12  ONLINE       0     0     0
            c2t0d13  ONLINE       0     0     0
            c2t0d14  ONLINE       0     0     0
            c2t0d15  ONLINE       0     0     0

errors: No known data errors
bash-3.00#
where each of the drives is a 500GB SATA disc in a SATA <-> SCSI RAID
device set to JBOD. iostat -x shows:
                    extended device statistics
device     r/s    w/s    kr/s    kw/s  wait  actv  svc_t  %w  %b
md0        0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
md1        0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
md10       0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
md11       0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
sd1        0.0    0.0     0.0     0.0   0.0   0.0    0.0   0   0
sd2        0.0   25.6     0.0  1052.7   0.0   0.1    4.7   0   3
sd3        0.2   24.4    12.8   923.7   0.0   0.1    4.9   0   3
sd4        0.0   13.0     0.0   880.2   0.0   0.1    8.3   0   3
sd17       0.8   10.4    51.1  1303.1   0.0   0.1   10.7   0   4
sd30      79.6    6.8  2669.4   103.4   0.0   2.0   23.7   0  32
sd34      90.4    6.8  2564.3   103.0   0.0   1.9   19.8   0  30
sd36      77.1    6.8  2675.7   103.8   0.0   1.9   23.0   0  30
sd39      96.0    6.8  2632.2   103.0   0.0   2.1   20.3   0  31
sd42      76.9    6.8  2666.9   103.4   0.0   2.0   23.5   0  31
sd46      95.6    6.8  2608.7   102.6   0.0   2.0   19.8   0  30
sd48      77.7    6.8  2648.5   103.4   0.0   2.0   23.6   0  31
sd51      94.0    6.8  2588.9   102.6   0.0   1.9   19.0   0  30
sd53       0.0    4.0     0.0   143.0   0.0   0.1   13.7   0   2
sd56       0.0    4.2     0.0   142.0   0.0   0.1   15.4   0   2
sd58       0.0    4.0     0.0   143.0   0.0   0.1   13.4   0   2
sd60       0.0    4.2     0.0   142.1   0.0   0.1   14.8   0   2
sd62       0.0    4.0     0.0   143.1   0.0   0.1   12.7   0   2
sd65       0.0    4.2     0.0   142.1   0.0   0.1   14.4   0   2
sd67       0.0    3.8     0.0   143.0   0.0   0.1   14.3   0   2
sd70       0.0    4.2     0.0   142.0   0.0   0.1   15.2   0   2
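The imbalance is easier to spot with a trivial wrapper that flags the
busy discs; just a sketch (field positions assume the iostat output
above, and the 25 %b threshold is arbitrary):

    import subprocess

    # Tail `iostat -x 5` and flag any disc that looks busy; column
    # positions match the output format above.
    proc = subprocess.Popen(["iostat", "-x", "5"],
                            stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:
        fields = line.split()
        if len(fields) == 10 and fields[0].startswith("sd"):
            device, svc_t, busy = fields[0], float(fields[7]), int(fields[9])
            if busy > 25:
                print("%s: svc_t=%sms, %%b=%d" % (device, svc_t, busy))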
Each recording process takes about 0.4% CPU according to prstat. This is
an 8-core T1000. The reading process (Perl) chews 3.5% or so, which I
make to be one thread plus a bit that can be parallelised automatically.
The reader appears to be CPU-bound, which also concerns me; when we
unbind it, I'm expecting this problem to get worse.
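One thing we could try, if all else fails, is pacing the copy rather
than letting the reader run flat out. A rough sketch of the sort of
throttle I have in mind (the 30MB/s cap is plucked from the air):

    import time

    CHUNK = 1 << 20              # copy in 1MB chunks
    RATE = 30 * 1024 * 1024      # ~30MB/s cap; an arbitrary number

    def paced_copy(src_path, dst_path):
        # Copy a file, sleeping whenever we get ahead of RATE bytes/sec,
        # so reads never hog the discs for long stretches.
        start, sent = time.time(), 0
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            while True:
                chunk = src.read(CHUNK)
                if not chunk:
                    break
                dst.write(chunk)
                sent += len(chunk)
                ahead = sent / float(RATE) - (time.time() - start)
                if ahead > 0:
                    time.sleep(ahead)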
My questions are:

* Is this what I should expect?
* Why? I'd've thought the extensive caching the filesystem does would
sort this out for me.
* Is there any way around it that doesn't involve editing the code?
Thank you for your time.
--
Dickon Hood
Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible. We apologise for the
inconvenience in the meantime.
No virus was found in this outgoing message as I didn't bother looking.