Hello everyone,
looking closer and comparing our servers, we in fact have two servers
that behave differently, but they also have a different workload (mail
instead of file storage). They have identical hardware, but were
installed later than the servers which do have the issue.
They have a stable L2ARC cache size, unlike the one I described
previously, where the L2ARC size is bigger than the dataset in the pool.
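To watch whether the header keeps growing on a given box, the counter
that zfs-stats reports can be sampled directly; a minimal sketch (the
kstat.zfs.misc.arcstats sysctls are what zfs-stats reads):

# while true; do date; sysctl kstat.zfs.misc.arcstats.l2_hdr_size; sleep 3600; done

On the servers with the issue this counter only ever rises; on these
two it levels off.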
Non-leaking server:
L2 ARC Summary: (HEALTHY)
Passed Headroom: 63.05m
Tried Lock Failures: 198.57m
IO In Progress: 53.57k
Low Memory Aborts: 32
Free on Write: 21.40k
Writes While Full: 16.50k
R/W Clashes: 3.50k
Bad Checksums: 0
IO Errors: 0
SPA Mismatch: 613.80m

L2 ARC Size: (Adaptive) 443.42 GiB
Header Size: 0.27% 1.22 GiB

L2 ARC Evicts:
Lock Retries: 1.33k
Upon Reading: 0

L2 ARC Breakdown: 191.36m
Hit Ratio: 28.27% 54.09m
Miss Ratio: 71.73% 137.27m
Feeds: 1.68m

L2 ARC Buffer:
Bytes Scanned: 4.28 PiB
Buffer Iterations: 1.68m
List Iterations: 107.55m
NULL List Iterations: 1.16m

L2 ARC Writes:
Writes Sent: 100.00% 915.04k
No bad checksums or IO errors. The L2ARC size of 443 GiB is sensible
compared to the device's actual size (373 GB), unlike the 1.89 TiB
reported on the degraded servers below.
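If it helps with comparing: these figures can be read straight from
sysctl and set against the raw device size. A quick sketch
(kstat.zfs.misc.arcstats is where zfs-stats gets its numbers):

# sysctl -n kstat.zfs.misc.arcstats.l2_size      # bytes ZFS believes the L2ARC holds
# sysctl -n kstat.zfs.misc.arcstats.l2_asize     # bytes actually allocated on the cache device
# sysctl -n kstat.zfs.misc.arcstats.l2_hdr_size  # ARC memory consumed by L2ARC headers
# diskinfo da23 | awk '{ print $3 }'             # mediasize of the cache device, in bytes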
Yet aside from the workload, I cannot find any difference between the
two "mail" fileservers and the other six or more "web" fileservers
doing storage for websites. They are identical in every regard apart
from the workload and the time of installation; the mail fileservers
were installed much later.
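For what it is worth, this is also how one can verify that the ZFS
tunables match between the two groups (the hostnames below are
placeholders):

# ssh web1 sysctl vfs.zfs > web1.out
# ssh mail1 sysctl vfs.zfs > mail1.out
# diff web1.out mail1.out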
I just wanted to add this, as it may be very relevant.
With kind regards,
Daniel
On 08/11/2015 04:42 PM, Daniel Genis wrote:
> Dear FreeBSD community,
>
> We're facing a somewhat odd issue, perhaps similar to what is discussed
> here: https://forums.freebsd.org/threads/l2arc-degraded.47540/
>
> The issue is that the L2ARC header seems to grow without limit, similar
> to a memory leak, pressuring more and more memory over time out of the ARC.
>
> For example, the output of "zpool iostat -v 1"
>
>                 capacity     operations    bandwidth
> pool          alloc   free   read  write   read  write
> ------------  -----  -----  -----  -----  -----  -----
> syspool       1.15G   275G      0      0      0      0
>   mirror      1.15G   275G      0      0      0      0
>     gpt/zfs0      -      -      0      0      0      0
>     gpt/zfs1      -      -      0      0      0      0
> ------------  -----  -----  -----  -----  -----  -----
> tank          1.21T  1.51T    229  1.99K  3.67M  9.48M
>   mirror       124G   154G     67    125   787K   503K
>     da0           -      -     20     27   440K   503K
>     da1           -      -     45     28   379K   503K
> [...]
>   mirror       124G   154G     34    164   454K   612K
>     da18          -      -     26     12   417K   612K
>     da19          -      -      6     13  58.8K   612K
> logs              -      -      -      -      -      -
>   mirror       117M  74.4G      0    109      0  1.75M
>     da21          -      -      0    109      0  1.75M
>     da22          -      -      0    109      0  1.75M
> cache             -      -      -      -      -      -
>   da23        1.67T  16.0E    302      7  2.85M   223K
> ------------  -----  -----  -----  -----  -----  -----
>
>
> Here the cache shows 1.67T in use and 16.0E free.
> The cache is a 373GB Intel SSD.
>
> # diskinfo -v da23
> da23
> 512 # sectorsize
> 400088457216 # mediasize in bytes (373G)
> 781422768 # mediasize in sectors
> 4096 # stripesize
> 0 # stripeoffset
> 48641 # Cylinders according to firmware.
> 255 # Heads according to firmware.
> 63 # Sectors according to firmware.
> BTTV4234089C400HGN # Disk ident.
> id1,enc@n500e004aaaaaaa3e/type@0/slot@18 # Physical path
>
>
>
> The L2ARC stats section from "zfs-stats -a":
>
> L2 ARC Summary: (DEGRADED)
> Passed Headroom: 133.33m
> Tried Lock Failures: 4.90b
> IO In Progress: 313.63k
> Low Memory Aborts: 1.52k
> Free on Write: 589.79k
> Writes While Full: 34.57k
> R/W Clashes: 46.95k
> Bad Checksums: 408.40m
> IO Errors: 151.99m
> SPA Mismatch: 632.00m
>
> L2 ARC Size: (Adaptive) 1.89 TiB
> Header Size: 0.88% 16.98 GiB
>
> L2 ARC Evicts:
> Lock Retries: 1.27k
> Upon Reading: 2
>
> L2 ARC Breakdown: 2.10b
> Hit Ratio: 32.89% 691.15m
> Miss Ratio: 67.11% 1.41b
> Feeds: 3.70m
>
> L2 ARC Buffer:
> Bytes Scanned: 10.70 PiB
> Buffer Iterations: 3.70m
> List Iterations: 236.30m
> NULL List Iterations: 24.86m
>
> L2 ARC Writes:
> Writes Sent: 100.00% 3.38m
>
>
> Here we can see that the Header Size is currently almost 17 GiB.
> This header size grows continuously, without any apparent limit.
> Also, ZFS appears to think it is holding 1.89 TiB inside the L2ARC,
> which seems very unlikely.
>
> # freebsd-version
> 10.1-RELEASE-p13
>
> # uname -a
> FreeBSD servername 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May
> 13 06:54:13 UTC 2015
> root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
>
> # uptime
> 4:35PM up 42 days, 15:24, 1 user, load averages: 1.35, 0.96, 0.84
>
>
> Does anyone know how we can alleviate the issue?
> We originally thought the issue was caused by
> https://www.freebsd.org/security/advisories/FreeBSD-EN-15:07.zfs.asc
>
> We have since updated our servers, but the header size still keeps
> growing. For reference, we have multiple FreeBSD fileservers which are
> used mostly over NFS, all with identical configuration (but varying
> workload). They all still show these symptoms.
>
> Any tips/hints/pointers are appreciated!
>
> With kind regards,
>
> Daniel
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>
--
With kind regards,
Daniel Genis
Technical Staff
Byte Internet
W http://www.byte.nl/
E daniel@byte.nl
T 020 521 6226
F 020 521 6227