Hi,
(I'm not on this list; please CC me in future replies.
My apologies to those who get this message twice; I had a typo in the
To: header and had to re-send it to the list.)
Sorry for reviving such an old thread, but I've run into this problem
lately as well, on an 11.1-RELEASE-p1 jailhost mounting NFSv4.0 mounts
into jails.
On 24.09.2015 23:17, Rick Macklem wrote:
> Frank de Bot wrote:
>> Rick Macklem wrote:
>>> Frank de Bot wrote:
>>>> Rick Macklem wrote:
>>>>> Frank de Bot wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On a 10.1-RELEASE-p9 server I have several NFS mounts used for a
>>>>>> jail. Because it's a server only to test, there is a low load.
>>>>>> But the [nfscl] process is hogging a CPU after a while. This
>>>>>> happens pretty fast, within 1 or 2 days. I'm noticing the high
>>>>>> CPU of the process when I want to do some tests after a little
>>>>>> while (those 1 or 2 days).
Here's my ps ax | grep nfscl:
# ps ax | grep nfscl
11111 - DL 932:08.74 [nfscl]
11572 - DL 442:27.42 [nfscl]
30396 - DL 933:44.13 [nfscl]
35902 - DL 442:08.70 [nfscl]
40881 - DL 938:56.04 [nfscl]
43276 - DL 932:38.88 [nfscl]
49178 - DL 934:24.77 [nfscl]
56314 - DL 935:21.55 [nfscl]
60085 - DL 936:37.11 [nfscl]
71788 - DL 933:10.96 [nfscl]
82001 - DL 934:45.76 [nfscl]
86222 - DL 931:42.94 [nfscl]
92353 - DL 1186:53.38 [nfscl]
21105 20 S+ 0:00.00 grep nfscl
And this is on a 12-core machine with Hyperthreading:
# uptime
7:28PM up 11 days, 4:50, 4 users, load averages: 25.49, 21.91, 20.25
Most of this load is being generated by the nfscl threads.
>>>>>> My jail.conf looks like:
>>>>>>
>>>>>> exec.start = "/bin/sh /etc/rc";
>>>>>> exec.stop = "/bin/sh /etc/rc.shutdown";
>>>>>> exec.clean;
>>>>>> mount.devfs;
>>>>>> exec.consolelog = "/var/log/jail.$name.log";
>>>>>> #mount.fstab = "/usr/local/etc/jail.fstab.$name";
>>>>>>
>>>>>> test01 {
>>>>>>     host.hostname = "test01_hosting";
>>>>>>     ip4.addr = somepublicaddress;
>>>>>>     ip4.addr += someprivateaddress;
>>>>>>
>>>>>>     mount = "10.13.37.2:/tank/hostingbase /opt/jails/test01
>>>>>>         nfs nfsv4,minorversion=1,pnfs,ro,noatime 0 0";
>>>>>>     mount += "10.13.37.2:/tank/hosting/test /opt/jails/test01/opt
>>>>>>         nfs nfsv4,minorversion=1,pnfs,noatime 0 0";
>>>>>>
>>>>>>     path = "/opt/jails/test01";
>>>>>> }
>>>>>>
>>>>>> The last test was with NFS 4.1; I also tried NFS 4.0 with the
>>>>>> same result. In the read-only nfs share there are symbolic links
>>>>>> pointing to the read-write share for logging, storing .run files,
>>>>>> etc. When I monitor my network interface with tcpdump, there is
>>>>>> little nfs traffic; only when I try to access the shares is there
>>>>>> activity.
>>>>>>
>>>>>> What is causing nfscl to run around in circles, hogging the CPU
>>>>>> (it makes the system slow to respond too), or how can I find out
>>>>>> what the cause is?
>>>>>>
>>>>> Well, the nfscl does server->client RPCs referred to as callbacks.
>>>>> I have no idea what the implications of running it in a jail are,
>>>>> but I'd guess that these server->client RPCs get blocked somehow,
>>>>> etc...
>>>>> (The NFSv4.0 mechanism requires a separate IP address that the
>>>>> server can connect to on the client. For NFSv4.1, it should use
>>>>> the same TCP connection as is used for the client->server RPCs.
>>>>> The latter seems like it should work, but there is probably some
>>>>> glitch.)
>>>>>
>>>>> ** Just run without the nfscl daemon (it is only needed for
>>>>> delegations or pNFS).
>>>>
>>>> How can I disable the nfscl daemon?
>>>>
>>> Well, the daemon for the callbacks is called nfscbd.
>>> You should check via "ps ax" to see if you have it running.
>>> (For NFSv4.0 you probably don't want it running, but for NFSv4.1 you
>>> do need it. pNFS won't work at all without it, but unless you have a
>>> server that supports pNFS, it won't work anyhow. Unless your server
>>> is a clustered Netapp Filer, you should probably not have the "pnfs"
>>> option.)
>>>
>>> To run the "nfscbd" daemon you can set:
>>>     nfscbd_enable="TRUE"
>>> in your /etc/rc.conf, which will start it on boot.
>>> Alternatively, just type "nfscbd" as root.
>>>
>>> The "nfscl" thread is always started when an NFSv4 mount is done.
>>> It does an assortment of housekeeping things, including a Renew op
>>> to make sure the lease doesn't expire. If for some reason the jail
>>> blocks these Renew RPCs, it will try to do them over and over and
>>> ..., because having the lease expire is bad news for NFSv4. How
>>> could you tell? Well, capturing packets between the client and
>>> server, then looking at them in wireshark is probably the only way.
>>> (Or maybe a large count for Renew in the output from "nfsstat -e".)
>>>
>>> "nfscbd" is optional for NFSv4.0. Without it, you simply don't do
>>> callbacks/delegations.
>>> For NFSv4.1 it is pretty much required, but doesn't need a separate
>>> server->client TCP connection.
>>> --> I'd enable it for NFSv4.1, but disable it for NFSv4.0, at least
>>> as a starting point.
>>>
>>> And as I said before, none of this is tested within jails, so I
>>> have no idea what effect the jails have. Someone who understands
>>> jails might have some insight w.r.t. this?
>>>
>>> rick
>>>
>>
>> Since last time I haven't tried to use pnfs and just stuck with
>> nfsv4.0. nfscbd is not running. The server is now running 10.2. The
>> number of renews is not very high (56k; getattr, for example, is
>> 283M). Viewed with wireshark, the renew calls look good; the nfs
>> status is ok.
>>
>> Is there a way to know what [nfscl] is busy with?
>>
> Not that I can think of. When I do "ps axHl" I see it in DL state and
> not doing much of anything. (You could try setting "sysctl
> vfs.nfs.debuglevel=4", but I don't think you'll see anything syslog'd
> that is useful?)
> This is what I'd expect for an NFSv4.0 mount without the nfscbd
> running.
nfscbd is not running.
I've done a bit of digging with kgdb, by attaching to a thread of one
of the nfscl's and grabbing a backtrace:
--8<---- snip
(kgdb) info threads
[...]
678 Thread 101151 (PID=82001: nfscl) 0xffffffff80a9780a in
sched_switch ()
[...]
(kgdb) thread 678
[Switching to thread 678 (Thread 101151)]#0 0xffffffff80a9780a in
sched_switch ()
(kgdb) bt
#0 0xffffffff80a9780a in sched_switch ()
#1 0xffffffff80a75a35 in mi_switch ()
#2 0xffffffff80abaa82 in sleepq_timedwait ()
#3 0xffffffff80a75435 in _sleep ()
#4 0xffffffff8095eb7d in nfscl_renewthread ()
#5 0xffffffff80985204 in start_nfscl ()
#6 0xffffffff80a2f815 in fork_exit ()
#7 0xffffffff80ec3b7e in fork_trampoline ()
#8 0x0000000000000000 in ?? ()
--8<---- snap
So as you said, the nfscl thread is mostly hanging in a sleep. However,
approximately every 30-50 seconds (this corresponds quite well with the
timing of the renew calls on the wire) the CPU usage spikes to 100% on
all 24 threads.
dtrace(1m)'ing during those spikes shows a huge number (on the order of
10^5) of calls to nfscl_procdoesntexist and nfscl_cleanup_common:
Calls
nfscl_postop_attr 12
nfscl_loadattrcache 17
nfscl_request 22
nfscl_deleggetmodtime 26
nfscl_reqstart 26
nfscl_mustflush 30
nfscl_nodeleg 34
nfscl_cleanup_common 12005
nfscl_procdoesntexist 12315
Times (summed, ns)
nfscl_loadattrcache 41949
nfscl_deleggetmodtime 63014
nfscl_postop_attr 65046
nfscl_reqstart 77851
nfscl_procdoesntexist 30187728
nfscl_request 423855128
nfscl_nodeleg 841033772
nfscl_mustflush 4838119753
nfscl_cleanup_common 6436207841
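To put those aggregates in perspective, dividing the summed times by
the call counts gives the average cost per call of the two hot
functions. A quick back-of-the-envelope sketch, using only the numbers
above:

```shell
#!/bin/sh
# Average cost per call for the two hottest functions, computed from
# the dtrace call counts and summed times (nanoseconds) shown above.
awk 'BEGIN {
    printf "nfscl_procdoesntexist: %.0f ns/call\n", 30187728 / 12315
    printf "nfscl_cleanup_common:  %.0f ns/call\n", 6436207841 / 12005
}'
```

So nfscl_cleanup_common averages roughly half a millisecond per call,
and its ~6.4 summed CPU-seconds inside one 5-second tick window is
already more than a full core, which fits the observed spikes.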
For reference, here's the dtrace(1m) script I'm using:
--8<---- snip
#!/usr/sbin/dtrace -Cs
#pragma D option quiet
fbt::nfscl_*:entry {
@entries[probefunc] = count();
traceme[probefunc] = 1;
ts[probefunc] = timestamp;
}
fbt::nfscl_*:return
/traceme[probefunc]/
{
@times[probefunc] = sum(timestamp - ts[probefunc]);
}
tick-5sec
{
printa(@times);
clear(@times);
printa(@entries);
clear(@entries);
}
--8<---- snap
For completeness, I guess I *should* synchronize this with the for(;;)
loop in nfscl_renewthread and count the actual number of process
terminations, but OTOH, I am extremely sure (just by observing the
number of running processes and PIDs) that it's nowhere near those counts.
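For a rough sense of scale (assuming the aggregates above come from a
single 5-second tick window, as the script's tick-5sec clause
suggests):

```shell
#!/bin/sh
# 12005 nfscl_cleanup_common calls in one 5-second window works out to:
awk 'BEGIN { printf "%.0f calls/sec\n", 12005 / 5 }'
```

That is about 2400 calls per second, orders of magnitude beyond any
plausible rate of process terminations on this host.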
Digging a little deeper, I see that pfind(9), or more precisely
pfind_locked(), is used in nfscl_procdoesntexist(). What I do not
understand yet is how pfind(9) interacts with jails. Maybe someone(TM)
could shed some light on this?
> Basically, when the nfscbd isn't running the server shouldn't issue
> any delegations, because it shouldn't see a callback path
> (server->client TCP connection). Also, if you are using a FreeBSD NFS
> server, it won't issue delegations unless you've enabled that, which
> isn't the default.
>
> Check to see your Delegs in "nfsstat -e" is 0.
It is.
> If it is, then all the nfscl should be doing is waking up once per
> second and doing very little except a Renew RPC once every 30-60sec.
> (A fraction of what the server's lease duration is.)
A tcpdump shows at least one renew call/reply pair per minute.
> The only thing I can think of that might cause it to run a lot would
> be some weirdness related to the TOD clock. It msleep()s for hz and
> also checks for time_uptime (which should be in resolution of
> seconds) != the previous time.
> (If the msleep()s were waking up too frequently, then it would loop
> around doing not much of anything, over and over and over again...)
>
>
>> I do understand nfs + jails could have issues, but I like to understand
>> them.
>>
Fabian
[1]
https://lists.freebsd.org/pipermail/freebsd-hackers/2017-June/thread.html#51200