Rick Macklem wrote:
> Frank de Bot wrote:
>> Rick Macklem wrote:
>>> Frank de Bot wrote:
>>>> Hi,
>>>>
>>>> On a 10.1-RELEASE-p9 server I have several NFS mounts used for a
>>>> jail. Because it is a test-only server, the load is low. But the
>>>> [nfscl] process starts hogging a CPU after a while. This happens
>>>> pretty fast, within 1 or 2 days. I notice the high CPU use of the
>>>> process when I come back to run some tests after that time.
>>>>
>>>> My jail.conf looks like:
>>>>
>>>> exec.start = "/bin/sh /etc/rc";
>>>> exec.stop = "/bin/sh /etc/rc.shutdown";
>>>> exec.clean;
>>>> mount.devfs;
>>>> exec.consolelog = "/var/log/jail.$name.log";
>>>> #mount.fstab = "/usr/local/etc/jail.fstab.$name";
>>>>
>>>> test01 {
>>>>     host.hostname = "test01_hosting";
>>>>     ip4.addr = somepublicaddress;
>>>>     ip4.addr += someprivateaddress;
>>>>
>>>>     mount = "10.13.37.2:/tank/hostingbase /opt/jails/test01 nfs nfsv4,minorversion=1,pnfs,ro,noatime 0 0";
>>>>     mount += "10.13.37.2:/tank/hosting/test /opt/jails/test01/opt nfs nfsv4,minorversion=1,pnfs,noatime 0 0";
>>>>
>>>>     path = "/opt/jails/test01";
>>>> }
>>>>
>>>> The last test was with NFS 4.1; I also tried NFS 4.0 with the same
>>>> result. In the read-only NFS share there are symbolic links pointing
>>>> to the read-write share for logging, storing .run files, etc. When I
>>>> monitor my network interface with tcpdump there is little NFS
>>>> traffic; there is only activity when I actually access the shares.
>>>>
>>>> What is causing nfscl to run around in circles, hogging the CPU (it
>>>> makes the system slow to respond too), and how can I find out what
>>>> the cause is?
>>>>
>>> Well, the nfscl does server->client RPCs referred to as callbacks. I
>>> have no idea what the implications of running it in a jail are, but
>>> I'd guess that these server->client RPCs get blocked somehow, etc.
>>> (The NFSv4.0 mechanism requires a separate IP address that the server
>>> can connect to on the client. For NFSv4.1, it should use the same TCP
>>> connection as is used for the client->server RPCs. The latter seems
>>> like it should work, but there is probably some glitch.)
>>>
>>> ** Just run without the nfscl daemon (it is only needed for
>>> delegations or pNFS).
>>
>> How can I disable the nfscl daemon?
>>
> Well, the daemon for the callbacks is called nfscbd.
> You should check via "ps ax" to see if you have it running.
> (For NFSv4.0 you probably don't want it running, but for NFSv4.1 you do
> need it. pNFS won't work at all without it, but unless you have a
> server that supports pNFS, it won't work anyhow. Unless your server is
> a clustered Netapp Filer, you should probably not have the "pnfs"
> option.)
>
> Setting:
>     nfscbd_enable="TRUE"
> in your /etc/rc.conf will start the "nfscbd" daemon on boot.
> Alternatively, just type "nfscbd" as root.
>
> The "nfscl" thread is always started when an NFSv4 mount is done. It
> does an assortment of housekeeping things, including a Renew op to make
> sure the lease doesn't expire. If for some reason the jail blocks these
> Renew RPCs, it will try to do them over and over and ..., because
> having the lease expire is bad news for NFSv4. How could you tell?
> Well, capturing packets between the client and server and then looking
> at them in wireshark is probably the only way. (Or maybe a large count
> for Renew in the output of "nfsstat -e".)
>
> "nfscbd" is optional for NFSv4.0. Without it, you simply don't do
> callbacks/delegations. For NFSv4.1 it is pretty much required, but it
> doesn't need a separate server->client TCP connection.
> --> I'd enable it for NFSv4.1, but disable it for NFSv4.0, at least as
> a starting point.
>
> And as I said before, none of this is tested within jails, so I have no
> idea what effect the jails have. Someone who understands jails might
> have some insight w.r.t. this?
>
> rick

Since last time I haven't tried to use pNFS and just stuck with NFSv4.0.
nfscbd is not running. The server is now running 10.2. The number of
Renews is not very high (56k; Getattr, for example, is 283M). Viewed with
wireshark, the Renew calls look good and the NFS status is OK.

Is there a way to know what [nfscl] is active with?

I do understand NFS + jails could have issues, but I'd like to understand
them.

Frank
Frank de Bot wrote:
> Rick Macklem wrote:
> > [earlier thread on nfscbd, callbacks, and pNFS trimmed; see above]
> > --> I'd enable it for NFSv4.1, but disable it for NFSv4.0, at least
> > as a starting point.
> >
> > And as I said before, none of this is tested within jails, so I have
> > no idea what effect the jails have. Someone who understands jails
> > might have some insight w.r.t. this?
> >
> > rick
>
> Since last time I haven't tried to use pNFS and just stuck with
> NFSv4.0. nfscbd is not running. The server is now running 10.2. The
> number of Renews is not very high (56k; Getattr, for example, is 283M).
> Viewed with wireshark, the Renew calls look good and the NFS status is
> OK.
>
> Is there a way to know what [nfscl] is active with?
>
Not that I can think of. When I do "ps axHl" I see it in the DL state,
not doing much of anything. (You could try setting "sysctl
vfs.nfs.debuglevel=4", but I don't think you'll see anything syslog'd
that is useful?)

This is what I'd expect for an NFSv4.0 mount without the nfscbd running.
Basically, when the nfscbd isn't running, the server shouldn't issue any
delegations, because it shouldn't see a callback path (a server->client
TCP connection). Also, if you are using a FreeBSD NFS server, it won't
issue delegations unless you've enabled that, which isn't the default.
Check that Delegs in "nfsstat -e" is 0. If it is, then all the nfscl
should be doing is waking up once per second and doing very little
except a Renew RPC once every 30-60 sec. (A fraction of the server's
lease duration.)

The only thing I can think of that might cause it to run a lot would be
some weirdness related to the TOD clock. It msleep()s for hz and also
checks whether time_uptime (which should have a resolution of seconds)
!= the previous time. (If the msleep()s were waking up too frequently,
it would loop around doing not much of anything, over and over and over
again...)

> I do understand nfs + jails could have issues, but I like to understand
> them.
>
> Frank
>
> _______________________________________________
> freebsd-stable at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
Frank de Bot wrote:
> Rick Macklem wrote:
> > [earlier thread on nfscbd, callbacks, and pNFS trimmed; see above]
> [quoted thread trimmed]
>
> Is there a way to know what [nfscl] is active with?
>
Btw, I'm an old-school debugger, which means I'd add a bunch of
printf()s to the function called nfscl_renewthread() in
sys/fs/nfsclient/nfs_clstate.c. (That's the nfscl thread. It should only
do the "for(;;)" loop once/sec, but if you get lots of loop iterations,
you might be able to isolate why via the printf()s.) You did say it was
a test system.

Good luck with it, rick

> I do understand nfs + jails could have issues, but I like to understand
> them.
>
> Frank
Frank de Bot wrote:
> Rick Macklem wrote:
> > [earlier thread on nfscbd, callbacks, and pNFS trimmed; see above]
> [quoted thread trimmed]
>
> Is there a way to know what [nfscl] is active with?
>
> I do understand nfs + jails could have issues, but I like to understand
> them.
>
It is conceivable that this high load is caused by the problem
identified in PR#205193, where jails can't talk to the nfsuserd because
127.0.0.1 gets translated to another IP address of the machine. The
attached patches are the same ones as in the PR; they change the
nfsuserd to use an AF_LOCAL socket instead. If it's convenient, it would
be nice if you could try these patches (kernel + nfsuserd).

rick
ps: They are against head, so I'm not sure how easily they will apply to
FreeBSD 10.

> Frank

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfsuserd-aflocal-kern.patch
Type: text/x-patch
Size: 4120 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20151215/e72fa938/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nfsuserd-aflocal.patch
Type: text/x-patch
Size: 4683 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20151215/e72fa938/attachment-0001.bin>