Try changing bool_t do_tcp = FALSE; to TRUE in
/usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel and try again. I
think this makes it match Linux client behavior. I suspect I ran into
the same issue as you. I do think I used nolockd as a workaround
temporarily. I can provide some more details if it works.
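
For reference, a rough sketch of the steps I mean, written as a small sh helper. The function name enable_nlm_tcp is made up for illustration, and the rebuild commands in the comments assume a stock /usr/src tree and your usual kernel config, so adjust as needed:

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): flips the do_tcp default
# from FALSE to TRUE in the file given as $1, keeping a .orig backup.
enable_nlm_tcp() {
    src="$1"
    cp "$src" "$src.orig" || return 1
    sed 's/bool_t do_tcp = FALSE;/bool_t do_tcp = TRUE;/' "$src.orig" > "$src"
}

# Assumed usage on the client (run as root; not executed here):
#   enable_nlm_tcp /usr/src/sys/nlm/nlm_prot_impl.c
#   cd /usr/src && make buildkernel && make installkernel
#   shutdown -r now
```

The temporary workaround, if other clients do not need to see the locks, would be something like "mount -t nfs -o nfsv3,tcp,nolockd fr-06:/web/www /mnt" (the /mnt mount point is only an example).
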
On 12/19/19 9:21 AM, Daniel Braniss wrote:
>
>> On 19 Dec 2019, at 16:09, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>>
>> Daniel Braniss wrote:
>> [stuff snipped]
>>> all mounts are nfsv3/tcp
>> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know when
>> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.
> can the replay cache have any influence here? I tend to remember way back issues
> with it,
>>
>> To me, it looks like a network configuration issue.
> that was/is my gut feelings too, but, as far as we can tell, nothing has changed in the network infrastructure,
> the problems appeared after the NetAPP's software was updated, it was working fine till then.
>
> the problems are also happening on freebsd 12.1
>
>> You could capture packets (maybe when a client first starts rpc.statd and rpc.lockd)
>> and then look at them in wireshark. I'd disable startup of rpc.lockd and rpc.statd
>> at boot for a test client and then run something like:
>> # tcpdump -s 0 -w out.pcap host <netapp-host>
>> - and then start rpc.statd and rpc.lockd
>> Then I'd look at out.pcap in wireshark (much better at decoding this stuff than
>> tcpdump). I'd look for things like different reply IP addresses from the Netapp,
>> which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
>>
> it's going to be an interesting week end :-(
>
>>> the error is also appearing on freebsd-11.2-stable, I'm now checking if it's also
>>> happening on 12.1
>>> btw, the NetApp version is 9.3P17
>> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
>> try to implement it, because I knew the protocol was badly broken) and I avoid
>> fiddling with it. As such, it won't have changed much since around FreeBSD7.
> and we haven't had any issues with it for years, so you must have done something good
>
> cheers,
> danny
>
>>
>> rick
>>
>> cheers,
>> danny
>>
>>> rick
>>>
>>> Cheers
>>>
>>> Richard
>>> (NetApp admin)
>>>
>>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <danny at cs.huji.ac.il> wrote:
>>>
>>>
>>>> On 18 Dec 2019, at 16:55, Rick Macklem <rmacklem at uoguelph.ca> wrote:
>>>>
>>>> Daniel Braniss wrote:
>>>>
>>>>> Hi,
>>>>> The server with the problems is running FreeBSD 11.1 stable, it was working fine for several months,
>>>>> but after a software upgrade of our NetAPP server it's reporting many lockd errors and becomes catatonic,
>>>>> ...
>>>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
>>>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
>>>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>>>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance ...
>>>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>>>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>>>> tests it with IPv6, so at least make sure you are still using IPv4 for the mounts and
>>>> try and make sure IP broadcast works between client and Netapp. I think the NLM
>>>> and NSM (rpc.statd) still use IP broadcast sometimes.
>>>>
>>> we are ipv4 - we have our own class c :-)
>>>> Maybe the network guys can suggest more w.r.t. why, but as I've stated before,
>>>> the NLM is a fundamentally broken protocol which was never published by Sun,
>>>> so I suggest you avoid using it if at all possible.
>>> well, at the moment the ball is in NetAPP's court, and switching to NFSv4 at the moment is out of the question, it's
>>> a production server used by several thousand students.
>>>
>>>>
>>>> - If the locks don't need to be seen by other clients, you can just use the "nolockd"
>>>> mount option.
>>>> or
>>>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>>>> should support NFSv4.1, which is a much better protocol than NFSv4.0.
>>>>
>>>> Good luck with it, rick
>>> thanks
>>> danny
>>>
>>>> ?
>>>> any ideas?
>>>>
>>>> thanks,
>>>> danny
>>>>
>>>> _______________________________________________
>>>> freebsd-stable at freebsd.org mailing list
>>>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>>>
>>
>
>