On 13 Oct 2020, at 10:58, Eugene M. Zheganin wrote:
> I'm running a FreeBSD 12.1 server as a VM under Hyper-V. And although
> this letter will give the impression of another lame post blaming
> FreeBSD for all of the issues while the author should blame himself,
> I'm atm out of other explanations. The thing is: I'm getting loads of
> sendmail errors like:
>
> ===Cut===
> Oct 13 13:49:33 gw1 sm-mta[95760]: 09D8mN2P092173: SYSERR(root):
> putbody: write error: Permission denied
> Oct 13 13:49:33 gw1 sm-mta[95760]: 09D8mN2P092173: SYSERR(root):
> timeout writing message to <whatever>.mail.protection.outlook.com.:
> Permission denied
> ===Cut===

A "Permission denied" on outbound packets can indeed happen when pf decides to block the packet.

> The relay address is just random. The thing is, I can successfully
> connect to it via telnet. Even send some commands. But when this is
> done by sendmail - and when it's actually sending messages, I get
> random errors. At first I was blaming myself and trying to find the
> rule that actually blocks something. I ended up having no block
> rules without a log clause, and at the same time tcpdump -netti pflog0
> shows no dropped packets, but sendmail still eventually complains.
>
> If it matters, I have relatively high rps on this interface, about 25
> Kpps.
>
> I've also found several postings mentioning that hnX handles the TSO
> and LRO modes badly, so I switched them off. No luck however, with
> vlanhwtag and vlanmtu, which for some reason just cannot be switched
> off. The if_hn also lacks a man page for some reason, so it's unclear
> how to tweak it right.
>

While it's possible that there are issues with TSO/LRO, those wouldn't look like this. (As an aside, I am interested in any reproducible setups where pf has issues with TSO/LRO. As far as I've been able to see, all such issues have been resolved.)

> And the most mysterious part - when I switch pf off, the errors
> stop appearing. This would clearly mean that pf blocks some packets,
> but then again, this way the pflog0 would show them up, right (and yes
> - it's "UP")?
>

It's possible for pf to drop packets without triggering log rules. For example, if pf decides to drop the packet before it matches any rule (e.g. it's a corrupt packet) it won't show up in pflog.

> Is there some issue with pf and hn interfaces that I'm unaware about?
>

There's no interface specific code in pf, so it wouldn't be specific to hn interfaces.

> Are these symptoms of a bug ?
>

Perhaps. It can also be a symptom of resource exhaustion.
Are there any signs of memory allocation failures, or incrementing error counters (in netstat or in pfctl)?

Best regards,
Kristof
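One way to look for incrementing error counters is to save the output of pfctl -s info twice, some time apart, and compare the rows under "Counters". The following is only a minimal sketch of that comparison: the function names are hypothetical, the two snapshots are made-up sample text, and the row layout is assumed to be "name, total, rate/s". The "match" counter is skipped by default because it grows during normal operation.

```python
import re

# Matches counter rows such as "  state-mismatch    3007108    1201.9/s".
COUNTER_ROW = re.compile(r"^\s+([a-z][a-z-]*)\s+(\d+)\s+[\d.]+/s\s*$")


def parse_counters(info_text):
    """Return {counter name: total} for rows under the "Counters" heading."""
    counters = {}
    in_counters = False
    for line in info_text.splitlines():
        if line.startswith("Counters"):
            in_counters = True
            continue
        if in_counters:
            row = COUNTER_ROW.match(line)
            if row:
                counters[row.group(1)] = int(row.group(2))
    return counters


def incrementing(before, after, ignore=("match",)):
    """Return counters that grew between two snapshots, with their deltas."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and name not in ignore and after[name] > before[name]
    }


# Made-up sample snapshots, for illustration only (not real measurements).
SNAPSHOT_A = """\
Counters
  match                               1000           10.0/s
  memory                                 0            0.0/s
  state-mismatch                        50            0.5/s
"""
SNAPSHOT_B = """\
Counters
  match                               1200           10.0/s
  memory                                 3            0.0/s
  state-mismatch                        50            0.4/s
"""

print(incrementing(parse_counters(SNAPSHOT_A), parse_counters(SNAPSHOT_B)))
```

With real data, the two snapshots would be the saved text of two pfctl -s info runs; any counter reported here is one that moved in between.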
On 13/10/2020 11:19, Kristof Provost wrote:
> On 13 Oct 2020, at 10:58, Eugene M. Zheganin wrote:
>> Is there some issue with pf and hn interfaces that I'm unaware about?
>>
> There's no interface specific code in pf, so it wouldn't be specific to
> hn interfaces.
>
>> Are these symptoms of a bug ?
>>
> Perhaps. It can also be a symptom of resource exhaustion.
> Are there any signs of memory allocation failures, or incrementing error
> counters (in netstat or in pfctl)?

I have seen this kind of error in VirtualBox with PF and an emulated Intel interface (emX):

Oct 1 22:42:19 bobik postfix/smtp[35330]: connect to aspmx.l.google.com[108.177.126.27]:25: Permission denied
Oct 1 22:42:19 bobik postfix/smtp[36246]: connect to aspmx.l.google.com[108.177.126.27]:25: Permission denied
Oct 1 22:42:19 bobik postfix/smtp[35330]: connect to alt2.aspmx.l.google.com[108.177.97.27]:25: Permission denied
Oct 1 22:42:19 bobik postfix/smtp[36246]: connect to alt1.aspmx.l.google.com[172.253.118.27]:25: Permission denied
Oct 1 22:42:19 bobik postfix/smtp[35330]: connect to alt1.aspmx.l.google.com[172.253.118.27]:25: Permission denied
Oct 1 22:42:19 bobik postfix/smtp[36246]: connect to alt2.aspmx.l.google.com[108.177.97.27]:25: Permission denied

I think it is related to state table exhaustion (reported on the freebsd-pf@ mailing list about a week ago). My firewall rules are open for all outgoing traffic.

So I think your problem is related to some resource exhaustion too.

Kind regards
Miroslav Lachman
Hello,

On 13.10.2020 14:19, Kristof Provost wrote:
>> Are these symptoms of a bug ?
>>
> Perhaps. It can also be a symptom of resource exhaustion.
> Are there any signs of memory allocation failures, or incrementing
> error counters (in netstat or in pfctl)?
>

Well, the only signs of resource exhaustion I know so far are:

- "PF state limit reached" in /var/log/messages (none so far)
- mbuf starvation in netstat -m (zero so far)
- various queue failure counters in netstat -s -p tcp, but since this only applies to TCP it is hardly related (although it seems like there are none either).

So, what should I take a look at?

With PF disabled, pfctl -s info shows:

[root@gw1:/var/log]# pfctl -s info
Status: Disabled for 0 days 00:41:42          Debug: Urgent

State Table                          Total             Rate
  current entries                     9634
  searches                     24212900618        9677418.3/s
  inserts                        222708269          89012.1/s
  removals                       222698635          89008.2/s
Counters
  match                          583327668         233144.6/s
  bad-offset                             0              0.0/s
  fragment                               1              0.0/s
  short                                  0              0.0/s
  normalize                              0              0.0/s
  memory                                 0              0.0/s
  bad-timestamp                          0              0.0/s
  congestion                             0              0.0/s
  ip-option                          76057             30.4/s
  proto-cksum                         9669              3.9/s
  state-mismatch                   3007108           1201.9/s
  state-insert                       13236              5.3/s
  state-limit                            0              0.0/s
  src-limit                              0              0.0/s
  synproxy                               0              0.0/s
  map-failed                             0              0.0/s

And these gazillions of searches bother me a lot, although this seems to be just a counting bug since PF was last reloaded, because the rate has been steadily decreasing from 20 million. To be honest, I doubt 10 million searches per second can be reached at 22 Kpps. Definitely a math bug.

Eugene.
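One possible reading of the rates above, offered as a sketch and not a confirmed diagnosis: each printed rate equals the total divided by the 41 minutes 42 seconds shown in the Status line. The totals accumulated while pf was enabled would then be averaged over the much shorter time since it was disabled. That pfctl derives the rate this way is an assumption here, inferred from the numbers; the names in the snippet are illustrative only.

```python
# Figures copied from the "pfctl -s info" output quoted above.
DISABLED_FOR = (0, 0, 41, 42)  # days, hours, minutes, seconds from "Status:"

# counter name: (total, rate as printed by pfctl)
FIGURES = {
    "searches": (24212900618, 9677418.3),
    "inserts": (222708269, 89012.1),
    "removals": (222698635, 89008.2),
    "match": (583327668, 233144.6),
}


def elapsed_seconds(days, hours, minutes, seconds):
    """Convert the "Disabled for ..." duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def implied_rate(total, seconds):
    """Rate obtained by dividing a total by the time since the status change."""
    return round(total / seconds, 1)


elapsed = elapsed_seconds(*DISABLED_FOR)
for name, (total, printed) in FIGURES.items():
    # Assumption being checked: printed rate == total / elapsed.
    print(f"{name:10s} printed {printed:>12.1f}/s  "
          f"total/elapsed {implied_rate(total, elapsed):>12.1f}/s")
```

If that reading is right, the rate would keep falling while pf stays disabled, since the totals no longer grow but the elapsed time does, which would fit the decrease from 20 million noted above.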