Toke Høiland-Jørgensen
2021-Nov-26 18:47 UTC
[PATCH v2 net-next 21/26] ice: add XDP and XSK generic per-channel statistics
Jakub Kicinski <kuba at kernel.org> writes:

> On Fri, 26 Nov 2021 13:30:16 +0100 Toke Høiland-Jørgensen wrote:
>> >> TBH I wasn't following this thread too closely since I saw Daniel
>> >> nacked it already. I do prefer rtnl xstats, I'd just report them
>> >> in -s if they are non-zero. But doesn't sound like we have an
>> >> agreement whether they should exist or not.
>> >
>> > Right, just -s is fine, if we drop the per-channel approach.
>>
>> I agree that adding them to -s is fine (and that resolves my "no one
>> will find them" complaint as well). If it crowds the output we could
>> also default to only outputting a subset, and have the more detailed
>> statistics hidden behind a verbose switch (or even just in the JSON
>> output)?
>>
>> >> Can we think of an approach which would make cloudflare and cilium
>> >> happy? Feels like we're trying to make the slightly hypothetical
>> >> admin happy while ignoring objections of very real users.
>> >
>> > The initial idea was only to unify the drivers. But in general you
>> > are right, 10 drivers having something doesn't mean it's something
>> > good.
>>
>> I don't think it's accurate to call the admin use case "hypothetical".
>> We're expending a significant effort explaining to people that XDP can
>> "eat" your packets, and not having any standard statistics makes this
>> way harder. We should absolutely cater to our "early adopters", but if
>> we want XDP to see wider adoption, making it "less weird" is critical!
>
> Fair. In all honesty I said that hoping to push for a more flexible
> approach hidden entirely in BPF, and not involving driver changes.
> Assuming the XDP program has more fine-grained stats we should be able
> to extract those instead of double-counting. Hence my vague "let's work
> with apps" comment.
>
> For example, to a person familiar with the workload it'd be useful to
> know if the program returned XDP_DROP because of configured policy or a
> failure to parse a packet. I don't think that sort of distinction is
> achievable at the level of standard stats.
>
> The information required by the admin is higher level. As you say, the
> primary concern there is "how many packets did XDP eat".

Right, sure, I am also totally fine with having only a somewhat
restricted subset of stats available at the interface level and making
everything else BPF-based. I'm hoping we can converge on a common
understanding of what this "minimal set" should be :)
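To make the BPF side of this concrete, below is a minimal sketch of the
kind of program-level accounting Jakub describes: the program itself
records *why* it dropped a packet, which no standard driver-side counter
can express. The map layout, the program name and the "drop ICMP" policy
rule are made up purely for illustration; they are not from this patch
set.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

enum {
        STAT_DROP_POLICY,       /* dropped because the (stand-in) policy said so */
        STAT_DROP_PARSE_ERR,    /* dropped because the packet couldn't be parsed */
        STAT_PASS,
        STAT_MAX,
};

/* Per-CPU counters, one slot per reason above */
struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, STAT_MAX);
        __type(key, __u32);
        __type(value, __u64);
} xdp_stats SEC(".maps");

static __always_inline int record(__u32 reason, int action)
{
        __u64 *cnt = bpf_map_lookup_elem(&xdp_stats, &reason);

        if (cnt)
                (*cnt)++;
        return action;
}

SEC("xdp")
int xdp_reason_stats(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        struct iphdr *iph;

        if (data + sizeof(*eth) > data_end)
                return record(STAT_DROP_PARSE_ERR, XDP_DROP);

        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return record(STAT_PASS, XDP_PASS);

        iph = data + sizeof(*eth);
        if ((void *)(iph + 1) > data_end)
                return record(STAT_DROP_PARSE_ERR, XDP_DROP);

        /* Stand-in "policy": drop all ICMP */
        if (iph->protocol == IPPROTO_ICMP)
                return record(STAT_DROP_POLICY, XDP_DROP);

        return record(STAT_PASS, XDP_PASS);
}

char _license[] SEC("license") = "GPL";

The per-reason counters can then be read out with bpftool map dump (or
by whatever control-plane tool loaded the program), entirely
independently of any interface-level statistics.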
> Speaking of which, one thing that badly needs clarification is our
> expectation around XDP packets getting counted towards the interface
> stats.

Agreed. My immediate thought is that "XDP packets are interface
packets", but that is certainly not what we do today, so I'm not sure if
changing it at this point would break things?

>> > Maciej, I think you were talking about Cilium asking for those stats
>> > in Intel drivers? Could you maybe provide their exact use cases/needs
>> > so I'll orient myself? I certainly remember about XSK Tx packets and
>> > bytes.
>> > And speaking of XSK Tx, we have per-socket stats, isn't that enough?
>>
>> IMO, as long as the packets are accounted for in the regular XDP stats,
>> having a whole separate set of stats only for XSK is less important.
>>
>> >> Please leave the per-channel stats out. They set a precedent for
>> >> channel stats, which should be an attribute of a channel. Working
>> >> for a large XDP user for a couple of years now, I can tell you from
>> >> my own experience I've not once found them useful. In fact,
>> >> per-queue stats are a major PITA as they crowd the output.
>> >
>> > Oh okay. My very first iterations were without this, but then I
>> > found that most of the drivers expose their XDP stats per-channel.
>> > Since I didn't plan to degrade the functionality, they went that way.
>>
>> I personally find the per-channel stats quite useful. One of the
>> primary reasons for not achieving full performance with XDP is broken
>> configuration of packet steering to CPUs, and having per-channel stats
>> is a nice way of seeing this.
>
> Right, that's about the only thing I use it for as well. "Is the load
> evenly distributed?" But that's not XDP-specific and not worth
> standardizing for, yet, IMO, because...
>
>> I can see the point about them being way too verbose in the default
>> output, though, and I do generally filter the output as well when
>> viewing them. But see my point above about only printing a subset of
>> the stats by default; per-channel stats could be JSON-only, for
>> instance?
>
> ...we don't even know what constitutes a channel today. And that will
> become increasingly problematic as the importance of
> application-specific queues increases (zctap etc). IMO, until the
> ontological gaps around queues are filled we should leave per-queue
> stats in ethtool -S.

Hmm, right, I see. I suppose that as long as the XDP packets show up in
one of the interface counters in ethtool -S, it's possible to answer the
load distribution question, and any further debugging (say, XDP drops on
a certain queue due to CPU-based queue indexing on TX) can be delegated
to BPF-based tools, along the lines of the sketch below...

-Toke
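For reference, a minimal, standalone sketch of that kind of BPF-based
per-queue debugging, assuming the program keys its own counters on
ctx->rx_queue_index. The map and program names are made up; a real tool
would be combined with (or bump the counter on the drop path of) the
production XDP program instead of just observing and passing everything.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_QUEUES 128  /* sized for the NIC's RX queue count */

/* Per-CPU packet counters, indexed by RX queue */
struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, MAX_QUEUES);
        __type(key, __u32);
        __type(value, __u64);
} rx_per_queue SEC(".maps");

SEC("xdp")
int xdp_queue_debug(struct xdp_md *ctx)
{
        __u32 qid = ctx->rx_queue_index;
        __u64 *cnt;

        if (qid < MAX_QUEUES) {
                cnt = bpf_map_lookup_elem(&rx_per_queue, &qid);
                if (cnt)
                        (*cnt)++;
        }

        /* Observe only; this standalone sketch doesn't drop anything */
        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Dumping the map (e.g. with bpftool map dump) then shows how the traffic
the program sees is distributed across queues, without needing any
driver-side per-channel counters.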