From: Eric Dumazet <edumazet at google.com>

commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
on 32bit arches.

We must use separate syncp for rx and tx path as they can be run at the
same time on different cpus. Thus one sequence increment can be lost and
readers spin forever.

Signed-off-by: Eric Dumazet <edumazet at google.com>
Cc: Stephen Hemminger <shemminger at vyatta.com>
Cc: Michael S. Tsirkin <mst at redhat.com>
Cc: Jason Wang <jasowang at redhat.com>
---
 drivers/net/virtio_net.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..f18149a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -42,7 +42,8 @@ module_param(gso, bool, 0444);
 #define VIRTNET_DRIVER_VERSION "1.0.0"
 
 struct virtnet_stats {
-	struct u64_stats_sync syncp;
+	struct u64_stats_sync tx_syncp;
+	struct u64_stats_sync rx_syncp;
 	u64 tx_bytes;
 	u64 tx_packets;
 
@@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
 
 	hdr = skb_vnet_hdr(skb);
 
-	u64_stats_update_begin(&stats->syncp);
+	u64_stats_update_begin(&stats->rx_syncp);
 	stats->rx_bytes += skb->len;
 	stats->rx_packets++;
-	u64_stats_update_end(&stats->syncp);
+	u64_stats_update_end(&stats->rx_syncp);
 
 	if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
 		pr_debug("Needs csum!\n");
@@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
 	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->syncp);
+		u64_stats_update_begin(&stats->tx_syncp);
 		stats->tx_bytes += skb->len;
 		stats->tx_packets++;
-		u64_stats_update_end(&stats->syncp);
+		u64_stats_update_end(&stats->tx_syncp);
 
 		tot_sgs += skb_vnet_hdr(skb)->num_sg;
 		dev_kfree_skb_any(skb);
@@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 		u64 tpackets, tbytes, rpackets, rbytes;
 
 		do {
-			start = u64_stats_fetch_begin(&stats->syncp);
+			start = u64_stats_fetch_begin(&stats->tx_syncp);
			tpackets = stats->tx_packets;
 			tbytes = stats->tx_bytes;
+		} while (u64_stats_fetch_retry(&stats->tx_syncp, start));
+
+		do {
+			start = u64_stats_fetch_begin(&stats->rx_syncp);
 			rpackets = stats->rx_packets;
 			rbytes = stats->rx_bytes;
-		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		} while (u64_stats_fetch_retry(&stats->rx_syncp, start));
 
 		tot->rx_packets += rpackets;
 		tot->tx_packets += tpackets;
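For illustration, the writer/reader discipline the patch relies on can be sketched in user-space C. Everything here (struct toy_syncp, the toy_* helpers) is a hypothetical stand-in for the kernel's u64_stats_sync API on a 32bit arch, not the real implementation; the point is that the tx path touches only its own sequence word, so a concurrent rx update can no longer race on it:

```c
#include <assert.h>
#include <stdint.h>

/* toy analogue of u64_stats_sync on 32bit: a bare sequence counter */
struct toy_syncp { unsigned seq; };

struct toy_stats {
	struct toy_syncp tx_syncp;
	struct toy_syncp rx_syncp;
	uint64_t tx_bytes, tx_packets;
	uint64_t rx_bytes, rx_packets;
};

static void toy_update_begin(struct toy_syncp *s) { s->seq++; } /* odd: writer inside */
static void toy_update_end(struct toy_syncp *s)   { s->seq++; } /* even: writer done  */

static unsigned toy_fetch_begin(const struct toy_syncp *s) { return s->seq; }
static int toy_fetch_retry(const struct toy_syncp *s, unsigned start)
{
	/* retry while a writer is inside (odd) or an update happened */
	return (start & 1) || s->seq != start;
}

/* tx path: only tx_syncp is incremented, mirroring free_old_xmit_skbs() */
static void toy_tx_account(struct toy_stats *st, uint64_t len)
{
	toy_update_begin(&st->tx_syncp);
	st->tx_bytes += len;
	st->tx_packets++;
	toy_update_end(&st->tx_syncp);
}

/* reader: snapshot tx counters, retrying until the snapshot is stable */
static uint64_t toy_read_tx_packets(const struct toy_stats *st)
{
	unsigned start;
	uint64_t packets;

	do {
		start = toy_fetch_begin(&st->tx_syncp);
		packets = st->tx_packets;
	} while (toy_fetch_retry(&st->tx_syncp, start));
	return packets;
}
```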
On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet at google.com>
> 
> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> on 32bit arches.
> 
> We must use separate syncp for rx and tx path as they can be run at the
> same time on different cpus. Thus one sequence increment can be lost and
> readers spin forever.
> 
> Signed-off-by: Eric Dumazet <edumazet at google.com>
> Cc: Stephen Hemminger <shemminger at vyatta.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Jason Wang <jasowang at redhat.com>
> ---

Just to make clear : even using percpu stats/syncp, we have no guarantee
that write_seqcount_begin() is done with one instruction. [1]

It is OK on x86 if "incl" instruction is generated by the compiler, but
on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
be interrupted.

So if you are 100% sure all paths are safe against preemption/BH, then
this patch is not needed, but a big comment in the code would avoid
adding possible races in the future.

[1] If done with one instruction, we still have a race, since a reader
might see an even sequence and conclude no writer is inside the critical
section. So read values could be wrong.
On 06/06/2012 04:45 PM, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
>> From: Eric Dumazet <edumazet at google.com>
>>
>> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
>> on 32bit arches.
>>
>> We must use separate syncp for rx and tx path as they can be run at the
>> same time on different cpus. Thus one sequence increment can be lost and
>> readers spin forever.
>>
>> Signed-off-by: Eric Dumazet <edumazet at google.com>
>> Cc: Stephen Hemminger <shemminger at vyatta.com>
>> Cc: Michael S. Tsirkin <mst at redhat.com>
>> Cc: Jason Wang <jasowang at redhat.com>
>> ---
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

Thanks for explaining; current virtio-net is safe, I think. But the patch
is still needed, as my patch would update the statistics in irq context.

> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
On Wed, Jun 06, 2012 at 10:45:41AM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 10:35 +0200, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet at google.com>
> >
> > commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> > on 32bit arches.
> >
> > We must use separate syncp for rx and tx path as they can be run at the
> > same time on different cpus. Thus one sequence increment can be lost and
> > readers spin forever.
> >
> > Signed-off-by: Eric Dumazet <edumazet at google.com>
> > Cc: Stephen Hemminger <shemminger at vyatta.com>
> > Cc: Michael S. Tsirkin <mst at redhat.com>
> > Cc: Jason Wang <jasowang at redhat.com>
> > ---
>
> Just to make clear : even using percpu stats/syncp, we have no guarantee
> that write_seqcount_begin() is done with one instruction. [1]
>
> It is OK on x86 if "incl" instruction is generated by the compiler, but
> on a RISC cpu, the "load memory,%reg ; inc %reg ; store %reg,memory" can
> be interrupted.
>
> So if you are 100% sure all paths are safe against preemption/BH, then
> this patch is not needed, but a big comment in the code would avoid
> adding possible races in the future.

We currently do all stats either on napi callback or from
start_xmit callback. This makes them safe, yes?

> [1] If done with one instruction, we still have a race, since a reader
> might see an even sequence and conclude no writer is inside the critical
> section. So read values could be wrong.
On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
> We currently do all stats either on napi callback or from
> start_xmit callback.
> This makes them safe, yes?

Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
include/linux/u64_stats_sync.h section 6)

 * 6) If counter might be written by an interrupt, readers should block interrupts.
 *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
 *    read partial values)

Yes, its tricky...

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 5214b1e..705aaa7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
 		u64 tpackets, tbytes, rpackets, rbytes;
 
 		do {
-			start = u64_stats_fetch_begin(&stats->syncp);
+			start = u64_stats_fetch_begin_bh(&stats->syncp);
 			tpackets = stats->tx_packets;
 			tbytes = stats->tx_bytes;
 			rpackets = stats->rx_packets;
 			rbytes = stats->rx_bytes;
-		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));
 
 		tot->rx_packets += rpackets;
 		tot->tx_packets += tpackets;
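The "partial values" danger from section 6 is worth spelling out. The sketch below is a hypothetical user-space model (struct split64 and the helpers are invented for illustration): on 32bit, a 64-bit counter load is two 32-bit loads, and if an interrupt writer runs between them the reader combines the old low half with the new high half, yielding a value that never existed:

```c
#include <assert.h>
#include <stdint.h>

/* a 64-bit counter as two 32-bit halves, as on a 32bit arch */
struct split64 { uint32_t lo, hi; };

static uint64_t join(const struct split64 *v)
{
	return ((uint64_t)v->hi << 32) | v->lo;
}

static void writer_add(struct split64 *c, uint64_t add)
{
	uint64_t v = join(c) + add;
	c->lo = (uint32_t)v;
	c->hi = (uint32_t)(v >> 32);
}

/* reader loads lo, is "interrupted" by the writer, then loads hi */
static uint64_t torn_read(void)
{
	struct split64 c = { .lo = 0xffffffff, .hi = 0 };	/* 2^32 - 1 */
	uint32_t lo = c.lo;		/* first half of the 64-bit load */
	writer_add(&c, 1);		/* irq writer: counter becomes 2^32 */
	uint32_t hi = c.hi;		/* second half: new hi, stale lo */
	return ((uint64_t)hi << 32) | lo;	/* neither old nor new value */
}
```

Blocking BH/interrupts around the read (the _bh variants above) prevents the writer from running between the two halves on the same CPU.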
On Wed, Jun 06, 2012 at 03:10:10PM +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 14:13 +0300, Michael S. Tsirkin wrote:
>
> > We currently do all stats either on napi callback or from
> > start_xmit callback.
> > This makes them safe, yes?
>
> Hmm, then _bh() variant is needed in virtnet_stats(), as explained in
> include/linux/u64_stats_sync.h section 6)
>
> * 6) If counter might be written by an interrupt, readers should block interrupts.
> *    (On UP, there is no seqcount_t protection, a reader allowing interrupts could
> *    read partial values)
>
> Yes, its tricky...

Sounds good, but I have a question: this relies on counters
being atomic on 64 bit.
Would it not be better to always use a seqlock even on 64 bit?
This way counters would actually be correct and in sync.
As it is, if we want e.g. average packet size,
we can not rely e.g. on it being bytes/packets.

> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..705aaa7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -703,12 +703,12 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
>  		u64 tpackets, tbytes, rpackets, rbytes;
>  
>  		do {
> -			start = u64_stats_fetch_begin(&stats->syncp);
> +			start = u64_stats_fetch_begin_bh(&stats->syncp);
>  			tpackets = stats->tx_packets;
>  			tbytes = stats->tx_bytes;
>  			rpackets = stats->rx_packets;
>  			rbytes = stats->rx_bytes;
> -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		} while (u64_stats_fetch_retry_bh(&stats->syncp, start));
>  
>  		tot->rx_packets += rpackets;
>  		tot->tx_packets += tpackets;
On Wed, 6 Jun 2012 17:49:42 +0300
"Michael S. Tsirkin" <mst at redhat.com> wrote:

> Sounds good, but I have a question: this relies on counters
> being atomic on 64 bit.
> Would it not be better to always use a seqlock even on 64 bit?
> This way counters would actually be correct and in sync.
> As it is, if we want e.g. average packet size,
> we can not rely e.g. on it being bytes/packets.

This has not been a requirement on real physical devices; therefore
the added overhead is not really justified.

Many network cards use counters in hardware to count packets/bytes
and there is no expectation of atomic access there.
On Wed, Jun 06, 2012 at 08:14:32AM -0700, Stephen Hemminger wrote:
> On Wed, 6 Jun 2012 17:49:42 +0300
> "Michael S. Tsirkin" <mst at redhat.com> wrote:
>
> > Sounds good, but I have a question: this relies on counters
> > being atomic on 64 bit.
> > Would it not be better to always use a seqlock even on 64 bit?
> > This way counters would actually be correct and in sync.
> > As it is, if we want e.g. average packet size,
> > we can not rely e.g. on it being bytes/packets.
>
> This has not been a requirement on real physical devices; therefore
> the added overhead is not really justified.
>
> Many network cards use counters in hardware to count packets/bytes
> and there is no expectation of atomic access there.

BTW for cards that do implement the counters in software,
under xmit lock, is anything wrong with simply taking the xmit lock
when we get the stats instead of the per-cpu trick + seqlock?
On Wed, 2012-06-06 at 21:51 +0300, Michael S. Tsirkin wrote:
> BTW for cards that do implement the counters in software,
> under xmit lock, is anything wrong with simply taking the xmit lock
> when we get the stats instead of the per-cpu trick + seqlock?

I still don't understand why you would do that.

Most modern machines are 64bits, so there is no seqlock overhead,
nothing at all.

If you focus on 32bit hardware, just stick with 32bit counters?

Note that most u64_stats_sync users are virtual drivers, without
xmit lock (LLTX drivers)
On Wed, Jun 06, 2012 at 10:35:24AM +0200, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet at google.com>
>
> commit 3fa2a1df909 (virtio-net: per cpu 64 bit stats (v2)) added a race
> on 32bit arches.
>
> We must use separate syncp for rx and tx path as they can be run at the
> same time on different cpus. Thus one sequence increment can be lost and
> readers spin forever.
>
> Signed-off-by: Eric Dumazet <edumazet at google.com>
> Cc: Stephen Hemminger <shemminger at vyatta.com>
> Cc: Michael S. Tsirkin <mst at redhat.com>
> Cc: Jason Wang <jasowang at redhat.com>

I'm still thinking about moving tx to take an xmit lock long term,
meanwhile this fix appears appropriate for 3.5.

Acked-by: Michael S. Tsirkin <mst at redhat.com>

Dave, can you pick this up pls?

> ---
> drivers/net/virtio_net.c | 19 ++++++++++++-------
> 1 file changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 5214b1e..f18149a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -42,7 +42,8 @@ module_param(gso, bool, 0444);
>  #define VIRTNET_DRIVER_VERSION "1.0.0"
>  
>  struct virtnet_stats {
> -	struct u64_stats_sync syncp;
> +	struct u64_stats_sync tx_syncp;
> +	struct u64_stats_sync rx_syncp;
>  	u64 tx_bytes;
>  	u64 tx_packets;
>  
> @@ -300,10 +301,10 @@ static void receive_buf(struct net_device *dev, void *buf, unsigned int len)
>  
>  	hdr = skb_vnet_hdr(skb);
>  
> -	u64_stats_update_begin(&stats->syncp);
> +	u64_stats_update_begin(&stats->rx_syncp);
>  	stats->rx_bytes += skb->len;
>  	stats->rx_packets++;
> -	u64_stats_update_end(&stats->syncp);
> +	u64_stats_update_end(&stats->rx_syncp);
>  
>  	if (hdr->hdr.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
>  		pr_debug("Needs csum!\n");
> @@ -565,10 +566,10 @@ static unsigned int free_old_xmit_skbs(struct virtnet_info *vi)
>  	while ((skb = virtqueue_get_buf(vi->svq, &len)) != NULL) {
>  		pr_debug("Sent skb %p\n", skb);
>  
> -		u64_stats_update_begin(&stats->syncp);
> +		u64_stats_update_begin(&stats->tx_syncp);
>  		stats->tx_bytes += skb->len;
>  		stats->tx_packets++;
> -		u64_stats_update_end(&stats->syncp);
> +		u64_stats_update_end(&stats->tx_syncp);
>  
>  		tot_sgs += skb_vnet_hdr(skb)->num_sg;
>  		dev_kfree_skb_any(skb);
> @@ -703,12 +704,16 @@ static struct rtnl_link_stats64 *virtnet_stats(struct net_device *dev,
>  		u64 tpackets, tbytes, rpackets, rbytes;
>  
>  		do {
> -			start = u64_stats_fetch_begin(&stats->syncp);
> +			start = u64_stats_fetch_begin(&stats->tx_syncp);
>  			tpackets = stats->tx_packets;
>  			tbytes = stats->tx_bytes;
> +		} while (u64_stats_fetch_retry(&stats->tx_syncp, start));
> +
> +		do {
> +			start = u64_stats_fetch_begin(&stats->rx_syncp);
>  			rpackets = stats->rx_packets;
>  			rbytes = stats->rx_bytes;
> -		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		} while (u64_stats_fetch_retry(&stats->rx_syncp, start));
>  
>  		tot->rx_packets += rpackets;
>  		tot->tx_packets += tpackets;