Feng Liu
2023-May-08 15:00 UTC
[PATCH net v3] virtio_net: Fix error unwinding of XDP initialization
On 2023-05-07 p.m.9:45, Xuan Zhuo wrote:
> External email: Use caution opening links or attachments
>
>
> On Sat, 6 May 2023 08:08:02 -0400, Feng Liu <feliu at nvidia.com> wrote:
>>
>>
>> On 2023-05-05 p.m.10:33, Xuan Zhuo wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Tue, 2 May 2023 20:35:25 -0400, Feng Liu <feliu at nvidia.com> wrote:
>>>> When initializing XDP in virtnet_open(), some rq xdp initialization
>>>> may hit an error causing the net device open to fail. However, previous
>>>> rqs have already initialized XDP and enabled NAPI, which is not the
>>>> expected behavior. Need to roll back the previous rq initialization
>>>> to avoid leaks in error unwinding of init code.
>>>>
>>>> Also extract a helper function to disable a queue pair, and use the
>>>> newly introduced helper in error unwinding and virtnet_close.
>>>>
>>>> Issue: 3383038
>>>> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
>>>> Signed-off-by: Feng Liu <feliu at nvidia.com>
>>>> Reviewed-by: William Tu <witu at nvidia.com>
>>>> Reviewed-by: Parav Pandit <parav at nvidia.com>
>>>> Reviewed-by: Simon Horman <simon.horman at corigine.com>
>>>> Acked-by: Michael S. Tsirkin <mst at redhat.com>
>>>> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
>>>> ---
>>>>  drivers/net/virtio_net.c | 30 ++++++++++++++++++++----------
>>>>  1 file changed, 20 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 8d8038538fc4..3737cf120cb7 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -1868,6 +1868,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>>>>  	return received;
>>>>  }
>>>>
>>>> +static void virtnet_disable_qp(struct virtnet_info *vi, int qp_index)
>>>> +{
>>>> +	virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
>>>> +	napi_disable(&vi->rq[qp_index].napi);
>>>> +	xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
>>>> +}
>>>> +
>>>>  static int virtnet_open(struct net_device *dev)
>>>>  {
>>>>  	struct virtnet_info *vi = netdev_priv(dev);
>>>> @@ -1883,20 +1890,26 @@ static int virtnet_open(struct net_device *dev)
>>>>
>>>>  		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
>>>>  		if (err < 0)
>>>> -			return err;
>>>> +			goto err_xdp_info_reg;
>>>>
>>>>  		err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
>>>>  						 MEM_TYPE_PAGE_SHARED, NULL);
>>>> -		if (err < 0) {
>>>> -			xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>> -			return err;
>>>> -		}
>>>> +		if (err < 0)
>>>> +			goto err_xdp_reg_mem_model;
>>>>
>>>>  		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>>>>  		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
>>>>  	}
>>>>
>>>>  	return 0;
>>>> +
>>>> +err_xdp_reg_mem_model:
>>>> +	xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>> +err_xdp_info_reg:
>>>> +	for (i = i - 1; i >= 0; i--)
>>>> +		virtnet_disable_qp(vi, i);
>>>
>>>
>>> I would like to know whether we should handle these:
>>>
>>> 	disable_delayed_refill(vi);
>>> 	cancel_delayed_work_sync(&vi->refill);
>>>
>>>
>>> Maybe we should call virtnet_close() with "i" directly.
>>>
>>> Thanks.
>>>
>>>
>> Can't use i directly here, because if xdp_rxq_info_reg fails, NAPI has
>> not been enabled for the current qp yet. I should roll back from the
>> queue pairs where NAPI was enabled before (i--), otherwise it will hang
>> in the NAPI disable API.
>
> This is not the point; the key is whether we should handle:
>
> 	disable_delayed_refill(vi);
> 	cancel_delayed_work_sync(&vi->refill);
>
> Thanks.
>
OK, I get the point. Thanks for your careful review. I checked the code
again. There are two points that I need to explain:

1. All refill delayed-work usage (vi->refill, vi->refill_enabled) assumes
that the virtio interface has been opened successfully, for example in
virtnet_receive, virtnet_rx_resize, _virtnet_set_queues, etc. If the xdp
registration fails here, those later functions are never reached, so
there is no need to call disable_delayed_refill() and
cancel_delayed_work_sync(). The logic here is different from that of
virtnet_close: virtnet_close assumes virtnet_open succeeded and that tx
and rx have been running normally. For error unwinding, only disabling
the queue pairs is needed. I also encapsulated a helper function to
disable a queue pair, which is used in error unwinding and virtnet_close.

2. The current failing qp, which has not enabled NAPI yet, can only call
xdp unreg and must not call the NAPI disable interfaces, otherwise the
kernel will get stuck. That is the reason for the i-- and for calling
the disable-qp helper only on the previous queues.

Thanks

>>
>>>> +
>>>> +	return err;
>>>>  }
>>>>
>>>>  static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>> @@ -2305,11 +2318,8 @@ static int virtnet_close(struct net_device *dev)
>>>>  	/* Make sure refill_work doesn't re-enable napi! */
>>>>  	cancel_delayed_work_sync(&vi->refill);
>>>>
>>>> -	for (i = 0; i < vi->max_queue_pairs; i++) {
>>>> -		virtnet_napi_tx_disable(&vi->sq[i].napi);
>>>> -		napi_disable(&vi->rq[i].napi);
>>>> -		xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>> -	}
>>>> +	for (i = 0; i < vi->max_queue_pairs; i++)
>>>> +		virtnet_disable_qp(vi, i);
>>>>
>>>>  	return 0;
>>>>  }
>>>> --
>>>> 2.37.1 (Apple Git-137.1)
>>>>
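For readers following the rollback logic outside the kernel tree, below is a
minimal, self-contained userspace C sketch of the unwind pattern described
above. The names open_all(), fake_register(), fake_reg_mem(), fake_enable()
and disable_qp() are hypothetical stand-ins for virtnet_open(),
xdp_rxq_info_reg(), xdp_rxq_info_reg_mem_model(), the NAPI enable calls and
virtnet_disable_qp(); it illustrates the pattern and is not the driver code.

#include <stdio.h>

#define MAX_QP 4

/* Hypothetical stand-ins for the driver calls; fail_* lets us inject errors. */
static int fake_register(int i, int fail_at) { return i == fail_at ? -1 : 0; }
static int fake_reg_mem(int i, int fail_at)  { return i == fail_at ? -1 : 0; }
static void fake_unregister(int i) { printf("unreg qp %d\n", i); }
static void fake_enable(int i)     { printf("enable qp %d\n", i); }
static void fake_disable(int i)    { printf("disable qp %d\n", i); }

/* Analogue of virtnet_disable_qp(): undo both the enable and the registration. */
static void disable_qp(int i)
{
	fake_disable(i);
	fake_unregister(i);
}

static int open_all(int fail_reg, int fail_mem)
{
	int i, err;

	for (i = 0; i < MAX_QP; i++) {
		err = fake_register(i, fail_reg);
		if (err < 0)
			goto err_register;	/* qp i not registered: nothing to undo on it */

		err = fake_reg_mem(i, fail_mem);
		if (err < 0)
			goto err_mem_model;	/* qp i registered, but never enabled */

		fake_enable(i);
	}
	return 0;

err_mem_model:
	fake_unregister(i);		/* undo only the registration of the failing qp */
err_register:
	for (i = i - 1; i >= 0; i--)	/* earlier qps are fully set up: unwind them */
		disable_qp(i);
	return err;
}

int main(void)
{
	/* Inject a registration failure on qp 2: qps 0 and 1 are rolled back,
	 * qp 2 is never "disabled" because it was never enabled. */
	if (open_all(2, -1) < 0)
		printf("open failed, rolled back\n");
	return 0;
}

Running it prints the enable/unwind sequence, which makes it easy to see that
the failing queue pair is only unregistered and never NAPI-disabled, matching
the explanation in point 2 above.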
Xuan Zhuo
2023-May-09 01:43 UTC
[PATCH net v3] virtio_net: Fix error unwinding of XDP initialization
On Mon, 8 May 2023 11:00:10 -0400, Feng Liu <feliu at nvidia.com> wrote:
>
>
> On 2023-05-07 p.m.9:45, Xuan Zhuo wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Sat, 6 May 2023 08:08:02 -0400, Feng Liu <feliu at nvidia.com> wrote:
> >>
> >>
> >> On 2023-05-05 p.m.10:33, Xuan Zhuo wrote:
> >>> External email: Use caution opening links or attachments
> >>>
> >>>
> >>> On Tue, 2 May 2023 20:35:25 -0400, Feng Liu <feliu at nvidia.com> wrote:
> >>>> When initializing XDP in virtnet_open(), some rq xdp initialization
> >>>> may hit an error causing the net device open to fail. However, previous
> >>>> rqs have already initialized XDP and enabled NAPI, which is not the
> >>>> expected behavior. Need to roll back the previous rq initialization
> >>>> to avoid leaks in error unwinding of init code.
> >>>>
> >>>> Also extract a helper function to disable a queue pair, and use the
> >>>> newly introduced helper in error unwinding and virtnet_close.
> >>>>
> >>>> Issue: 3383038
> >>>> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
> >>>> Signed-off-by: Feng Liu <feliu at nvidia.com>
> >>>> Reviewed-by: William Tu <witu at nvidia.com>
> >>>> Reviewed-by: Parav Pandit <parav at nvidia.com>
> >>>> Reviewed-by: Simon Horman <simon.horman at corigine.com>
> >>>> Acked-by: Michael S. Tsirkin <mst at redhat.com>
> >>>> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
> >>>> ---
> >>>>  drivers/net/virtio_net.c | 30 ++++++++++++++++++++----------
> >>>>  1 file changed, 20 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>> index 8d8038538fc4..3737cf120cb7 100644
> >>>> --- a/drivers/net/virtio_net.c
> >>>> +++ b/drivers/net/virtio_net.c
> >>>> @@ -1868,6 +1868,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
> >>>>  	return received;
> >>>>  }
> >>>>
> >>>> +static void virtnet_disable_qp(struct virtnet_info *vi, int qp_index)
> >>>> +{
> >>>> +	virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
> >>>> +	napi_disable(&vi->rq[qp_index].napi);
> >>>> +	xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
> >>>> +}
> >>>> +
> >>>>  static int virtnet_open(struct net_device *dev)
> >>>>  {
> >>>>  	struct virtnet_info *vi = netdev_priv(dev);
> >>>> @@ -1883,20 +1890,26 @@ static int virtnet_open(struct net_device *dev)
> >>>>
> >>>>  		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
> >>>>  		if (err < 0)
> >>>> -			return err;
> >>>> +			goto err_xdp_info_reg;
> >>>>
> >>>>  		err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> >>>>  						 MEM_TYPE_PAGE_SHARED, NULL);
> >>>> -		if (err < 0) {
> >>>> -			xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>> -			return err;
> >>>> -		}
> >>>> +		if (err < 0)
> >>>> +			goto err_xdp_reg_mem_model;
> >>>>
> >>>>  		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> >>>>  		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
> >>>>  	}
> >>>>
> >>>>  	return 0;
> >>>> +
> >>>> +err_xdp_reg_mem_model:
> >>>> +	xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>> +err_xdp_info_reg:
> >>>> +	for (i = i - 1; i >= 0; i--)
> >>>> +		virtnet_disable_qp(vi, i);
> >>>
> >>>
> >>> I would like to know whether we should handle these:
> >>>
> >>> 	disable_delayed_refill(vi);
> >>> 	cancel_delayed_work_sync(&vi->refill);
> >>>
> >>>
> >>> Maybe we should call virtnet_close() with "i" directly.
> >>>
> >>> Thanks.
> >>>
> >>>
> >> Can't use i directly here, because if xdp_rxq_info_reg fails, NAPI has
> >> not been enabled for the current qp yet. I should roll back from the
> >> queue pairs where NAPI was enabled before (i--), otherwise it will hang
> >> in the NAPI disable API.
> >
> > This is not the point; the key is whether we should handle:
> >
> > 	disable_delayed_refill(vi);
> > 	cancel_delayed_work_sync(&vi->refill);
> >
> > Thanks.
> >
> >
> OK, I get the point. Thanks for your careful review. I checked the code
> again. There are two points that I need to explain:
>
> 1. All refill delayed-work usage (vi->refill, vi->refill_enabled) assumes
> that the virtio interface has been opened successfully, for example in
> virtnet_receive, virtnet_rx_resize, _virtnet_set_queues, etc. If the xdp
> registration fails here, those later functions are never reached, so
> there is no need to call disable_delayed_refill() and
> cancel_delayed_work_sync().

Maybe something is wrong there. I think these lines may schedule the
delayed work:

static int virtnet_open(struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int i, err;

	enable_delayed_refill(vi);

	for (i = 0; i < vi->max_queue_pairs; i++) {
		if (i < vi->curr_queue_pairs)
			/* Make sure we have some buffers: if oom use wq. */
-->			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-->				schedule_delayed_work(&vi->refill, 0);

		err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
		if (err < 0)
			return err;

		err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
						 MEM_TYPE_PAGE_SHARED, NULL);
		if (err < 0) {
			xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
			return err;
		}

		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
		virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
	}

	return 0;
}

And I think that if virtnet_open() returns an error, the status of
virtnet should be like the status after virtnet_close(). Or does someone
have another opinion?

Thanks.

> The logic here is different from that of
> virtnet_close: virtnet_close assumes virtnet_open succeeded and that tx
> and rx have been running normally. For error unwinding, only disabling
> the queue pairs is needed. I also encapsulated a helper function to
> disable a queue pair, which is used in error unwinding and virtnet_close.
>
> 2. The current failing qp, which has not enabled NAPI yet, can only call
> xdp unreg and must not call the NAPI disable interfaces, otherwise the
> kernel will get stuck. That is the reason for the i-- and for calling
> the disable-qp helper only on the previous queues.
>
> Thanks
>
> >>
> >>>> +
> >>>> +	return err;
> >>>>  }
> >>>>
> >>>>  static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >>>> @@ -2305,11 +2318,8 @@ static int virtnet_close(struct net_device *dev)
> >>>>  	/* Make sure refill_work doesn't re-enable napi! */
> >>>>  	cancel_delayed_work_sync(&vi->refill);
> >>>>
> >>>> -	for (i = 0; i < vi->max_queue_pairs; i++) {
> >>>> -		virtnet_napi_tx_disable(&vi->sq[i].napi);
> >>>> -		napi_disable(&vi->rq[i].napi);
> >>>> -		xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>> -	}
> >>>> +	for (i = 0; i < vi->max_queue_pairs; i++)
> >>>> +		virtnet_disable_qp(vi, i);
> >>>>
> >>>>  	return 0;
> >>>>  }
> >>>> --
> >>>> 2.37.1 (Apple Git-137.1)
> >>>>
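To make the open/close symmetry argument concrete, here is a variant of the
same kind of standalone userspace sketch (again with hypothetical helpers,
not the driver code) in which the open path may queue deferred refill work
before a failure, and the error path disables and cancels it so that a failed
open leaves the same state a close would leave. In virtio_net terms the
analogues would be disable_delayed_refill() and
cancel_delayed_work_sync(&vi->refill); whether the driver should do exactly
this is the open question in the thread.

#include <stdbool.h>
#include <stdio.h>

#define MAX_QP 4

/* Hypothetical state flags: refill_scheduled models vi->refill having been
 * queued by try_fill_recv() before a later registration step fails. */
static bool refill_enabled;
static bool refill_scheduled;

static void enable_refill(void)  { refill_enabled = true; }
static void disable_refill(void) { refill_enabled = false; }
static void cancel_refill(void)  { refill_scheduled = false; }	/* cancel_delayed_work_sync() analogue */
static void try_fill(int i)      { if (i == 0) refill_scheduled = true; }

static int  reg_qp(int i, int fail_at) { return i == fail_at ? -1 : 0; }
static void enable_qp(int i)           { (void)i; }
static void disable_qp(int i)          { (void)i; }

static int open_all(int fail_at)
{
	int i, err;

	enable_refill();			/* like enable_delayed_refill(vi) */

	for (i = 0; i < MAX_QP; i++) {
		try_fill(i);			/* may schedule deferred refill work */

		err = reg_qp(i, fail_at);
		if (err < 0)
			goto err_reg;

		enable_qp(i);
	}
	return 0;

err_reg:
	/* Leave the same state a close would leave: no refill enabled or
	 * pending, and all previously enabled queue pairs torn down. */
	disable_refill();
	cancel_refill();
	for (i = i - 1; i >= 0; i--)
		disable_qp(i);
	return err;
}

int main(void)
{
	open_all(2);
	printf("refill_enabled=%d refill_scheduled=%d\n",
	       refill_enabled, refill_scheduled);
	return 0;
}

Without the two extra calls in the error path, the printout would show refill
work still enabled and pending after a failed open, which is the asymmetry
being pointed out above.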