Xuan Zhuo
2023-May-09 01:43 UTC
[PATCH net v3] virtio_net: Fix error unwinding of XDP initialization
On Mon, 8 May 2023 11:00:10 -0400, Feng Liu <feliu at nvidia.com> wrote:
>
>
> On 2023-05-07 p.m.9:45, Xuan Zhuo wrote:
> > On Sat, 6 May 2023 08:08:02 -0400, Feng Liu <feliu at nvidia.com> wrote:
> >>
> >>
> >> On 2023-05-05 p.m.10:33, Xuan Zhuo wrote:
> >>> On Tue, 2 May 2023 20:35:25 -0400, Feng Liu <feliu at nvidia.com> wrote:
> >>>> When initializing XDP in virtnet_open(), some rq xdp initialization
> >>>> may hit an error causing net device open failed. However, previous
> >>>> rqs have already initialized XDP and enabled NAPI, which is not the
> >>>> expected behavior. Need to roll back the previous rq initialization
> >>>> to avoid leaks in error unwinding of init code.
> >>>>
> >>>> Also extract a helper function of disable queue pairs, and use newly
> >>>> introduced helper function in error unwinding and virtnet_close;
> >>>>
> >>>> Issue: 3383038
> >>>> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
> >>>> Signed-off-by: Feng Liu <feliu at nvidia.com>
> >>>> Reviewed-by: William Tu <witu at nvidia.com>
> >>>> Reviewed-by: Parav Pandit <parav at nvidia.com>
> >>>> Reviewed-by: Simon Horman <simon.horman at corigine.com>
> >>>> Acked-by: Michael S. Tsirkin <mst at redhat.com>
> >>>> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
> >>>> ---
> >>>>  drivers/net/virtio_net.c | 30 ++++++++++++++++++++----------
> >>>>  1 file changed, 20 insertions(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>> index 8d8038538fc4..3737cf120cb7 100644
> >>>> --- a/drivers/net/virtio_net.c
> >>>> +++ b/drivers/net/virtio_net.c
> >>>> @@ -1868,6 +1868,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
> >>>>          return received;
> >>>>  }
> >>>>
> >>>> +static void virtnet_disable_qp(struct virtnet_info *vi, int qp_index)
> >>>> +{
> >>>> +        virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
> >>>> +        napi_disable(&vi->rq[qp_index].napi);
> >>>> +        xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
> >>>> +}
> >>>> +
> >>>>  static int virtnet_open(struct net_device *dev)
> >>>>  {
> >>>>          struct virtnet_info *vi = netdev_priv(dev);
> >>>> @@ -1883,20 +1890,26 @@ static int virtnet_open(struct net_device *dev)
> >>>>
> >>>>                  err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
> >>>>                  if (err < 0)
> >>>> -                        return err;
> >>>> +                        goto err_xdp_info_reg;
> >>>>
> >>>>                  err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
> >>>>                                                   MEM_TYPE_PAGE_SHARED, NULL);
> >>>> -                if (err < 0) {
> >>>> -                        xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>> -                        return err;
> >>>> -                }
> >>>> +                if (err < 0)
> >>>> +                        goto err_xdp_reg_mem_model;
> >>>>
> >>>>                  virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
> >>>>                  virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
> >>>>          }
> >>>>
> >>>>          return 0;
> >>>> +
> >>>> +err_xdp_reg_mem_model:
> >>>> +        xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>> +err_xdp_info_reg:
> >>>> +        for (i = i - 1; i >= 0; i--)
> >>>> +                virtnet_disable_qp(vi, i);
> >>>
> >>>
> >>> I would like to know whether we should also handle these:
> >>>
> >>>         disable_delayed_refill(vi);
> >>>         cancel_delayed_work_sync(&vi->refill);
> >>>
> >>>
> >>> Maybe we should call virtnet_close() with "i" directly.
> >>>
> >>> Thanks.
> >>>
> >>>
> >> Can't use i directly here, because if xdp_rxq_info_reg fails, napi has
> >> not been enabled for current qp yet, I should roll back from the queue
> >> pairs where napi was enabled before(i--), otherwise it will hang at napi
> >> disable api
> >
> > This is not the point, the key is whether we should handle with:
> >
> >         disable_delayed_refill(vi);
> >         cancel_delayed_work_sync(&vi->refill);
> >
> > Thanks.
> >
> >
>
> OK, get the point. Thanks for your careful review. And I check the code
> again.
>
> There are two points that I need to explain:
>
> 1. All refill delay work calls(vi->refill, vi->refill_enabled) are based
> on that the virtio interface is successfully opened, such as
> virtnet_receive, virtnet_rx_resize, _virtnet_set_queues, etc. If there
> is an error in the xdp reg here, it will not trigger these subsequent
> functions. There is no need to call disable_delayed_refill() and
> cancel_delayed_work_sync().

Maybe something is wrong. I think these lines may schedule the delayed work.

static int virtnet_open(struct net_device *dev)
{
        struct virtnet_info *vi = netdev_priv(dev);
        int i, err;

        enable_delayed_refill(vi);

        for (i = 0; i < vi->max_queue_pairs; i++) {
                if (i < vi->curr_queue_pairs)
                        /* Make sure we have some buffers: if oom use wq. */
-->                     if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-->                             schedule_delayed_work(&vi->refill, 0);

                err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
                if (err < 0)
                        return err;

                err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
                                                 MEM_TYPE_PAGE_SHARED, NULL);
                if (err < 0) {
                        xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
                        return err;
                }

                virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
                virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
        }

        return 0;
}

And I think that if virtnet_open() returns an error, the state of virtnet
should be the same as the state after virtnet_close().

Or does someone have another opinion?

Thanks.

> The logic here is different from that of
> virtnet_close. virtnet_close is based on the success of virtnet_open and
> the tx and rx has been carried out normally. For error unwinding, only
> disable qp is needed. Also encapsulated a helper function of disable qp,
> which is used in error unwinding and virtnet close
> 2. The current error qp, which has not enabled NAPI, can only call xdp
> unreg, and cannot call the interface of disable NAPI, otherwise the
> kernel will be stuck. So for i-- the reason for calling disable qp on
> the previous queue
>
> Thanks
>
> >>
> >>>> +
> >>>> +        return err;
> >>>>  }
> >>>>
> >>>>  static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >>>> @@ -2305,11 +2318,8 @@ static int virtnet_close(struct net_device *dev)
> >>>>          /* Make sure refill_work doesn't re-enable napi! */
> >>>>          cancel_delayed_work_sync(&vi->refill);
> >>>>
> >>>> -        for (i = 0; i < vi->max_queue_pairs; i++) {
> >>>> -                virtnet_napi_tx_disable(&vi->sq[i].napi);
> >>>> -                napi_disable(&vi->rq[i].napi);
> >>>> -                xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
> >>>> -        }
> >>>> +        for (i = 0; i < vi->max_queue_pairs; i++)
> >>>> +                virtnet_disable_qp(vi, i);
> >>>>
> >>>>          return 0;
> >>>>  }
> >>>> --
> >>>> 2.37.1 (Apple Git-137.1)
> >>>>
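
The unwinding order argued over in this message (undo only the partially set-up pair at index i, then fully disable pairs i-1 down to 0) can be checked in a small userspace model. This is only a sketch under stated assumptions: struct model_qp, struct model_dev, model_open(), model_disable_qp() and model_leaked() are hypothetical stand-ins invented for illustration, not kernel APIs, and the two failure knobs simply simulate xdp_rxq_info_reg() or xdp_rxq_info_reg_mem_model() returning an error.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QP 4

/* Hypothetical stand-in for one rx/tx queue pair; not the real driver struct. */
struct model_qp {
    bool xdp_registered;   /* models a registered xdp_rxq_info */
    bool napi_enabled;     /* models NAPI enabled on rq and sq */
};

struct model_dev {
    struct model_qp qp[MAX_QP];
    int fail_reg_at;        /* index where registration fails, -1 = never */
    int fail_mem_model_at;  /* index where mem-model setup fails, -1 = never */
};

/* Models virtnet_disable_qp(): only valid on a fully set-up pair. */
static void model_disable_qp(struct model_dev *d, int i)
{
    /* Per the thread, disabling a NAPI that was never enabled would hang. */
    assert(d->qp[i].napi_enabled);
    d->qp[i].napi_enabled = false;
    d->qp[i].xdp_registered = false;
}

/* Models the open loop from the patch, with the same two unwind labels. */
static int model_open(struct model_dev *d)
{
    int i, err;

    for (i = 0; i < MAX_QP; i++) {
        if (i == d->fail_reg_at) {
            err = -1;
            goto err_xdp_info_reg;
        }
        d->qp[i].xdp_registered = true;

        if (i == d->fail_mem_model_at) {
            err = -2;
            goto err_xdp_reg_mem_model;
        }
        d->qp[i].napi_enabled = true;
    }
    return 0;

err_xdp_reg_mem_model:
    /* Pair i is registered but has no NAPI yet: unregister only. */
    d->qp[i].xdp_registered = false;
err_xdp_info_reg:
    /* Pairs 0..i-1 are fully set up: disable them in reverse order. */
    for (i = i - 1; i >= 0; i--)
        model_disable_qp(d, i);
    return err;
}

/* Counts pairs that still hold a registration or an enabled NAPI. */
static int model_leaked(const struct model_dev *d)
{
    int i, n = 0;

    for (i = 0; i < MAX_QP; i++)
        if (d->qp[i].xdp_registered || d->qp[i].napi_enabled)
            n++;
    return n;
}
```

Starting the loop at i instead of i - 1 would trip the assert in model_disable_qp() whenever registration fails, which is the model's analogue of the napi_disable hang described above.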
Jason Wang
2023-May-10 05:00 UTC
[PATCH net v3] virtio_net: Fix error unwinding of XDP initialization
On 2023/5/9 09:43, Xuan Zhuo wrote:
> On Mon, 8 May 2023 11:00:10 -0400, Feng Liu <feliu at nvidia.com> wrote:
>>
>> On 2023-05-07 p.m.9:45, Xuan Zhuo wrote:
>>> On Sat, 6 May 2023 08:08:02 -0400, Feng Liu <feliu at nvidia.com> wrote:
>>>>
>>>> On 2023-05-05 p.m.10:33, Xuan Zhuo wrote:
>>>>> On Tue, 2 May 2023 20:35:25 -0400, Feng Liu <feliu at nvidia.com> wrote:
>>>>>> When initializing XDP in virtnet_open(), some rq xdp initialization
>>>>>> may hit an error causing net device open failed. However, previous
>>>>>> rqs have already initialized XDP and enabled NAPI, which is not the
>>>>>> expected behavior. Need to roll back the previous rq initialization
>>>>>> to avoid leaks in error unwinding of init code.
>>>>>>
>>>>>> Also extract a helper function of disable queue pairs, and use newly
>>>>>> introduced helper function in error unwinding and virtnet_close;
>>>>>>
>>>>>> Issue: 3383038
>>>>>> Fixes: 754b8a21a96d ("virtio_net: setup xdp_rxq_info")
>>>>>> Signed-off-by: Feng Liu <feliu at nvidia.com>
>>>>>> Reviewed-by: William Tu <witu at nvidia.com>
>>>>>> Reviewed-by: Parav Pandit <parav at nvidia.com>
>>>>>> Reviewed-by: Simon Horman <simon.horman at corigine.com>
>>>>>> Acked-by: Michael S. Tsirkin <mst at redhat.com>
>>>>>> Change-Id: Ib4c6a97cb7b837cfa484c593dd43a435c47ea68f
>>>>>> ---
>>>>>>  drivers/net/virtio_net.c | 30 ++++++++++++++++++++----------
>>>>>>  1 file changed, 20 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index 8d8038538fc4..3737cf120cb7 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -1868,6 +1868,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>>>>>>          return received;
>>>>>>  }
>>>>>>
>>>>>> +static void virtnet_disable_qp(struct virtnet_info *vi, int qp_index)
>>>>>> +{
>>>>>> +        virtnet_napi_tx_disable(&vi->sq[qp_index].napi);
>>>>>> +        napi_disable(&vi->rq[qp_index].napi);
>>>>>> +        xdp_rxq_info_unreg(&vi->rq[qp_index].xdp_rxq);
>>>>>> +}
>>>>>> +
>>>>>>  static int virtnet_open(struct net_device *dev)
>>>>>>  {
>>>>>>          struct virtnet_info *vi = netdev_priv(dev);
>>>>>> @@ -1883,20 +1890,26 @@ static int virtnet_open(struct net_device *dev)
>>>>>>
>>>>>>                  err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
>>>>>>                  if (err < 0)
>>>>>> -                        return err;
>>>>>> +                        goto err_xdp_info_reg;
>>>>>>
>>>>>>                  err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
>>>>>>                                                   MEM_TYPE_PAGE_SHARED, NULL);
>>>>>> -                if (err < 0) {
>>>>>> -                        xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>>>> -                        return err;
>>>>>> -                }
>>>>>> +                if (err < 0)
>>>>>> +                        goto err_xdp_reg_mem_model;
>>>>>>
>>>>>>                  virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>>>>>>                  virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
>>>>>>          }
>>>>>>
>>>>>>          return 0;
>>>>>> +
>>>>>> +err_xdp_reg_mem_model:
>>>>>> +        xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>>>> +err_xdp_info_reg:
>>>>>> +        for (i = i - 1; i >= 0; i--)
>>>>>> +                virtnet_disable_qp(vi, i);
>>>>>
>>>>> I would like to know whether we should also handle these:
>>>>>
>>>>>         disable_delayed_refill(vi);
>>>>>         cancel_delayed_work_sync(&vi->refill);
>>>>>
>>>>>
>>>>> Maybe we should call virtnet_close() with "i" directly.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>> Can't use i directly here, because if xdp_rxq_info_reg fails, napi has
>>>> not been enabled for current qp yet, I should roll back from the queue
>>>> pairs where napi was enabled before(i--), otherwise it will hang at napi
>>>> disable api
>>> This is not the point, the key is whether we should handle with:
>>>
>>>         disable_delayed_refill(vi);
>>>         cancel_delayed_work_sync(&vi->refill);
>>>
>>> Thanks.
>>>
>>>
>> OK, get the point. Thanks for your careful review. And I check the code
>> again.
>>
>> There are two points that I need to explain:
>>
>> 1. All refill delay work calls(vi->refill, vi->refill_enabled) are based
>> on that the virtio interface is successfully opened, such as
>> virtnet_receive, virtnet_rx_resize, _virtnet_set_queues, etc. If there
>> is an error in the xdp reg here, it will not trigger these subsequent
>> functions. There is no need to call disable_delayed_refill() and
>> cancel_delayed_work_sync().
> Maybe something is wrong. I think these lines may schedule the delayed work.
>
> static int virtnet_open(struct net_device *dev)
> {
>         struct virtnet_info *vi = netdev_priv(dev);
>         int i, err;
>
>         enable_delayed_refill(vi);
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
>                 if (i < vi->curr_queue_pairs)
>                         /* Make sure we have some buffers: if oom use wq. */
> -->                     if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> -->                             schedule_delayed_work(&vi->refill, 0);
>
>                 err = xdp_rxq_info_reg(&vi->rq[i].xdp_rxq, dev, i, vi->rq[i].napi.napi_id);
>                 if (err < 0)
>                         return err;
>
>                 err = xdp_rxq_info_reg_mem_model(&vi->rq[i].xdp_rxq,
>                                                  MEM_TYPE_PAGE_SHARED, NULL);
>                 if (err < 0) {
>                         xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>                         return err;
>                 }
>
>                 virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
>                 virtnet_napi_tx_enable(vi, vi->sq[i].vq, &vi->sq[i].napi);
>         }
>
>         return 0;
> }
>
>
> And I think that if virtnet_open() returns an error, the state of virtnet
> should be the same as the state after virtnet_close().
>
> Or does someone have another opinion?

I agree, we need to disable and sync with the refill work.

Thanks

>
> Thanks.
>
>> The logic here is different from that of
>> virtnet_close. virtnet_close is based on the success of virtnet_open and
>> the tx and rx has been carried out normally. For error unwinding, only
>> disable qp is needed. Also encapsulated a helper function of disable qp,
>> which is used in error unwinding and virtnet close
>> 2. The current error qp, which has not enabled NAPI, can only call xdp
>> unreg, and cannot call the interface of disable NAPI, otherwise the
>> kernel will be stuck. So for i-- the reason for calling disable qp on
>> the previous queue
>>
>> Thanks
>>
>>>>>> +
>>>>>> +        return err;
>>>>>>  }
>>>>>>
>>>>>>  static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>>>> @@ -2305,11 +2318,8 @@ static int virtnet_close(struct net_device *dev)
>>>>>>          /* Make sure refill_work doesn't re-enable napi! */
>>>>>>          cancel_delayed_work_sync(&vi->refill);
>>>>>>
>>>>>> -        for (i = 0; i < vi->max_queue_pairs; i++) {
>>>>>> -                virtnet_napi_tx_disable(&vi->sq[i].napi);
>>>>>> -                napi_disable(&vi->rq[i].napi);
>>>>>> -                xdp_rxq_info_unreg(&vi->rq[i].xdp_rxq);
>>>>>> -        }
>>>>>> +        for (i = 0; i < vi->max_queue_pairs; i++)
>>>>>> +                virtnet_disable_qp(vi, i);
>>>>>>
>>>>>>          return 0;
>>>>>>  }
>>>>>> --
>>>>>> 2.37.1 (Apple Git-137.1)
>>>>>>
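
The point agreed on here (a failed open should leave the same state as a close, so the error path also has to disable and cancel the refill work) can be illustrated with a userspace model. This is a sketch under stated assumptions, not the patch that was eventually posted: struct refill_model and the refill_model_* functions are hypothetical names invented for illustration, the two booleans only mimic vi->refill_enabled and a queued vi->refill work item, and the sync_refill_on_error flag exists solely to compare the two behaviours.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QP 4

/* Hypothetical userspace model of the refill-related device state. */
struct refill_model {
    bool refill_enabled;      /* mimics vi->refill_enabled */
    bool refill_pending;      /* mimics vi->refill being queued */
    bool qp_active[MODEL_QP]; /* pair fully set up (XDP info + NAPI) */
    bool oom_on_fill;         /* simulate try_fill_recv() leaving the ring unfilled */
    int fail_at;              /* index where XDP registration fails, -1 = never */
};

/* Models disable_delayed_refill() followed by cancel_delayed_work_sync(). */
static void refill_model_stop_refill(struct refill_model *vi)
{
    vi->refill_enabled = false;
    vi->refill_pending = false;
}

static int refill_model_open(struct refill_model *vi, bool sync_refill_on_error)
{
    int i;

    assert(vi->fail_at < MODEL_QP);
    vi->refill_enabled = true;          /* models enable_delayed_refill() */

    for (i = 0; i < MODEL_QP; i++) {
        if (vi->oom_on_fill)
            vi->refill_pending = true;  /* models schedule_delayed_work() */

        if (i == vi->fail_at)
            goto err;

        vi->qp_active[i] = true;
    }
    return 0;

err:
    if (sync_refill_on_error)
        refill_model_stop_refill(vi);
    for (i = i - 1; i >= 0; i--)
        vi->qp_active[i] = false;
    return -1;
}

/* Models the close path: stop the refill work, then disable every pair. */
static void refill_model_close(struct refill_model *vi)
{
    int i;

    refill_model_stop_refill(vi);
    for (i = 0; i < MODEL_QP; i++)
        vi->qp_active[i] = false;
}

/* True when nothing is enabled, queued, or active: the post-close state. */
static bool refill_model_is_closed(const struct refill_model *vi)
{
    int i;

    if (vi->refill_enabled || vi->refill_pending)
        return false;
    for (i = 0; i < MODEL_QP; i++)
        if (vi->qp_active[i])
            return false;
    return true;
}
```

With oom_on_fill set and a failure injected at some index, calling refill_model_open() with sync_refill_on_error false leaves refill work enabled and queued after the error return, while passing true yields a state that refill_model_is_closed() accepts.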