On Mon, Aug 15, 2022 at 01:34:41AM -0700, Andres Freund wrote:
> Hi,
>
> On August 15, 2022 1:28:29 AM PDT, "Michael S. Tsirkin" <mst at redhat.com> wrote:
> >On Mon, Aug 15, 2022 at 01:15:27AM -0700, Andres Freund wrote:
> >> Hi,
> >>
> >> On 2022-08-15 03:51:34 -0400, Michael S. Tsirkin wrote:
> >> > It is possible that GCP gets confused if ring size is smaller than the
> >> > device maximum simply because no one did it in the past.
> >> >
> >> > So I pushed just the revert of 762faee5a267 to the test branch.
> >> > Could you give it a spin?
> >>
> >> Seems to fix the issue, at least to the extent I can determine at 1am... :)
> >>
> >> Greetings,
> >>
> >> Andres Freund
> >
> >So you tested this:
> >
> >commit 13df5a7eaeb22561d39354b576bc98a7e2c389f9 (HEAD, kernel.org/test)
> >Author: Michael S. Tsirkin <mst at redhat.com>
> >Date:   Mon Aug 15 03:44:38 2022 -0400
> >
> >    Revert "virtio_net: set the default max ring size by find_vqs()"
> >
> >    This reverts commit 762faee5a2678559d3dc09d95f8f2c54cd0466a7.
> >
> >    Signed-off-by: Michael S. Tsirkin <mst at redhat.com>
> >
> >and it fixes both issues right? No crashes no networking issue?
>
> Correct. I only did limited testing, but it's survived far longer / more reboots than anything since the commit.
>
> Andres
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.

OK so this gives us a quick revert as a solution for now.
Next, I would appreciate it if you just try this simple hack.
If it crashes we either have a long standing problem in virtio
code or more likely a gcp bug where it can't handle smaller
rings than what device requests.
Thanks!

diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
index f7965c5dd36b..bdd5f481570b 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -314,6 +314,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
 	if (!size || size > num)
 		size = num;
 
+	if (size > 1024)
+		size = 1024;
+
 	if (size & (size - 1)) {
 		dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", size);
 		return ERR_PTR(-EINVAL);

-- 
MST
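
For readers following the thread, the size selection that the hack above changes can be modelled in user space. This is only a sketch under stated assumptions: `pick_queue_size()`, `is_power_of_two()` and the `cap` parameter are hypothetical names made up for illustration, not kernel functions, and the model returns an errno-style integer where the driver returns `ERR_PTR(-EINVAL)`. Unlike the hunk shown, the sketch also rejects a size of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v is a nonzero power of two (same bit trick as in the diff). */
static bool is_power_of_two(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Hypothetical user-space model of the size selection in setup_vq().
 *
 * requested: size the driver asks for (0 means "no preference")
 * dev_max:   maximum queue size reported by the device
 * cap:       experimental upper bound (1024 in the diff above)
 *
 * Returns the chosen ring size, or -EINVAL if it is not a power of two.
 */
static int pick_queue_size(uint32_t requested, uint32_t dev_max, uint32_t cap)
{
	uint32_t size = requested;

	/* Illustration-only precondition: the cap itself must be a valid size. */
	assert(is_power_of_two(cap));

	/* No preference, or more than the device offers: use the device maximum. */
	if (!size || size > dev_max)
		size = dev_max;

	/* The experimental clamp: never use a ring larger than cap. */
	if (size > cap)
		size = cap;

	if (!is_power_of_two(size))
		return -EINVAL;

	return (int)size;
}
```

With a device maximum of 4096 and no driver preference, `pick_queue_size(0, 4096, 1024)` yields 1024, so the ring ends up smaller than the device maximum, which is the condition the experiment is meant to exercise.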

Hi,

On 2022-08-15 11:40:59 -0400, Michael S. Tsirkin wrote:
> OK so this gives us a quick revert as a solution for now.
> Next, I would appreciate it if you just try this simple hack.
> If it crashes we either have a long standing problem in virtio
> code or more likely a gcp bug where it can't handle smaller
> rings than what device requests.
> Thanks!

I applied the below and the problem persists.

> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
> index f7965c5dd36b..bdd5f481570b 100644
> --- a/drivers/virtio/virtio_pci_modern.c
> +++ b/drivers/virtio/virtio_pci_modern.c
> @@ -314,6 +314,9 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev,
>  	if (!size || size > num)
>  		size = num;
>
> +	if (size > 1024)
> +		size = 1024;
> +
>  	if (size & (size - 1)) {
>  		dev_warn(&vp_dev->pci_dev->dev, "bad queue size %u", size);
>  		return ERR_PTR(-EINVAL);
>
>

[ 1.165162] virtio_net virtio1 enp0s4: renamed from eth0
[ 1.177815] general protection fault, probably for non-canonical address 0xffff000000000400: 0000 [#1] PREEMPT SMP PTI
[ 1.179565] CPU: 1 PID: 125 Comm: systemd-udevd Not tainted 6.0.0-rc1-bisect14-dirty #14
[ 1.180785] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/29/2022
[ 1.182475] RIP: 0010:__kmalloc_node_track_caller+0x19e/0x380
[ 1.183365] Code: 2b 04 25 28 00 00 00 0f 85 f8 01 00 00 48 83 c4 18 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 8b 4d 28 48 8b 7d 00 <48> 8b 1c 08 48 8d 4a 40 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 0b ff
[ 1.186208] RSP: 0018:ffff9c470021b860 EFLAGS: 00010246
[ 1.187194] RAX: ffff000000000000 RBX: 00000000000928c0 RCX: 0000000000000400
[ 1.188634] RDX: 0000000000005781 RSI: 00000000000928c0 RDI: 000000000002e0f0
[ 1.190177] RBP: ffff908380042c00 R08: 0000000000000600 R09: ffff908380b665e4
[ 1.191256] R10: 0000000000000003 R11: 0000000000000002 R12: 00000000000928c0
[ 1.192269] R13: 0000000000000740 R14: 00000000ffffffff R15: 0000000000000000
[ 1.193368] FS: 00007f746702a8c0(0000) GS:ffff9084b7d00000(0000) knlGS:0000000000000000
[ 1.194846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.195661] CR2: 00007ffc010df980 CR3: 0000000103826005 CR4: 00000000003706e0
[ 1.196912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1.198216] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1.199367] Call Trace:
[ 1.199815] <TASK>
[ 1.200138] ? netlink_trim+0x85/0xb0
[ 1.200754] pskb_expand_head+0x92/0x340
[ 1.202512] netlink_trim+0x85/0xb0
[ 1.203069] netlink_unicast+0x54/0x390
[ 1.203630] rtnl_getlink+0x366/0x410
[ 1.204155] ? __d_alloc+0x24/0x1d0
[ 1.204668] rtnetlink_rcv_msg+0x146/0x3b0
[ 1.205256] ? _raw_spin_unlock+0xd/0x30
[ 1.205867] ? __d_add+0xf2/0x1b0
[ 1.206600] ? rtnl_calcit.isra.0+0x130/0x130
[ 1.207221] netlink_rcv_skb+0x49/0xf0
[ 1.207904] netlink_unicast+0x23a/0x390
[ 1.208585] netlink_sendmsg+0x23b/0x4b0
[ 1.209203] sock_sendmsg+0x57/0x60
[ 1.210118] __sys_sendto+0x117/0x170
[ 1.210694] ? __wake_up_common_lock+0x83/0xc0
[ 1.211420] __x64_sys_sendto+0x1b/0x30
[ 1.211992] do_syscall_64+0x37/0x90
[ 1.212497] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 1.213407] RIP: 0033:0x7f74677404e6
[ 1.213973] Code: 69 0e 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 41 54 48 83 ec 30 44 89 4c 24 2c 4c
[ 1.217098] RSP: 002b:00007ffc010daa78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 1.219539] RAX: ffffffffffffffda RBX: 000000000011bc98 RCX: 00007f74677404e6
[ 1.220552] RDX: 0000000000000020 RSI: 0000563160679570 RDI: 0000000000000005
[ 1.222378] RBP: 00005631606796b0 R08: 00007ffc010daaf0 R09: 0000000000000080
[ 1.223692] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[ 1.224793] R13: 0000000000000000 R14: 0000000000000000 R15: 00005631606794b0
[ 1.226228] </TASK>
[ 1.226775] Modules linked in:
[ 1.227414] ---[ end trace 0000000000000000 ]---

Greetings,

Andres Freund
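
As a reading aid for the oops above: the reported address 0xffff000000000400 equals RAX + RCX from the register dump, and it is non-canonical because bit 47 is clear while bits 63..48 are set. The sketch below checks that property. It assumes 48-bit virtual addresses (4-level paging); `VA_BITS` and `is_canonical()` are hypothetical names chosen for illustration and are not taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, i.e. x86-64 with 4-level paging. */
#define VA_BITS 48

static_assert(VA_BITS > 1 && VA_BITS < 64, "VA_BITS out of range");

/*
 * An address is canonical when bits 63..(VA_BITS - 1) are all zeros
 * or all ones, i.e. the upper bits are a sign extension of bit 47.
 */
static bool is_canonical(uint64_t addr)
{
	uint64_t upper = addr >> (VA_BITS - 1);

	return upper == 0 || upper == (UINT64_MAX >> (VA_BITS - 1));
}
```

Under that assumption, `is_canonical(0xffff000000000000ULL + 0x400)` is false, while the RBP value 0xffff908380042c00 and the user-space CR2 value 0x00007ffc010df980 both pass the check.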