Hi!

Recently I did a routine source upgrade from 11.2-STABLE/amd64 to 11.3-STABLE r354667 that went without any problem. After less than 2 days of uptime it panicked and failed to reboot (hung); a screenshot is here: http://www.grosbein.net/freebsd/zpanic.png

It did not panic with 11.2-STABLE but had some performance problems with ZFS.

Hardware: Dell PowerEdge R640 with 360G RAM, mrsas(4)-supported controller PERC H730/P Mini LSI MegaRAID SAS-3 3108 [Invader] and 7 SSD devices. Two of them hold the FreeBSD installation (a distinct boot pool); the other five are GELI-encrypted and combined into another (RAIDZ1) pool, 'sata', mentioned on the screenshot.

vfs.zfs.arc_max=160g

The system runs several bhyve instances over ZVOLs. There are many snapshots that are routinely created and destroyed, so the system generally issues many TRIM requests to the underlying SSDs.

After 1.5 days of uptime (before the panic) I set kern.cam.da.[2-6].delete_max=262144, changing it from the default 17179607040, hoping it would decrease the latency of read-write operations such as listing snapshots. No other non-default ZFS settings were made.

What does "panic: I/O to pool appears to be hung on vdev" mean, provided the hardware is healthy?
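For reference, the delete_max change described above can be sketched as a small loop over da2 through da6. This is only an illustration of the setting, not the exact commands that were run: the function name set_delete_max and the DRY_RUN switch are hypothetical, and only the sysctl name and the value 262144 come from this report.

```shell
#!/bin/sh
# Sketch: apply the delete_max value from this report to da2..da6.
# DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 and run as root to actually change the sysctls.
DRY_RUN="${DRY_RUN:-1}"

# set_delete_max is a hypothetical helper name used for illustration.
set_delete_max() {
    value="$1"
    for unit in 2 3 4 5 6; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "sysctl kern.cam.da.${unit}.delete_max=${value}"
        else
            sysctl "kern.cam.da.${unit}.delete_max=${value}"
        fi
    done
}

set_delete_max 262144
```

A value set this way does not survive a reboot, so it would have to be reapplied after each boot.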
15.11.2019 13:08, Eugene Grosbein wrote:

> What does it mean "panic: I/O to pool appears to be hung on vdev" provided hardware is healthy?

I also wonder why it panicked instead of degrading the RAIDZ pool.
15.11.2019 13:08, Eugene Grosbein wrote:

> It did not panic with 11.2-STABLE but had some performance problems with ZFS.

I have to correct myself: it did panic the same way at least once with 11.2-STABLE r344922.