Andrea Venturoli wrote:
> Hello.
>
> Last week I upgraded a 9.3/amd64 box to 10.3: since then, it crashed and
> rebooted at least once every night.

Hi,

I have quite a similar issue, with crash dumps every night, but my stack trace is different (crashing mostly in cam/scsi/scsi.c), and my environment is also quite different (old i386, individual disks, extensive use of ZFS), so the cause here is very likely a different one. Also, the upgrade is not the only change here: I also replaced a burnt power supply recently and added an SSD cache.

Basically You have two options:

A) fire up kgdb, go into the code and try and understand what exactly is happening. This depends if You have clue enough to go that way; I found "man 4 gdb" and especially the "Debugging Kernel Problems" pdf by Greg Lehey quite helpful.

B) systematically change parameters. Start by figuring from the logs the exact time of crash and what was happening then, try to reproduce that. Then change things and isolate the cause.

Having a RAID controller is a bit ugly in this regard, as it is more or less a black box, and it is difficult to change parameters or swap components.

> The only exception was on Friday, when it locked without rebooting: it
> still answered ping request and logins through HTTP would half work; I'm
> under the impression that the disk subsystem was hung, so ICMP would
> work since it does no I/O and HTTP too worked as far as no disk access
> was required.

Yep. That tends to happen. It doesn't give much of a clue, except that there is a disk-related problem.

On 10/20/16 22:12, Peter wrote:

Hello.

> Basically You have two options: A) fire up kgdb, go into the code and
> try and understand what exactly is happening. This depends
> if You have clue enough to go that way; I found "man 4 gdb" and
> especially the "Debugging Kernel Problems" pdf by Greg Lehey quite
> helpful.

I've tried this way, but although I'm quite proficient with [k]gdb, I tend to get lost in the FreeBSD kernel source code, which, unfortunately, I'm not familiar with.
BTW, I had read that book years ago; I searched for it now, but the 2005 edition is still what comes up. Has it ever been updated?

> B) systematically change parameters. Start by figuring from the logs
> the exact time of crash and what was happening then, try to reproduce
> that. Then change things and isolate the cause.

Again, I already tried, but without luck.
Since I had one hang one night during the creation of a snapshot, yesterday I tried creating and deleting around 40 of them: I hoped to get the system to hang again, but it all worked perfectly.
Since backups are run at night (possibly at the time of the hangs/panics, and they take snapshots), I launched several backup jobs, but they all worked perfectly.
I checked that at the times of the panics there is usually no cron job, periodic job or whatever. At least nothing I could identify. There was in fact a periodic job running once, but that's not the rule.
"ps -axl -M /var/crash/vmcore.x" showed nothing unusual.

 bye & Thanks
	av.
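
The check discussed in this thread, looking at what was logged shortly before each panic, could be scripted roughly as below. This is only a sketch under assumptions: the function names, the 10-minute window and the sample log lines are made up for illustration, the log is assumed to use the traditional syslog timestamp ("Oct 20 03:01:00") without a year, and month names are assumed to be in English.

```python
from datetime import datetime, timedelta

def parse_syslog_time(line, year):
    # Traditional syslog lines start with a 15-character timestamp
    # such as "Oct 20 03:01:00"; the year is not logged, so it is supplied.
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def entries_before_crash(log_lines, crash_time, window_minutes=10):
    """Return the log lines stamped within window_minutes before crash_time."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for line in log_lines:
        try:
            stamp = parse_syslog_time(line, crash_time.year)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if timedelta(0) <= crash_time - stamp <= window:
            hits.append(line)
    return hits

# Hypothetical sample data, not taken from the machines in this thread.
sample_log = [
    "Oct 20 01:00:00 box cron[101]: (root) CMD (hypothetical job B)",
    "Oct 20 02:55:00 box cron[102]: (root) CMD (hypothetical job A)",
    "Oct 20 03:01:00 box kernel: hypothetical kernel message",
    "a line without a timestamp",
]
crash = datetime(2016, 10, 20, 3, 2, 0)

for entry in entries_before_crash(sample_log, crash):
    print(entry)
```

Running it once per crash time and comparing the results across nights would show whether the same job or message keeps turning up before the panics.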